Mantle reduces the overheads imposed by these APIs. For example, the APIs perform various kinds of validation of each batch as it's sent. This can be superfluous: there's no need to revalidate an object that's drawn every single frame. Mantle therefore performs the validation once, when the object is created, rather than once per frame. It also reduces the strictness of some of that validation, on the basis that developers will do the work at development time, rather than having it done on end-user machines at run time.
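To make the difference concrete, here is a minimal Python sketch of the two validation strategies. It is a toy model, not Mantle's or Direct3D's actual API: `CountingValidator`, `PrevalidatedObject`, and the `vertex_count` field are all hypothetical names invented for illustration, and the "validation" is a trivial stand-in for whatever checks a real driver performs.

```python
class CountingValidator:
    """Hypothetical stand-in for a driver-side check; counts how often it runs."""

    def __init__(self):
        self.calls = 0

    def check(self, state):
        self.calls += 1
        if state["vertex_count"] <= 0:
            raise ValueError("object has no vertices")


def render_checked_every_frame(validator, state, frames):
    """Per-batch validation: the same object is re-checked on every submission."""
    submitted = 0
    for _ in range(frames):
        validator.check(state)  # repeated work for an unchanged object
        submitted += 1          # stand-in for submitting the draw
    return submitted


class PrevalidatedObject:
    """Create-time validation: the check runs once, in the constructor."""

    def __init__(self, validator, state):
        validator.check(state)
        self.state = dict(state)  # private copy so later edits can't bypass the check

    def draw(self):
        # No validation here; the object was checked when it was created.
        return self.state["vertex_count"]


def render_checked_once(validator, state, frames):
    obj = PrevalidatedObject(validator, state)
    submitted = 0
    for _ in range(frames):
        obj.draw()
        submitted += 1
    return submitted
```

In this model, drawing the same object for 100 frames costs 100 checks in the first style and a single check in the second. The trade-off is that anything that changes after creation is no longer checked, which is why the work shifts to the developer.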
Mantle also supports filling command buffers in multiple threads.
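The general shape of that approach can be sketched in a few lines of Python. This is an illustration of the pattern only, under loud assumptions: the command buffers are plain lists, `record_commands` and `build_frame` are hypothetical helpers rather than Mantle calls, and Python threads will not show the CPU speedup a native engine would see. The point is the structure: each thread fills its own buffer with no shared state, and the buffers are then submitted in a fixed order.

```python
from concurrent.futures import ThreadPoolExecutor


def record_commands(objects):
    """Fill one command buffer (here just a list) for a slice of the scene."""
    buffer = []
    for name in objects:
        buffer.append(("bind", name))
        buffer.append(("draw", name))
    return buffer


def build_frame(scene, workers=4):
    """Record the scene's commands on several threads, then submit in order."""
    # Split the scene into contiguous chunks, one command buffer per chunk.
    size = max(1, -(-len(scene) // workers))  # ceiling division
    chunks = [scene[i:i + size] for i in range(0, len(scene), size)]

    # Each worker records into its own buffer, so no locking is needed.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        buffers = list(pool.map(record_commands, chunks))  # map keeps chunk order

    # Submission stays serial and ordered: concatenate the finished buffers.
    queue = []
    for buffer in buffers:
        queue.extend(buffer)
    return queue
```

Because the chunks are contiguous and `pool.map` returns results in input order, the submitted queue matches what a single thread would have recorded for the whole scene.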
Along with some other improvements, such as reducing the overhead of compiling shader programs, the result is that Mantle cuts the CPU time needed to get graphics onto the screen and gives the GPU more batches to process. When the CPU is the bottleneck, the gains can be substantial, if AMD and EA's figures are anything to go by.
We asked AMD if the techniques could be used to provide gains to existing OpenGL and Direct3D programs. For example, Direct3D 11 permits command buffer generation to be done in parallel, with a feature called deferred contexts and multithreaded rendering. However, some video driver developers, including AMD, have not implemented multithreaded rendering support, so while the API supports parallelism, the work is done serially anyway, and sometimes more slowly than if no multithreading were used. As a result of this poor driver support, some game developers have removed multithreaded rendering support from their game engines.
In response, AMD told us that developers have tried such techniques with existing APIs without much success, and that Mantle is required to do the job properly. It's not immediately clear to us whether this is because of AMD's refusal to implement the support in its drivers or because there really is a problem with using Direct3D in this way; there is clearly something of the chicken and the egg at work here.