The requirements: elegant design and lots of power
The requirements
Ask 100 enthusiasts what they'd like to see in a cutting-edge graphics card and criteria such as forward-looking design, performance, energy efficiency, and, of course, price will crop up. GF100, given its time of arrival, needs to be almost all things to all people. Let's take each of these in turn and evaluate how good NVIDIA's new GPUs are.
Elegant design and performance?
Modern GPUs are parallel processing engines. Simplifying it hugely, GPU-fed instructions can be worked on by many mini-processors in tandem. Just add more of these cores and, voila, there's more power on tap. This lovely view of ever-more-powerful GPU design is constrained by a) manufacturing cost and yield issues b) heat c) process limitations and d) complexity, to name but a few.
In their highest-performing GPUs, NVIDIA and AMD look for the best compromise between silicon-chewing transistors, associated heat, and economic feasibility. We'd all love a 10,000-core GPU with a 75mm² die-space and an etail price of $49, but it's not likely to happen anytime soon.
Cutting to the chase, here is the standard GF100 'Fermi' die picture.
The numbers are large by any present standards. We can immediately see that NVIDIA has gone for a huge-die approach for design rather than mid-sized silicon with greater flexibility. But the devil is in the details.
The chip is effectively split into what NVIDIA terms Graphics Processing Clusters. Four GPCs make up one full-fat Fermi, and each GPC has standalone logic to be considered a mini-GPU in itself.
This picture shows the GPCs more clearly. Each GPC contains four Streaming Multiprocessors (SM), coloured green, that are composed of 32 'regular' cores each, leading to a per-GPC count of 128 and a GPU-wide tally of 512. It's difficult to draw accurate comparisons with AMD/ATI's setups, but the count compares favourably with the incumbent single-GPU GeForce GTX 285’s 240 cores.
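The core-count arithmetic above can be checked with a few lines of Python. This is only a sketch of the sums quoted in the text; the constant and function names are ours, not NVIDIA's.

```python
# Unit counts quoted in the article for a full-fat GF100.
GPCS_PER_GPU = 4       # Graphics Processing Clusters per chip
SMS_PER_GPC = 4        # Streaming Multiprocessors per GPC
CORES_PER_SM = 32      # 'regular' cores per SM
GTX_285_CORES = 240    # incumbent single-GPU part, for comparison


def core_counts(gpcs: int, sms_per_gpc: int, cores_per_sm: int) -> tuple:
    """Return (cores per GPC, cores per GPU) for the given layout."""
    per_gpc = sms_per_gpc * cores_per_sm
    return per_gpc, gpcs * per_gpc


cores_per_gpc, cores_per_gpu = core_counts(GPCS_PER_GPU, SMS_PER_GPC, CORES_PER_SM)
print(cores_per_gpc, cores_per_gpu)  # 128 512

# Raw core-count ratio only; it says nothing about per-core throughput or clocks.
core_ratio = cores_per_gpu / GTX_285_CORES
print(round(core_ratio, 2))  # 2.13
```

The ratio is a headcount comparison and nothing more: clock speeds and per-core efficiency differ between the two architectures.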
As a strategic shift in thinking, the GPCs also contain their own setup engine, on-chip cache, and four texture units per SM. One reason why NVIDIA has chosen to design mini-GPUs rather than a fixed top-to-bottom setup lies, we imagine, with the innate complexity of keeping up to 512 cores properly fed with data.
The 64 (16x4) texture units continue to offer the same filtering as on the GT200 line. Given that GeForce GTX 285 has 80 texture units, Fermi shouldn't fare as well on unit count alone, but the company has boosted the texture-unit clocks to 700MHz.
GF100's speed is set by the shader clock. The general clock, which influences almost all performance-related parameters - texturing, caches, PolyMorph Engine, etc - runs at half the shader-clock's speed.
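To make the clock relationship concrete, here is a rough Python sketch. It rests on two assumptions that the text does not state outright: that the 700MHz texture-unit clock is the general clock, which implies a 1,400MHz shader clock, and that each texture unit delivers one texel per clock. The function names are hypothetical.

```python
SM_COUNT = 16                      # 4 GPCs x 4 SMs
TEXTURE_UNITS_PER_SM = 4           # the "16x4" in the text
ASSUMED_SHADER_CLOCK_MHZ = 1400.0  # assumption: inferred from the 700MHz figure


def general_clock_mhz(shader_clock_mhz: float) -> float:
    """The general clock runs at half the shader clock."""
    return shader_clock_mhz / 2.0


def peak_texel_rate_gtexels(texture_units: int, clock_mhz: float) -> float:
    """Theoretical peak, assuming one texel per texture unit per clock."""
    return texture_units * clock_mhz / 1000.0


texture_units = SM_COUNT * TEXTURE_UNITS_PER_SM
general_clock = general_clock_mhz(ASSUMED_SHADER_CLOCK_MHZ)
texel_rate = peak_texel_rate_gtexels(texture_units, general_clock)
print(texture_units, general_clock, texel_rate)  # 64 700.0 44.8
```

The result is a paper figure only; real texturing throughput depends on formats, filtering modes, and cache behaviour.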
Having mini-GPUs helps break down GPU complexity into manageable chunks and should pave the way for future NVIDIA designs that increase the processing cores to 1,000-plus.
DX11, the tessellator, and the question of geometry
Fermi is compliant with Microsoft's DX11 API. The rigid specification means that the GPU has to support hardware tessellation - the ability to generate complex, many-triangle models from low-detail inputs - multithreading, DirectCompute 11, and HDR texture compression.
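To give a feel for the geometry amplification that tessellation implies, the Python sketch below splits each triangle into four by joining its edge midpoints. It is a toy illustration of how quickly triangle counts grow, not the DX11 tessellation pipeline itself, which runs in hardware and is driven by shader-supplied tessellation factors; every name here is ours.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
Triangle = Tuple[Point, Point, Point]


def midpoint(a: Point, b: Point) -> Point:
    """Point halfway between a and b."""
    return tuple((x + y) / 2.0 for x, y in zip(a, b))


def subdivide(tri: Triangle) -> List[Triangle]:
    """Split one triangle into four by joining its edge midpoints."""
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]


def tessellate(mesh: List[Triangle], levels: int) -> List[Triangle]:
    """Apply midpoint subdivision 'levels' times to every triangle in the mesh."""
    for _ in range(levels):
        mesh = [small for tri in mesh for small in subdivide(tri)]
    return mesh


coarse = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))]
fine = tessellate(coarse, 3)
print(len(coarse), len(fine))  # 1 64
```

Each level quadruples the triangle count, so a low-detail input turns into a heavy geometry workload after only a few levels.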
On geometry processing, and the amplification that tessellation brings in particular, Fermi’s 16 PolyMorph Engines, spread over the SMs, can now parallelise the initial setup and workload, giving more geometry oomph to the GPU, NVIDIA says. We'll examine the proof in the performance section. GPUs' geometry processing has taken a backseat to the focus on shading power, and it seems as if NVIDIA is looking to redress that with Fermi.