
Review: NVIDIA GeForce 8800 GTX

by Ryszard Sommefeldt on 8 November 2006, 19:08

Tags: XFX GeForce 8800 GTX, NVIDIA (NASDAQ:NVDA)

NVIDIA G80 - Clocks and flops and samples and ROPs and things

First of all, confirmation of how big the thing is, who builds it, and how fast the powered-by-G80 GeForce 8800 GTX is. The venerable TSMC currently build G80 for NVIDIA, and they're on A2 silicon for the first reference and retail boards, but A3 silicon looms large, we reckon, most likely tweaking for yields and minor hotspots. It's a 90nm part (90HS as far as TSMC's process grading goes), roughly 20x22mm in size, and contains 681M transistors (according to NVIDIA; we haven't had time to count them all yet to make sure).

That's by far and away the largest graphics chip ever created by man or beast (well, people do talk about Stuart Oberman in an 'odd' way sometimes), making G71 look tiny (196mm², 278M transistors, same process) and knocking ATI's R580 off the top spot as The Big One™ (352mm², 384M transistors, same process again).

The next big thing is the chip's frequency. NVIDIA go split again, separating the base chip clock from the clock that drives the shader core. Folks like me were expecting a 2x 'double-pumped' rate if this happened (it arguably makes the chip easier to design that way), but NVIDIA say they're happy to pass data around the chip over non-2x clock boundaries without any real issue. As it stands, with GeForce 8800 GTX you get a base clock of 575MHz and a shader clock of 1350MHz. Yep, those 128 fully-FP32 SPs and the associated 128 interpolator/SF ALUs all run at over 1GHz in an 8800 GTX, giving rise to big instruction rates and plenty of potential shading performance.
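
To put rough numbers on those clocks, here's a quick back-of-envelope sketch in Python (our arithmetic, not an NVIDIA figure) of the peak scalar rates that 128 SPs at 1350MHz imply, counting one MADD per SP per clock and, optionally, the co-issued MUL as an extra FLOP; the names are purely illustrative.

# Back-of-envelope peak shader rates for GeForce 8800 GTX, derived from the
# clocks and unit counts quoted in this article.
SP_COUNT = 128
SHADER_CLOCK_HZ = 1350e6

madd_issue_rate = SP_COUNT * SHADER_CLOCK_HZ  # one MADD per SP per clock
print(f"MADD issue rate: {madd_issue_rate / 1e9:.1f}G instr/sec")  # ~172.8G

# Counting the MADD as two FLOPs and the co-issued MUL as a third:
peak_flops = SP_COUNT * SHADER_CLOCK_HZ * 3
print(f"Peak shader throughput: {peak_flops / 1e9:.1f} GFLOPS")  # ~518.4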

The difference in architectures (fully scalar versus mostly vector) means it's hard to compare G80 to previous NVIDIA hardware and the current ATI approach in terms of raw mano-a-mano numbers, but we'll try anyway. Note that since it's unified, G80 has all shading units available for all shading ops, no matter the thread type, so the table only shows the maximum units available per cycle. For those not paying attention, there are not 384 SPs!

Lastly, before we move on: the 384-bit memory bus is a first on a consumer graphics part and not to be ignored when we talk about the chip's performance.
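
As a quick sanity check on that bus, here's a minimal sketch of where the bandwidth figure in the table below comes from, assuming the quoted 900MHz memory clock transfers data on both edges (DDR); variable names are ours.

# Peak memory bandwidth for a 384-bit bus at 900MHz with double data rate.
BUS_WIDTH_BITS = 384
MEM_CLOCK_HZ = 900e6
TRANSFERS_PER_CLOCK = 2  # DDR: data on both clock edges

bandwidth = (BUS_WIDTH_BITS / 8) * MEM_CLOCK_HZ * TRANSFERS_PER_CLOCK
print(f"Peak bandwidth: {bandwidth / 1e9:.2f}GB/sec")  # 86.40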

Clocks and flops and samples and ROPs and things

Spec / Chip | NVIDIA G80 | NVIDIA G71 | ATI R580
Variant | GeForce 8800 GTX | GeForce 7900 GTX | Radeon X1950 XTX
Process | TSMC, 90nm (90HS) | TSMC, 90nm (90HS) | TSMC, 90nm (90HS)
Transistor Count | 681M | 278M | 384M
Die Size | 20x22mm | 13.5x14.5mm | 18.5x19.5mm
Clocks | 575MHz base, 1350MHz shader, 900MHz memory | 650MHz base/shader, 700MHz VS, 800MHz memory | 650MHz base, 1000MHz memory
DirectX Shader Model | 4.0 | 3.0 | 3.0
Vertex Shading | 128 FP32 scalar ALUs, MADD+MUL dual-issue | 8 vec4 + scalar ALUs, MADD co-issue | 8 vec4 + scalar ALUs, MADD co-issue
Fragment Shading | 128 FP32 scalar ALUs, MADD+MUL dual-issue | 24 vec3 + scalar ALUs, MADD+MADD dual-issue | 48 vec3 + scalar ALUs, MADD+ADD dual-issue
Geometry Shading | 128 FP32 scalar ALUs, MADD+MUL dual-issue | N/A | N/A
Data Sampling and Filtering | 32ppc address and 64ppc bilinear INT8 filtering, max 16xAF | 24ppc address and 24ppc bilinear INT8 filtering, max 16xAF | 16ppc address and 16ppc bilinear INT8 filtering, max 16xAF
ROPs | 24, 8Z or 8C samples/clk, 2clk FP16 blend, 8xMSAA, 16xCSAA | 16, 2Z or 2C samples/clk, 2clk FP16 blend, 4xMSAA | 16, 2Z or 1C samples/clk, 2clk FP16 blend, 6xMSAA
Memory Interface | 384-bit, 6 memory channels, GDDR->GDDR4 | 256-bit, 4 memory channels, GDDR->GDDR3 | 256-bit, 8 memory channels, GDDR->GDDR4
Memory Bandwidth | 86.40GB/sec | 51.20GB/sec | 64.00GB/sec

Theoretical Rates for GeForce 8800 GTX and GeForce 7900 GTX

Rate | NVIDIA GeForce 8800 GTX | NVIDIA GeForce 7900 GTX
Core Clock | 575MHz (1350MHz shader) | 650MHz (700MHz VS)
Pixel fillrate | 13.8G pixels/sec | 10.4G pixels/sec
Texture sampling rate | 36.8G bilerps/sec | 15.6G bilerps/sec
Z-only fillrate | 110.4G samples/sec | 20.8G samples/sec
Vertex transform rate | 10.80G tris/sec | 1.40G tris/sec
VP MADD issue rate | 172.8G instr/sec | 5.60G instr/sec
FP MADD issue rate | 172.8G instr/sec | 31.2G instr/sec
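
For the curious, the 8800 GTX figures fall straight out of the unit counts and clocks in the first table; here's a minimal sketch showing the arithmetic (ours, with illustrative names).

# Reproducing the 8800 GTX theoretical rates above from unit counts and clocks.
BASE_CLOCK_HZ = 575e6
SHADER_CLOCK_HZ = 1350e6
ROPS, Z_SAMPLES_PER_ROP = 24, 8
BILERPS_PER_CLOCK = 64
SCALAR_ALUS = 128

print(f"Pixel fillrate:        {ROPS * BASE_CLOCK_HZ / 1e9:.1f}G pixels/sec")                       # 13.8
print(f"Texture sampling rate: {BILERPS_PER_CLOCK * BASE_CLOCK_HZ / 1e9:.1f}G bilerps/sec")         # 36.8
print(f"Z-only fillrate:       {ROPS * Z_SAMPLES_PER_ROP * BASE_CLOCK_HZ / 1e9:.1f}G samples/sec")  # 110.4
print(f"MADD issue rate:       {SCALAR_ALUS * SHADER_CLOCK_HZ / 1e9:.1f}G instr/sec")               # 172.8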

Now the instruction issue rates aren't quite a fair comparison (scalar vs. vector, MADD only, etc.), but we display them that way on purpose to highlight once more the fact that G80 is entirely scalar in its ALU makeup and that it has a 1350MHz shader clock in 8800 GTX form. Peak rates mean little without some measure of the efficiency of the shader core, and that efficiency is what making the chip scalar is meant to maximise in G80. Simple divides will get you to the peak vec4 MADD rates for vertex and fragment shading if you're horribly concerned.
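
And the simple divide mentioned above, for anyone wanting a vec4-equivalent number to hold against vector hardware (again, our arithmetic rather than an official figure):

# Expressing G80's scalar MADD rate as a vec4-equivalent figure.
scalar_madd_rate = 128 * 1350e6            # instr/sec, as in the table above
vec4_equivalent = scalar_madd_rate / 4     # four scalar ops per vec4 MADD
print(f"vec4-equivalent MADD rate: {vec4_equivalent / 1e9:.1f}G instr/sec")  # 43.2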

As far as thinking about possible game performance ahead of actual measurement went, the theoretical analysis pointed in one direction: a surfeit of bilinear filtering ability and the potential for near-peak efficiency in the shader core suggested performance should flow freely. With the ROP hardware at the end of the chip looking very sweet and peak theoretical memory bandwidth as high as it is, GeForce 8800 GTX has the on-paper potential to fly.

NVIDIA mentioned at the G80 Editor's Day that they'd looked at current game shaders, and at shaders implementing rendering algorithms likely to be popular in the future, both SM3.0 and SM4.0, as the reason behind going entirely scalar and moving back to MADD+MUL as the primary instruction basis for the SPs. Remember that NV40 (and NV30, if we remember correctly) was also MADD+MUL, with NVIDIA adding an ADD (see what we did there?) back to the fragment ALUs in G7x (and NV35!) to go dual-MADD again. Reverting once more sees the company flip-flop the instruction basis across its fifth major architecture in a row. Kind of cool to note (and it's the reason why NVIDIA quoted MUL rates at Ed's Day, rather than MADDs!).

So did our pre-testing thinking translate into what was largely expected, IQ- and performance-wise? Well, we'll tell you, but not before looking at a board!