
Review: NVIDIA GeForce 8800 GTX

by Ryszard Sommefeldt on 8 November 2006, 19:08

Tags: XFX GeForce 8800 GTX, NVIDIA (NASDAQ:NVDA)

NVIDIA G80 - Clocks and flops and samples and ROPs and things

First of all, confirmation of how big the thing is, who builds it, and how fast the powered-by-G80 GeForce 8800 GTX is. The venerable TSMC currently build G80 for NVIDIA, and they're on A2 silicon for the first reference and retail boards, but A3 silicon looms large, we reckon, most likely tweaking for yields and minor hotspots. It's a 90nm part (90HS as far as TSMC's process grading goes), roughly 20x22mm in size, and contains 681M transistors (according to NVIDIA; we haven't had time to count them all yet to make sure).

That's by far and away the largest graphics chip ever created by man or beast (well, people do talk about Stuart Oberman in an 'odd' way sometimes), making G71 look tiny (196mm², 278M transistors, same process) and knocking ATI's R580 off the top spot as The Big One™ (352mm², 384M transistors, same process again).

The next big thing is the chip's frequency. NVIDIA go split again, separating the base chip clock from the clock that drives the shader core. Folks like me were expecting a 2x 'double-pumped' rate if this happened (it arguably makes the chip easier to design that way), but NVIDIA say they're happy to pass data around the chip over non-2x clock boundaries without any real issue. As it stands, with GeForce 8800 GTX you get a base clock of 575MHz and a shader clock of 1350MHz. Yep, those 128 fully-FP32 SPs and the associated 128 interpolator/SF ALUs all run at over 1GHz in an 8800 GTX, giving rise to big instruction rates and plenty of potential shading performance.
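
To put rough numbers on those clocks, here's a quick back-of-envelope sketch in Python (our arithmetic, not an NVIDIA figure) of the peak scalar rates that 128 SPs at 1350MHz imply, counting one MADD per SP per clock and, optionally, the co-issued MUL as an extra FLOP; the names are purely illustrative.

# Back-of-envelope peak shader rates for GeForce 8800 GTX, derived from the
# clocks and unit counts quoted in this article.
SP_COUNT = 128
SHADER_CLOCK_HZ = 1350e6

madd_issue_rate = SP_COUNT * SHADER_CLOCK_HZ  # one MADD per SP per clock
print(f"MADD issue rate: {madd_issue_rate / 1e9:.1f}G instr/sec")  # ~172.8G

# Counting the MADD as two FLOPs and the co-issued MUL as a third:
peak_flops = SP_COUNT * SHADER_CLOCK_HZ * 3
print(f"Peak shader throughput: {peak_flops / 1e9:.1f} GFLOPS")  # ~518.4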

The difference in architectures (fully scalar versus mostly vector) means it's hard to compare G80 to previous NVIDIA hardware and the current ATI approach in terms of raw mano-a-mano numbers, but we'll try anyway. Note that since it's unified, G80 has all shading units available for all shading ops, no matter the thread type, so the table only shows the maximum units available per cycle. For those not paying attention, there are not 384 SPs!

Lastly, before we move on: the 384-bit memory bus is a first on a consumer graphics part and not to be ignored when we talk about the chip's performance.
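
As a quick sanity check on that bus, here's a minimal sketch of where the bandwidth figure in the table below comes from, assuming the quoted 900MHz memory clock transfers data on both edges (DDR); variable names are ours.

# Peak memory bandwidth for a 384-bit bus at 900MHz with double data rate.
BUS_WIDTH_BITS = 384
MEM_CLOCK_HZ = 900e6
TRANSFERS_PER_CLOCK = 2  # DDR: data on both clock edges

bandwidth = (BUS_WIDTH_BITS / 8) * MEM_CLOCK_HZ * TRANSFERS_PER_CLOCK
print(f"Peak bandwidth: {bandwidth / 1e9:.2f}GB/sec")  # 86.40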

Clocks and flops and samples and ROPs and things

Spec / Chip | NVIDIA G80 | NVIDIA G71 | ATI R580
Variant | GeForce 8800 GTX | GeForce 7900 GTX | Radeon X1950 XTX
Process | TSMC, 90nm (90HS) | TSMC, 90nm (90HS) | TSMC, 90nm (90HS)
Transistor Count | 681M | 278M | 384M
Die Size | 20x22mm | 13.5x14.5mm | 18.5x19.5mm
Clocks | 575MHz base, 1350MHz shader, 900MHz memory | 650MHz base/shader, 700MHz VS, 800MHz memory | 650MHz base, 1000MHz memory
DirectX Shader Model | 4.0 | 3.0 | 3.0
Vertex Shading | 128 FP32 scalar ALUs, MADD+MUL dual-issue | 8 vec4 + scalar ALUs, MADD co-issue | 8 vec4 + scalar ALUs, MADD co-issue
Fragment Shading | 128 FP32 scalar ALUs, MADD+MUL dual-issue | 24 vec3 + scalar ALUs, MADD+MADD dual-issue | 48 vec3 + scalar ALUs, MADD+ADD dual-issue
Geometry Shading | 128 FP32 scalar ALUs, MADD+MUL dual-issue | N/A | N/A
Data Sampling and Filtering | 32ppc address and 64ppc bilinear INT8 filtering, max 16xAF | 24ppc address and 24ppc bilinear INT8 filtering, max 16xAF | 16ppc address and 16ppc bilinear INT8 filtering, max 16xAF
ROPs | 24, 8Z or 8C samples/clk, 2clk FP16 blend, 8xMSAA, 16xCSAA | 16, 2Z or 2C samples/clk, 2clk FP16 blend, 4xMSAA | 16, 2Z or 1C samples/clk, 2clk FP16 blend, 6xMSAA
Memory Interface | 384-bit, 6 memory channels, GDDR->GDDR4 | 256-bit, 4 memory channels, GDDR->GDDR3 | 256-bit, 8 memory channels, GDDR->GDDR4
Memory Bandwidth | 86.40GB/sec | 51.20GB/sec | 64.00GB/sec

Theoretical Rates for GeForce 8800 GTX and GeForce 7900 GTX

Rate | NVIDIA GeForce 8800 GTX | NVIDIA GeForce 7900 GTX
Core Clock | 575MHz (1350MHz shader) | 650MHz (700MHz VS)
Pixel fillrate | 13.8G pixels/sec | 10.4G pixels/sec
Texture sampling rate | 36.8G bilerps/sec | 15.6G bilerps/sec
Z-only fillrate | 110.4G samples/sec | 20.8G samples/sec
Vertex transform rate | 10.80G tris/sec | 1.40G tris/sec
VP MADD issue rate | 172.8G instr/sec | 5.60G instr/sec
FP MADD issue rate | 172.8G instr/sec | 31.2G instr/sec
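
For the curious, the 8800 GTX figures fall straight out of the unit counts and clocks in the first table; here's a minimal sketch showing the arithmetic (ours, with illustrative names).

# Reproducing the 8800 GTX theoretical rates above from unit counts and clocks.
BASE_CLOCK_HZ = 575e6
SHADER_CLOCK_HZ = 1350e6
ROPS, Z_SAMPLES_PER_ROP = 24, 8
BILERPS_PER_CLOCK = 64
SCALAR_ALUS = 128

print(f"Pixel fillrate:        {ROPS * BASE_CLOCK_HZ / 1e9:.1f}G pixels/sec")                       # 13.8
print(f"Texture sampling rate: {BILERPS_PER_CLOCK * BASE_CLOCK_HZ / 1e9:.1f}G bilerps/sec")         # 36.8
print(f"Z-only fillrate:       {ROPS * Z_SAMPLES_PER_ROP * BASE_CLOCK_HZ / 1e9:.1f}G samples/sec")  # 110.4
print(f"MADD issue rate:       {SCALAR_ALUS * SHADER_CLOCK_HZ / 1e9:.1f}G instr/sec")               # 172.8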

Now the instruction issue rates aren't quite a fair comparison (scalar vs. vector, MADD only, etc.), but we display them that way on purpose to highlight once more the fact that G80 is entirely scalar in its ALU makeup and that it has a 1350MHz shader clock in 8800 GTX form. Peak rates mean little without some measure of the efficiency of the shader core, and that efficiency is what making the chip scalar is meant to maximise in G80. Simple divides will get you to the peak vec4 MADD rates for vertex and fragment shading if you're horribly concerned.
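
And the simple divide mentioned above, for anyone wanting a vec4-equivalent number to hold against vector hardware (again, our arithmetic rather than an official figure):

# Expressing G80's scalar MADD rate as a vec4-equivalent figure.
scalar_madd_rate = 128 * 1350e6            # instr/sec, as in the table above
vec4_equivalent = scalar_madd_rate / 4     # four scalar ops per vec4 MADD
print(f"vec4-equivalent MADD rate: {vec4_equivalent / 1e9:.1f}G instr/sec")  # 43.2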

As far as thinking about possible game performance ahead of actual measurement went, the theoretical analysis pointed in one direction: a surfeit of bilinear filtering ability and the potential for near-peak efficiency in the shader core suggested performance should flow freely. With the ROP hardware at the end of the chip looking very sweet and peak theoretical memory bandwidth as high as it is, GeForce 8800 GTX has the on-paper potential to fly.

NVIDIA mentioned at the G80 Editor's Day that they'd looked at current game shaders, and at shaders implementing rendering algorithms likely to be popular in the future, both SM3.0 and SM4.0, as the reason behind going entirely scalar and moving back to MADD+MUL as the primary instruction basis for the SPs. Remember that NV40 (and NV30, if we remember correctly) was also MADD+MUL, with NVIDIA adding an ADD (see what we did there?) back to the fragment ALUs in G7x (and NV35!) to go dual-MADD again. Reverting once more sees the company flip-flop the instruction basis across its fifth major architecture in a row. Kind of cool to note (and it's the reason why NVIDIA quoted MUL rates at Ed's Day, rather than MADDs!).

So did our pre-testing thinking translate into what was largely expected, IQ- and performance-wise? Well, we'll tell you, but not before looking at a board!