
Review: NVIDIA GeForce 7800 GTX 512

by Ryszard Sommefeldt on 14 November 2005, 14:01

Tags: NVIDIA (NASDAQ:NVDA)



GTX 512; GTX on Steroids

The G70 chip that powers existing GeForce 7800 GTX boards is what powers the GTX 512, just with some tweaks here and there. Before we go over those, a refresher on how G70 works.

G70 starts with a set of 5D vector processors, each able to operate on a full FP32 (32-bit floating point precision, s23e8 for the nerds among you) data type per clock: a 4-dimensional vector plus an FP32 scalar. They execute Shader Model 3.0 (SM3.0) instructions, along with supporting all preceding Shader Models and the fixed-function Direct3D and OpenGL TnL pipelines. Each processor also has a texture address processor and texture sampler, again FP32 in precision, that can point sample from texture data. Grabbing data from texture sources during geometry generation is key to a number of render effects, and a key part of SM3.0.

Those vector processors operate on vertex data, as 'vertex shaders', generating the geometry that defines what's on the screen. Eight of them, clocked at 470MHz in a reference GTX, generate geometry to be rasterised and passed on to the 'pixel shaders' for processing as pixel fragments.
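If you want to put a rough ceiling on that vertex hardware, here's a quick Python sketch of the arithmetic; the one-instruction-per-unit-per-clock assumption is a theoretical best case of ours, not a measured figure:

```python
# Rough peak vertex instruction rate for a reference 7800 GTX.
# Assumes each of the 8 vertex units retires one vector+scalar
# instruction per clock -- a theoretical best case, not measured.
VERTEX_UNITS = 8
VERTEX_CLOCK_HZ = 470e6

peak_vertex_rate = VERTEX_UNITS * VERTEX_CLOCK_HZ
print(f"Peak vertex rate: {peak_vertex_rate / 1e9:.2f}G inst/sec")  # 3.76G
```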

The fragment hardware, an array of dual FP32 4D vector units, each with an FP32 texture sampler (bilinear sampling this time, usually one sample per clock), pushes the pixels using SM3.0 instructions (with support for all previous models, as in the vertex hardware).

There are 24 of those pixel processors - each made up of two 4D vector ALUs, remember - in a full G70. Clocked at 430MHz in the existing GTX, that's a hugely formidable shader rate of 20.6G instructions/sec (all 4D MADDs, if you wish), which compared to the X1800 XT's 20G instructions/sec (only half of which can be MADDs) is quite the set of pixel units.
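For those who like to check the sums, here's a small Python sketch of both shader rates; note that the X1800 XT breakdown (16 processors, two ALUs each, at 625MHz) is our inference from its quoted 20G figure, not something NVIDIA or ATI handed over:

```python
# Peak pixel-shader instruction rates, assuming every ALU issues
# one instruction (a 4D MADD at best) per clock -- best case.
G70_PIXEL_UNITS = 24
G70_ALUS_PER_UNIT = 2
G70_PIXEL_CLOCK_HZ = 430e6

g70 = G70_PIXEL_UNITS * G70_ALUS_PER_UNIT * G70_PIXEL_CLOCK_HZ
print(f"GTX: {g70 / 1e9:.1f}G inst/sec")  # ~20.6G

# X1800 XT: 16 processors, 2 ALUs each, at 625MHz (inferred breakdown);
# only half of those ALUs can issue MADDs.
x1800 = 16 * 2 * 625e6
print(f"X1800 XT: {x1800 / 1e9:.1f}G inst/sec")  # 20.0G
```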

Those pixel units output processed fragments into 16 ROPs. ROPs draw the pixels on the screen, along with being responsible for other ops such as colour compression (saves memory bandwidth) and Z testing (for multisample antialiasing). The ROPs run at their own clock, 430MHz on a reference GTX. Each can do one Z and colour write per clock, or two Z writes with colour disabled. That's why NVIDIA's hardware, with its double Z rate, is so good for a Z-only prepass in certain games.
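The same back-of-envelope treatment gives the ROP fill rates, including that doubled Z-only mode; theoretical peaks, as ever:

```python
# Peak ROP output rates for a reference GTX.
ROPS = 16
ROP_CLOCK_HZ = 430e6

colour_fill = ROPS * ROP_CLOCK_HZ        # one colour + Z write per clock
z_only_fill = ROPS * ROP_CLOCK_HZ * 2    # colour disabled: two Z writes per clock

print(f"Colour fill: {colour_fill / 1e9:.2f}G pixels/sec")  # 6.88G
print(f"Z-only fill: {z_only_fill / 1e9:.2f}G writes/sec")  # 13.76G
```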

Memory is accessed at 600MHz, twice per clock (good old DDR), over a 256-bit bus. 38.4GB/sec is pretty sweet and keeps everything fed and watered through all the stages of processing, vertices onwards. GTXs only come with 256MiB of card memory, too. Or they did until today.
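The bandwidth sum is simple enough to show too; a minimal helper, with the function name ours:

```python
# Peak memory bandwidth: base clock x transfers per clock x bus width in bytes.
def bandwidth_gb_per_sec(base_clock_hz, bus_bits, pumps=2):
    """pumps=2 models DDR's two transfers per clock."""
    return base_clock_hz * pumps * (bus_bits / 8) / 1e9

print(bandwidth_gb_per_sec(600e6, 256))  # 38.4 GB/sec on a reference GTX
```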

With that overtly technical overview in mind - and don't worry if it means the thick end of nothing to you, games benchmarks are on their way and we have a HEXUS.glossary in the making to teach you the 3D basics - let's have a look at how the GTX 512 improves upon GTX.

Give me clocks. Lots of clocks.

Neo is the man, and that bastardised line from the Matrix is awesome. Moving on, the GTX 512 not only features twice the on-card memory size of the original GTX (that's where the 512 comes from) - making good use of some brand new Samsung GDDR3 DRAMs along the way - it's also clocked a bit faster.

Where the original G70 in the GTX runs 470/430/430 clocks (vertex/pixel/ROP), the GTX 512 has 550/550/550 reference clocks for the GPU. So vertex rate goes up by 17%, and pixel and ROP clocks are up by nearly 28%. Remember the original GTX's 20.6Ginst/sec pixel shader rate (all 4D MADDs, theoretically, given the right situation)? It's 26.4Ginst/sec now. Pixel output rate and texture rates go up by the same ratio, making the GTX 512 gobs quicker.

What about memory bandwidth, you say? 850MHz, I reply. DDR, too. So 1.7GHz and 54.4GB/sec when you do the sums. Forgive me the slightly giggly schoolgirl way of referencing these figures, but they're a step above what you'll find on an already powerful NVIDIA desktop accelerator.
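Working all the GTX 512 gains through in one go, again as theoretical peaks:

```python
# The GTX 512's gains over the original GTX, worked through.
OLD_VERTEX_HZ, OLD_PIXEL_HZ, NEW_HZ = 470e6, 430e6, 550e6

print(f"Vertex clock up {NEW_HZ / OLD_VERTEX_HZ - 1:.0%}")    # 17%
print(f"Pixel/ROP clock up {NEW_HZ / OLD_PIXEL_HZ - 1:.0%}")  # 28%

# New peak shader rate: 24 units x 2 ALUs x 550MHz.
print(f"Shader rate: {24 * 2 * NEW_HZ / 1e9:.1f}G inst/sec")  # 26.4G

# New bandwidth: 850MHz base clock, DDR, over a 256-bit (32-byte) bus.
print(f"Bandwidth: {850e6 * 2 * 32 / 1e9:.1f}GB/sec")         # 54.4GB/sec
```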

But how?

There's been no process shrink or other drastic silicon gubbins to get the new clock rate out of the G70 GPU. Rather, some path tweaks, some layer reworking, a small voltage bump and a just-to-be-on-the-safe-side new cooler were employed. I interrogated NVIDIA further for detail on how the new speed was found, and the simple answer is that it was there all along; it just needed a bit of a helping hand to release in production-ready form. "Anyway", they say, "vendor SKUs will likely be clocked even higher as usual, so the 550MHz wasn't a huge deal to find". Right you are.

Summary

It's 'just' the older GTX with twice the memory, seriously elevated clocks and a new cooler. Everything that defines a GTX, from the feature set (FP blending and filtering, vertex texturing, SLI ability, Shader Model 3.0 support) upwards, is here; there's simply more of it in every way. NVIDIA's backup plan was a simple one, then.

Let's see how effective it is, starting with a look at the board itself in the flesh. Nude, too, if you're lucky.