
Review: Sapphire Radeon 9700 Pro

by Ryszard Sommefeldt on 26 October 2002, 00:00

Tags: Sapphire



In a modern GPU (graphics processing unit), things are done in parallel. That's to say that when pumping out textured and lit pixels into a framebuffer, and from there to your screen, a modern GPU will work on more than one pixel at a time (per clock).

Much like the CPU powering the machine you are using to read this, GPUs feature a number of pipelines, parts of the silicon that do all the operations necessary to produce the final output on your screen.

As a rule of thumb, bigger is better. The more pipelines there are, and the more pixels that can be operated on in those pipelines per tick of the GPU's internal clock, the faster a card will be. But the numbers can be deceiving depending on how a GPU is laid out. Take the two protagonists in this review, NV25 and R300 (GeForce4 Ti and Radeon 9700 Pro), for example.

Radeon 9700 Pro

• 8 pixel pipelines
• 1 texturing unit per pipeline

GeForce4 Ti

• 4 pixel pipelines
• 2 texturing units per pipeline

So at the same clocks, each GPU has roughly the same textured pixel processing power, with a total of 8 textured pixels per clock possible. At 300MHz (Ti4600 speeds), GeForce4 Ti can do 2400 million texels (textured pixels) per second in raw fillrate. At 325MHz, Radeon 9700 Pro can do 2600 million texels per second. So there's not much in it: less than 10% more texel fillrate in absolute terms.

In terms of raw pixel fillrate (without being passed through a texture unit), GeForce4 Ti at 300MHz can do 1200 million pixels per second and Radeon 9700 Pro at 325MHz can do 2600 million pixels per second. Thanks to having 8 pixel pipelines against the GeForce4 Ti's 4, plus a slightly higher clock, raw pixel fillrate on Radeon 9700 Pro is more than double (around 117% higher).
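The fillrate arithmetic above can be checked with a few lines of Python. This is only a sketch of the peak-rate sums used in this article; the `Gpu` class and its method names are made up for illustration and don't correspond to any real tool or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    """Hypothetical container for the figures quoted in the article."""
    name: str
    pipelines: int           # pixel pipelines
    tmus_per_pipeline: int   # texturing units per pipeline
    core_mhz: int            # core clock in MHz

    def pixel_fillrate(self) -> int:
        """Peak untextured pixels per second, in millions."""
        return self.pipelines * self.core_mhz

    def texel_fillrate(self) -> int:
        """Peak textured pixels (texels) per second, in millions."""
        return self.pipelines * self.tmus_per_pipeline * self.core_mhz


r300 = Gpu("Radeon 9700 Pro", pipelines=8, tmus_per_pipeline=1, core_mhz=325)
nv25 = Gpu("GeForce4 Ti 4600", pipelines=4, tmus_per_pipeline=2, core_mhz=300)

for gpu in (r300, nv25):
    print(f"{gpu.name}: {gpu.pixel_fillrate()} Mpixels/s, "
          f"{gpu.texel_fillrate()} Mtexels/s")

# Relative advantage of R300 over NV25, as a ratio
texel_ratio = r300.texel_fillrate() / nv25.texel_fillrate()
pixel_ratio = r300.pixel_fillrate() / nv25.pixel_fillrate()
print(f"texel ratio: {texel_ratio:.2f}x, pixel ratio: {pixel_ratio:.2f}x")
```

The texel ratio comes out at a little over 1.08x and the pixel ratio at a little over 2.16x, which is where the two cards' headline numbers part company.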

So raw texel fillrate (the more meaningful figure) is roughly equivalent to GeForce4 Ti. Is that all that ATI can do over a product from NVIDIA that's been on the market much longer?

Of course not. There's more than just raw GPU horsepower to the performance of a modern 3D accelerator. Memory bandwidth also plays a very large part in proceedings, especially in situations where the accelerator is made to work hard, such as when the display resolution is high (meaning more texels need to be created for a given scene). The same goes when performance-sapping features are enabled, such as anisotropic texture filtering (which can mean more passes through the texturing units, which the Radeon 9700 Pro can do 16 times and GeForce4 Ti 2 times) or full scene antialiasing, the traditional fillrate killer, where more than one copy of a scene is rendered internally before output to your screen.

So in terms of memory bandwidth, what do our two fist fighters in today's review have to lay on the table?

Radeon 9700 Pro

• 256-bit memory bus width
• 620MHz (310MHz DDR) memory clock frequency

GeForce4 Ti (Ti4600)

• 128-bit memory bus width
• 650MHz (325MHz DDR) memory clock frequency

To work out card bandwidth based on those figures, employ the following formula:

(bus width / 8) * memory clock frequency

Use the bus width in bits and the effective (DDR) memory clock in MHz. That gives you a number in MBytes/sec (the division by 8 converts from bits to bytes), and if you then divide by 1024 (the number of MBytes in a GByte) you get a workable figure in GB/sec. For Radeon 9700 Pro this works out at 19.4GB/sec of raw memory bandwidth, close to double the 10.2GB/sec of GeForce4 Ti in Ti4600 configuration with 650MHz memory frequency.
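As a quick check, here is the same formula as a small Python function. It's a minimal sketch: the function name `peak_bandwidth_gb` is just an illustrative choice, and it assumes the bus width is given in bits and the memory clock is the effective (DDR) rate in MHz, as in the figures above.

```python
def peak_bandwidth_gb(bus_width_bits: int, effective_mhz: int) -> float:
    """Peak memory bandwidth in GB/sec, following the article's formula."""
    # Bits to bytes, times millions of transfers per second = MBytes/sec
    mbytes_per_sec = (bus_width_bits / 8) * effective_mhz
    # The article divides by 1024 to get GBytes/sec
    return mbytes_per_sec / 1024


radeon = peak_bandwidth_gb(256, 620)    # Radeon 9700 Pro
geforce = peak_bandwidth_gb(128, 650)   # GeForce4 Ti 4600

print(f"Radeon 9700 Pro:  {radeon:.1f} GB/sec")
print(f"GeForce4 Ti 4600: {geforce:.1f} GB/sec")
print(f"Ratio: {radeon / geforce:.2f}x")
```

Doubling the bus width does nearly all the work here; the Ti4600's slightly faster memory clock only claws back a few percent.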

Now that's a peak memory bandwidth figure, the maximum the hardware could possibly achieve under perfect rendering conditions. Common sense dictates that peak memory bandwidth will never actually be reached. Overheads in certain types of texel operation, certain reads and writes from card memory (not to mention to and from AGP memory) and the more usual bandwidth utilisation issues around pixel/texel overdraw all leech bandwidth from that perfect figure and leave you with a lot less 'effective' bandwidth to play with.

With that in mind, an accelerator with a high theoretical peak bandwidth can easily end up with less effective bandwidth than an accelerator with a lower peak.

It's all about how you make use of your peak bandwidth and get the most out of it. Let's talk about that a little.