
Review: ATI Radeon X1900 XT and XTX

by Ryszard Sommefeldt on 24 January 2006, 00:54

Tags: ATi Technologies (NYSE:AMD)



R580: Fragment processing and texturing

Fragment processing is where R580 differs most from R520. While it keeps the same fragment ALU organisation as R520, including the four quad processors, each quad now has three times as many individual ALUs available to it. Here's one of those 48 units.

[Figure: R580 fragment unit]

Each fragment ALU is made up of two sub-units, each of which pairs a vector unit with a scalar unit, along with a flow control/branch unit. The first sub-unit feeds into the second, and can issue a '4D' ADD across the fragment. The second can issue a '4D' MADD (and therefore ADD and MUL, too), as well as texture instructions.
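To make that arrangement a little more concrete, here's a toy Python model of one fragment ALU instruction slot. It assumes, for illustration only, that the first sub-unit's ADD result is forwarded straight into the second sub-unit's MADD within the same slot; the function names are hypothetical, and the model ignores the vector/scalar split, the branch unit and FP32 rounding details. It also shows why a MADD unit covers MUL and ADD as special cases.

```python
from typing import List, Sequence

Vec4 = List[float]

def add4(a: Sequence[float], b: Sequence[float]) -> Vec4:
    """First sub-unit (toy model): component-wise '4D' ADD."""
    return [x + y for x, y in zip(a, b)]

def madd4(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> Vec4:
    """Second sub-unit (toy model): component-wise '4D' multiply-add, a * b + c."""
    return [x * y + z for x, y, z in zip(a, b, c)]

def mul4(a: Sequence[float], b: Sequence[float]) -> Vec4:
    """MUL expressed as a MADD with a zero addend."""
    return madd4(a, b, [0.0] * 4)

def add_via_madd4(a: Sequence[float], b: Sequence[float]) -> Vec4:
    """ADD expressed as a MADD with a multiplier of one."""
    return madd4(a, [1.0] * 4, b)

def alu_slot(a, b, c, d) -> Vec4:
    """Hypothetical dual-issue slot: (a + b) * c + d, the ADD feeding the MADD."""
    return madd4(add4(a, b), c, d)

print(alu_slot([1, 2, 3, 4], [1, 1, 1, 1], [2, 2, 2, 2], [0.5] * 4))
```

Running it prints `[4.5, 6.5, 8.5, 10.5]`: the ADD produces `[2, 3, 4, 5]`, which the MADD then scales by 2 and offsets by 0.5.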

All fragment processing and texture addressing is done at full FP32 precision, with no partial precision mode at any point. The texture units are separate from the fragment units, allowing the hardware to schedule texture ops outside of basic fragment processing, but the R5-series parts still texture per quad. That means a block of four texture units is assigned to each quad of fragment processors, and in R580 each of those processors is now three ALUs wide. It's not a total disconnect, then, but a scheduling one.
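The per-quad constraint can be sketched as follows. This is a hypothetical Python model, not ATI's actual scheduler: each quad owns its own block of four texture units, so texture requests from a quad's fragment processors can only be queued on that quad's block, even when another quad's units sit idle. The round-robin assignment within a block is an assumption made purely for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_QUADS = 4            # four quad processors, as described above
TEX_UNITS_PER_QUAD = 4   # one block of four texture units per quad

def schedule_per_quad(requests: List[Tuple[int, str]]) -> Dict[int, List[List[str]]]:
    """Assign each (quad_id, texture_request) to a unit in that quad's own block.

    Returns, for every quad, a list of per-unit request queues. Requests are
    dealt round-robin across the quad's four units (illustrative assumption).
    """
    queues = {q: [[] for _ in range(TEX_UNITS_PER_QUAD)] for q in range(NUM_QUADS)}
    next_unit = defaultdict(int)
    for quad, req in requests:
        unit = next_unit[quad] % TEX_UNITS_PER_QUAD
        queues[quad][unit].append(req)  # never spills into another quad's block
        next_unit[quad] += 1
    return queues

# Eight requests, all from quad 0: quad 0's units each get two, the rest stay idle.
q = schedule_per_quad([(0, f"t{i}") for i in range(8)])
print([len(u) for u in q[0]], [len(u) for u in q[1]])
```

This prints `[2, 2, 2, 2] [0, 0, 0, 0]`. A fully decoupled design would instead let any fragment draw on a single shared pool of all 16 units.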

We're still waiting for the modern GPU that completely separates texture sampling from the shader units, letting the programmer and hardware combine the two freely and assign sampler work arbitrarily per fragment.

So each 'full' fragment processor (fp) is a block of three ALUs sharing a single assigned texture unit. Doing the mental arithmetic shows that, crucially, R580 has three times as many fragment ALUs as R520: 48 versus 16, fed by the same 16 texture units. Tripling the processing power of the fragment hardware is no small undertaking, but it gives R580, in XT or XTX configuration, more general fragment processing power than any other graphics processor yet created, including ATI's own Xenos GPU in the Xbox 360.
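The arithmetic behind that claim uses only the figures given on this page: four quad processors, four fragment processors per quad, one shared texture unit per fragment processor, and three ALUs per fragment processor in R580 versus one in R520. A short Python sketch (the names are our own, for illustration):

```python
QUADS = 4             # quad processors, unchanged from R520
FPS_PER_QUAD = 4      # a 'quad' of fragment processors
TEX_UNITS_PER_FP = 1  # each fp's ALUs share one texture unit

def totals(alus_per_fp: int) -> dict:
    """Total fragment ALUs, texture units and the ALU:texture ratio."""
    fps = QUADS * FPS_PER_QUAD
    alus = fps * alus_per_fp
    tex = fps * TEX_UNITS_PER_FP
    return {"alus": alus, "texture_units": tex, "alu_to_tex": alus // tex}

r520 = totals(alus_per_fp=1)
r580 = totals(alus_per_fp=3)
print(r520)
print(r580)
```

That gives 16 ALUs and 16 texture units for R520, and 48 ALUs against the same 16 texture units for R580, so the ratio of fragment ALUs to texture units rises from 1:1 to 3:1.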

Fragments output from the ALU groups are fed into a first-in, first-out (FIFO) buffer before being spat out to the raster output (ROP) hardware. Next, we'll look at the memory controller and ROP hardware, concluding our overview of how R580 is put together.
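A FIFO buffer simply preserves the order in which fragments arrive. As an illustrative sketch only (the class and method names are hypothetical, and the capacity and stall-when-full behaviour are assumptions rather than documented R580 details), Python's `collections.deque` can model the ALU groups pushing finished fragments in and the ROP side draining them in arrival order.

```python
from collections import deque

class FragmentFifo:
    """Toy bounded FIFO between the fragment ALU groups and the ROPs."""

    def __init__(self, capacity: int):
        self.capacity = capacity  # assumed fixed size, as on-chip buffers are
        self._q = deque()

    def push(self, fragment) -> bool:
        """ALU side: enqueue a finished fragment; False signals a stall when full."""
        if len(self._q) >= self.capacity:
            return False
        self._q.append(fragment)
        return True

    def pop(self):
        """ROP side: dequeue the oldest fragment, or None if the buffer is empty."""
        return self._q.popleft() if self._q else None

fifo = FragmentFifo(capacity=2)
print(fifo.push("frag0"), fifo.push("frag1"), fifo.push("frag2"))
print(fifo.pop(), fifo.pop(), fifo.pop())
```

The first line prints `True True False` (the third push stalls), and the second prints `frag0 frag1 None`, showing fragments leave in the order they entered.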