Bringing them all together
Trotting out the comparison table

Graphics cards | ATI Radeon HD 4870 512MiB | ATI Radeon HD 4850 512MiB | ATI Radeon HD 3850 | ATI Radeon HD 3870 512 | NVIDIA GeForce 9800 GTX+ 512 | NVIDIA GeForce 9800 GTX 512 | NVIDIA GeForce 8800 GTS 512 | NVIDIA GeForce 8800 GT | NVIDIA GeForce 9600 GT |
---|---|---|---|---|---|---|---|---|---|
PCIe | PCIe 2.0 | PCIe 2.0 | PCIe 2.0 | PCIe 2.0 | PCIe 2.0 | PCIe 2.0 | PCIe 2.0 | PCIe 2.0 | PCIe 2.0 |
GPU clock | 750MHz | 625MHz | 666MHz | 775MHz | 738MHz | 675MHz | 650MHz | 600MHz | 650MHz |
Shader clock | 750MHz | 625MHz | 666MHz | 775MHz | 1,836MHz | 1,688MHz | 1,625MHz | 1,500MHz | 1,625MHz |
Memory clock (effective) | 3,600MHz | 2,000MHz | 1,656MHz | 2,250MHz | 2,200MHz | 2,200MHz | 1,940MHz | 1,800MHz | 1,800MHz |
Memory interface, size and type | 256-bit, 512MiB, GDDR5 | 256-bit, 512MiB, GDDR3 | 256-bit, 512MiB, GDDR4 | 256-bit, 512MiB, GDDR3 | |||||
Memory bandwidth | 115GiB/sec | 64GiB/sec | 53GiB/sec | 72.8GiB/sec | 70.4GiB/sec | 70.4GiB/sec | 62.1GiB/sec | 57.6GiB/sec | 57.6GiB/sec |
Manufacturing process | TSMC, 55nm | TSMC, 65nm | |||||||
Transistor count | 965M | 965M | 666M | 666M | 754M | 754M | 754M | 754M | 505M |
Die size | 260mm² | 260mm² | 192mm² | 192mm² | 230mm² | 330mm² | 330mm² | 296mm² | 240mm² |
Double-precision support | Yes | Yes | Yes | Yes | No | No | No | No | No |
DirectX, Shader Model | DX10.1, 4.1 | DX10.1, 4.1 | DX10.1, 4.1 | DX10.1, 4.1 | DX10, 4.0 | DX10, 4.0 | DX10, 4.0 | DX10, 4.0 | DX10, 4.0 |
Vertex, fragment, geometry shading (shared) | 800 FP32 scalar ALUs, MADD dual-issue (unified) | 800 FP32 scalar ALUs, MADD dual-issue (unified) | 320 FP32 scalar ALUs, MADD dual-issue (unified) | 320 FP32 scalar ALUs, MADD dual-issue (unified) | 128 FP32 scalar ALUs, MADD dual-issue + MUL (unified) | 128 FP32 scalar ALUs, MADD dual-issue + MUL (unified) | 128 FP32 scalar ALUs, MADD dual-issue + MUL (unified) | 112 FP32 scalar ALUs, MADD dual-issue + MUL (unified) | 64 FP32 scalar ALUs, MADD dual-issue + MUL (unified) |
Peak GFLOPS | 1,200 | 1,000 | 426.2 | 496 | 470/705* | 432/648* | 416/624* | 336/504* | 208/312* |
Data sampling and filtering | 40ppc address and 40ppc bilinear INT8/20ppc FP16 filtering, max 16xAF | 40ppc address and 40ppc bilinear INT8/ 20ppc FP16 filtering, max 16xAF | 16ppc address and 16ppc bilinear INT8/FP16 filtering, max 16xAF | 16ppc address and 16ppc bilinear INT8/FP16 filtering, max 16xAF | 64ppc address and 64ppc bilinear INT8/32ppc FP16 filtering, max 16xAF | 64ppc address and 64ppc bilinear INT8/32ppc FP16 filtering, max 16xAF | 64ppc address and 64ppc bilinear INT8/32ppc FP16 filtering, max 16xAF | 56ppc address and 56ppc bilinear INT8/28ppc FP16 filtering, max 16xAF | 32ppc address and 32ppc bilinear INT8/16ppc FP16 filtering, max 16xAF |
Peak fillrate (Gpixels/s) | 12 | 10 | 10.656 | 12.4 | 11.8 | 10.8 | 10.4 | 9.6 | 10.4 |
Peak Gtexel/s (bilinear) | 30 | 25 | 10.656 | 12.4 | 47.2 | 43.2 | 41.6 | 33.6 | 20.8 |
Peak Gtexel/s (FP16, bilinear) | 15 | 12.5 | 10.656 | 12.4 | 23.6 | 21.6 | 20.8 | 16.8 | 10.4 |
ROPs | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 16 |
Peak TDP (claimed, watts) | 160 | 110 | 90 | 105 | Unknown | 155 | 140 | 105 | 95 |
Power connectors (default clocked) | Two 6-pin | 6-pin | 6-pin | 6-pin | Two 6-pin | Two 6-pin | 6-pin | 6-pin | 6-pin |
Multi-GPU | CrossFire - four-board | CrossFire - four-board | CrossFire - four-board | CrossFire - four-board | SLI - three-board | SLI - three-board | SLI - two-board | SLI - two-board | SLI - two-board |
Outputs | 2 x dual-link DVI w/HDCP, HDMI 7.1 (native, on GPU) | 2 x dual-link DVI w/HDCP, HDMI 5.1 (native, on GPU) | 2 x dual-link DVI w/HDCP, native HDMI 5.1 (via S/PDIF) | ||||||
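Hardware-assisted video-decoding engine | AMD UVD - full H.264 and VC-1 decode | AMD UVD - full H.264 and VC-1 decode | AMD UVD - full H.264 and VC-1 decode | AMD UVD - full H.264 and VC-1 decode | NVIDIA's PureVideo HD - full H.264 decode and partial VC-1 decode | NVIDIA's PureVideo HD - full H.264 decode and partial VC-1 decode | NVIDIA's PureVideo HD - full H.264 decode and partial VC-1 decode | NVIDIA's PureVideo HD - full H.264 decode and partial VC-1 decode | NVIDIA's PureVideo HD - full H.264 decode and partial VC-1 decode |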
Reference cooler | dual-slot | single-slot | single-slot | dual-slot | dual-slot | dual-slot | dual-slot | single-slot | single-slot |
Retail price (default-clocked model) | £175 | £125 | £79 | £89 | £149** | £129** | £139 | £99 | £89 |
* calculated on a three FLOPS per clock cycle basis.
** based on NVIDIA's recent price-cuts. Current price is £175 for GTX and around £199 for overclocked GTX.
For a look at how the GeForce GTX 280, 260 and 9800 GX2 compare against the new ATI rivals, head on over to here.
Analysis
The nine-GPU table above takes in the real volume-selling SKUs from both companies. Priced at between £79 and £175 for default-clocked models, they constitute graphics-card upgrades that most can aspire to, for playing the latest games at reasonable resolutions and image-quality settings.
The Radeon HD 4850 uses 2GHz-rated GDDR3 for 64GiB/s of bandwidth. That's up from the HD 3850 but down from the HD 3870 and its GDDR4. ATI reckons that the HD 4850 has roughly the same level of usable bandwidth as the HD 3870, for the reasons outlined on the previous page.
We're a little concerned that the HD 4850 remains lopsided from a bandwidth point of view, given just how 'top-heavy' the design is. Surely an architecture like this would thrive on 100GiB/s or more?
Segueing nicely, equipped with crazy-speed GDDR5, the Radeon HD 4870 manages to put out 115GiB/s of juicy bandwidth, which is comfortably more than any other card in the sub-£200 sector. 3.6Gbps memory does have its uses, after all. The number is particularly staggering considering the 256-bit interface.
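The bandwidth figures quoted here follow from the effective memory clock and the bus width in the table. A minimal sketch of that arithmetic is below; the function name is our own, and it assumes decimal units (10^9 bytes per second), which is what reproduces the table's numbers even though the table labels them GiB/sec.

```python
def memory_bandwidth(effective_mhz: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in 10^9 bytes per second.

    effective_mhz is the effective (data-rate) memory clock from the table;
    bus_width_bits is the memory interface width.
    """
    bytes_per_transfer = bus_width_bits / 8
    return effective_mhz * 1e6 * bytes_per_transfer / 1e9


# Effective memory clocks (MHz) taken from the comparison table;
# all three cards use a 256-bit interface.
cards = {
    "Radeon HD 4870": 3600,
    "Radeon HD 4850": 2000,
    "GeForce 9800 GTX": 2200,
}

for name, mhz in cards.items():
    print(f"{name}: {memory_bandwidth(mhz, 256):.1f}")
```

On the same 256-bit bus, the HD 4870's advantage comes entirely from the faster GDDR5 data rate.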
Transistor count is up near 1bn, yet the die is smaller than that of the 65nm-based GeForce 9800 GTX. We already know that the 800 SPs and 40 texturing units account for a vast proportion of those near-1bn transistors.
The new GPUs' vital stats don't begin to look really scary until we come down to the shader and texturing counts.
Both feature 800 SPs that can dual-issue arithmetic commands. Knowing the core clockspeeds of 625MHz and 750MHz for the HD 4850 and HD 4870, respectively, we arrive at peak ALU rates of 1.0 and 1.2TFLOPS - the latter being a figure that's almost twice as high as that of the GeForce 9800 GTX.
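The peak-GFLOPS row can be reproduced from the ALU counts and shader clocks in the table. The sketch below is illustrative only: the helper name is ours, and it assumes two FLOPS per ALU per clock for a MADD, or three when the extra MUL is counted, as the table's footnote does for the GeForce figures.

```python
def peak_gflops(alus: int, shader_clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak GFLOPS: ALUs x FLOPS per clock x shader clock."""
    return alus * flops_per_clock * shader_clock_mhz / 1000


# ALU counts and shader clocks (MHz) from the comparison table.
hd4870 = peak_gflops(800, 750)            # MADD only
hd4850 = peak_gflops(800, 625)            # MADD only
gtx_madd = peak_gflops(128, 1688)         # GeForce 9800 GTX, MADD only
gtx_madd_mul = peak_gflops(128, 1688, 3)  # GeForce 9800 GTX, MADD + MUL

print(f"HD 4870: {hd4870:.0f}")
print(f"HD 4850: {hd4850:.0f}")
print(f"9800 GTX: {gtx_madd:.0f}/{gtx_madd_mul:.0f}")
print(f"HD 4870 vs 9800 GTX (MADD + MUL): {hd4870 / gtx_madd_mul:.2f}x")
```

Measured against the more generous three-FLOPS figure for the GeForce 9800 GTX, the HD 4870 still comes out at roughly 1.85 times the peak rate, which is where "almost twice" comes from.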
The larger number of texture units means that bilinear (INT8) texture filtering is impressive, but FP16 texturing and general fillrate aren't quite as good, the latter being down to the 16ppc processing from the ROPs.
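The fillrate and texel-rate rows are simply per-clock throughput multiplied by core clock. As a rough sketch (the helper name and the dictionary layout are ours; the per-clock figures and clocks are the table's), the HD 4870 and GeForce 9800 GTX numbers fall out like this:

```python
def rate_g_per_s(units_per_clock: int, core_clock_mhz: float) -> float:
    """Peak throughput in billions of pixels or texels per second."""
    return units_per_clock * core_clock_mhz / 1000


# (core clock MHz, ROP pixels/clock, INT8 bilinear texels/clock, FP16 texels/clock)
cards = {
    "Radeon HD 4870": (750, 16, 40, 20),
    "Radeon HD 4850": (625, 16, 40, 20),
    "GeForce 9800 GTX": (675, 16, 64, 32),
}

for name, (clock, rops, int8, fp16) in cards.items():
    print(
        f"{name}: "
        f"{rate_g_per_s(rops, clock):.1f} Gpixels/s, "
        f"{rate_g_per_s(int8, clock):.1f} Gtexels/s INT8, "
        f"{rate_g_per_s(fp16, clock):.1f} Gtexels/s FP16"
    )
```

With all three cards at 16 ROPs, pixel fillrate tracks core clock alone, which is why the new Radeons gain far less here than they do in shading.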
Both Radeon GPUs consume more power under load than the cards they replace, and that's why the HD 4870 ships with a dual-slot cooler and twin six-pin PCIe power connectors, giving partners the headroom to ramp clocks up higher. The 110W HD 4850, however, keeps a single-slot profile and a solitary power connector.
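To see why the connector counts differ, it helps to compare each card's claimed TDP with the power its connectors can deliver. The sketch below assumes the PCI Express limits of 75W from the x16 slot and 75W per six-pin connector; those limits are not quoted in this article, and the function name is ours.

```python
SLOT_WATTS = 75     # assumed PCIe x16 slot limit
SIX_PIN_WATTS = 75  # assumed limit per six-pin PCIe connector


def power_budget(six_pin_connectors: int) -> int:
    """Maximum in-spec board power for a given number of six-pin connectors."""
    return SLOT_WATTS + six_pin_connectors * SIX_PIN_WATTS


# Claimed TDP (watts) and connector count from the comparison table.
cards = {
    "Radeon HD 4870": (160, 2),
    "Radeon HD 4850": (110, 1),
}

for name, (tdp, connectors) in cards.items():
    budget = power_budget(connectors)
    print(f"{name}: TDP {tdp}W, budget {budget}W, headroom {budget - tdp}W")
```

Under these assumptions a single six-pin connector would give the HD 4870 only 150W, short of its 160W claimed TDP, whereas two connectors leave 65W to spare for overclocked partner boards.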
Most reference-like cards will ship with 512MiB of on-board memory, be it GDDR3 or GDDR5. Partners are free to design custom SKUs with, say, 1GiB of memory - useful in instances where lots of texturing and image-enhancement are taking place.
We also expect to see partners launch factory-overclocked cards from the get-go, opening up an avenue for product differentiation. Along the same lines, you'll see partner-designed coolers on certain models, too.
Summary
ATI has managed to fit an incredible amount of shading and texturing power into the new HD 48xx-series - more than we expected. That shading is helped along by a commensurate increase in texturing, and the use of GDDR5, on the Radeon HD 4870, means it has gobs of bandwidth, too.
If this were a specification-to-specification fight, it would be over before it started. ATI's new GPUs' visceral output cannot be matched by NVIDIA's mid-range, which is based on 18-month-old technology.
It's one thing to have a huge, huge engine, and another to be able to use it well. Historically, NVIDIA has enjoyed a huge advantage in ensuring that games developers optimise code for its architecture, through a better-supported dev-rel team. The upshot has been that any obvious shortfalls in on-paper specs have been mitigated by tight, efficient code, much to the chagrin of ATI's engineers.
Whatever the current state of play, it's difficult to argue against the brute power of the new mid-range/enthusiast GPUs from ATI; they're comfortably ahead of anything else at the quoted price-points of £125 and £175 for the Radeon HD 4850 and HD 4870, respectively.