It seems only yesterday that we saw the arrival of the GeForce3 Ti500, the clock-increased refresh of the original GeForce3. The original GeForce3 broke new ground in terms of speed and functionality, and its programmable vertex and pixel shaders set the stage for all graphics hardware to follow. Six months later came the Ti500 from NVIDIA. ATi were also brave enough to release their own part, the R200, which powers the new Radeon 8500 line of products. It too is programmable and compliant with the DirectX 8 Pixel and Vertex Shader specification (in the Radeon's case, with version 8.1 of DirectX, although this is little used in reality).
Given NVIDIA's propensity for the spring refresh of their product line and the introduction of a completely new GPU, it would have been foolish not to have seen these cards coming. While there is a version of the GeForce4 (the MX series) based on the NV17 GPU, it's the powerhouse card and its speed-graded derivatives based on the NV25 that impress the most. It's the most powerful of the new cards that we take a look at today, the Ti4600.
If you take a look at the following table you'll see the 3 NV25 based cards and their speed grades, each progressively faster than the previous until you land at the top of the pile with the Ti4600.
GeForce4 Ti4200 (debuts in 8 to 10 weeks time): 225MHz core clock (yet to be confirmed by NVIDIA), 500MHz memory clock (250MHz DDR, yet to be confirmed by NVIDIA)
GeForce4 Ti4400: 275MHz core clock, 550MHz memory clock (275MHz DDR)
GeForce4 Ti4600: 300MHz core clock, 650MHz memory clock (325MHz DDR)
All 3 cards share the same GPU and 128MB of memory, so let's take a closer look at the technology behind the new processor.
The Ti4600 version we take a look at today is capable of some incredible theoretical performance figures.
4.8 billion fully anti-aliased samples/sec fillrate
86 million triangles per second
10.4GB/sec memory bandwidth
1.23 trillion ops per second
The GeForce3 by comparison manages a meagre 8GB/sec memory bandwidth and 3.84 billion anti-aliased samples per second fillrate.
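The bandwidth and fillrate figures can be sanity-checked with some back-of-envelope arithmetic. The sketch below is only an illustration: the 128-bit memory bus, the 4 pixel pipelines and the 4 anti-aliased samples per pipeline per clock are assumptions not stated in this review, and the function names are made up for the example. The Ti500 clocks (240/500) are the ones quoted later in the Serious Sam testing.

```python
# Rough check of the theoretical figures quoted in the review.
# ASSUMPTIONS (not from the review): 128-bit memory bus, 4 pixel pipelines,
# 4 anti-aliased samples per pipeline per clock.

def memory_bandwidth_gb_s(effective_mem_clock_mhz, bus_width_bits=128):
    """Peak memory bandwidth in GB/sec (1 GB = 10^9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return effective_mem_clock_mhz * 1e6 * bytes_per_transfer / 1e9

def aa_sample_rate_billions(core_clock_mhz, pipelines=4, samples_per_clock=4):
    """Peak anti-aliased sample rate in billions of samples/sec."""
    return core_clock_mhz * 1e6 * pipelines * samples_per_clock / 1e9

# (core clock MHz, effective memory clock MHz)
cards = {
    "GeForce4 Ti4600": (300, 650),
    "GeForce3 Ti500": (240, 500),
}

for name, (core, mem) in cards.items():
    print(name,
          round(memory_bandwidth_gb_s(mem), 2), "GB/sec,",
          round(aa_sample_rate_billions(core), 2), "billion AA samples/sec")
```

Under those assumptions the numbers line up with the quoted 10.4GB/sec and 4.8 billion samples for the Ti4600, and 8GB/sec and 3.84 billion for the GeForce3.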
The Technology behind NV25
Just like the GeForce3, the GeForce4 or NV25 is built on a 0.15 micron process by TSMC, the Taiwanese semiconductor company that NVIDIA contracts to build its processors. NVIDIA is actually a fabless semiconductor company, meaning they produce the designs, do all the testing and have the absolute last word, but the actual fabrication isn't done by them. They garnered praise from their peers in the Fabless Semiconductor Association in December 2001 for their accomplishments throughout the year. High praise indeed which you can read about more in depth here.
The full feature list can be found here and I won't dwell on it. Pick over it at your leisure. We'll discuss the features NVIDIA is keen to push to the consumer and get you used to the new feature set without being bogged down in the details.
The 0.15 micron process allows them to squeeze 63 million separate transistors onto the GPU die, which incorporates all the new hardware features that set it apart from the GeForce3. It's those new features that NVIDIA are proud of and they are what we'll take a closer look at. Also, while the NV25 is a totally new GPU, think of it as a refinement of the NV20 rather than a revolutionary new design, an NV20 with even more muscles if you like.
Lightspeed Memory Architecture II
The first of the new features is an upgrade to the Lightspeed Memory Architecture that debuted on the GeForce3. Imaginatively titled LMA II, the new memory interface is responsible for interfacing between the rest of the GPU and the ultra fast DDR memory and also provides hardware memory optimisation via a couple of processes. The first of those processes is Z-buffer compression. The Z-buffer is also known as the depth buffer and is responsible for holding the draw order of the pixels you see on your screen. The Z value of a pixel tells you how close to or how far away from the viewer the pixel is to be drawn.
If you think of 2 pixels, both having the same X and Y coordinates but different Z or depth coordinates, you can see that one will be drawn behind the other when the final render to the front buffer is performed. The LMA II logic performs a 4:1 lossless compression of the Z-buffer to save memory bandwidth. Being lossless, the compression loses none of the original data, and at 4:1 it has the potential to cut depth buffer traffic to a quarter, effectively increasing memory bandwidth by up to a factor of 4 when reading from and writing to the depth buffer.
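The two-pixel example can be sketched as a minimal software depth test. This is a generic textbook Z-buffer, not NVIDIA's hardware logic; the function name and the convention that smaller Z means closer to the viewer are assumptions made for the illustration.

```python
# Minimal software Z-buffer: for each (x, y) keep only the fragment closest
# to the viewer. ASSUMPTION: smaller z = closer, z ranges from 0.0 to 1.0.

def resolve_depth(fragments, width, height, far=1.0):
    """fragments: iterable of (x, y, z, colour). Returns (colour, depth) grids."""
    depth = [[far] * width for _ in range(height)]
    colour = [[None] * width for _ in range(height)]
    for x, y, z, c in fragments:
        if z < depth[y][x]:        # depth test: is this fragment in front?
            depth[y][x] = z        # remember the new nearest depth
            colour[y][x] = c       # and the colour that will be visible
    return colour, depth

# Two pixels with the same X and Y but different Z: only the nearer one is seen.
frags = [(1, 1, 0.8, "far pixel"), (1, 1, 0.3, "near pixel")]
colour, depth = resolve_depth(frags, width=4, height=4)
print(colour[1][1], depth[1][1])
```

The result is the same whichever order the two fragments arrive in, which is the whole point of keeping a depth value per pixel.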
Thinking back to our two pixels with identical X and Y coordinates and different Z coordinates, the LMA II logic also performs occlusion culling on those two pixels and the rest of the pixels. It clears the Z-buffer of as many 'dead' pixels as it can by doing a test on the data in the buffer and removing pixels that will never be drawn. The GeForce3 did the same thing and the LMA II unit on the NV25 is NVIDIA's 2nd generation attempt at further improving things. This technique also saves crucial memory bandwidth.
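To see why throwing away hidden pixels early saves work, here is a rough sketch that counts how many fragments would need to be fully shaded with and without an early depth rejection step. It is a simplified model of the general idea, not a description of how the LMA II unit actually works, and the function name and sample data are hypothetical.

```python
# Count how many fragments need shading (texture fetches, blending and so on).
# ASSUMPTIONS: smaller z = closer; without early rejection every fragment is
# shaded before the depth test discards it.

def count_shaded(fragments, early_reject):
    """fragments: list of (x, y, z) in draw order. Returns fragments shaded."""
    nearest = {}   # (x, y) -> nearest depth seen so far
    shaded = 0
    for x, y, z in fragments:
        visible = z < nearest.get((x, y), 1.0)
        if visible or not early_reject:
            shaded += 1            # the expensive work happens here
        if visible:
            nearest[(x, y)] = z
    return shaded

# Three surfaces covering the same pixel, drawn front to back.
front_to_back = [(0, 0, 0.2), (0, 0, 0.5), (0, 0, 0.9)]
print(count_shaded(front_to_back, early_reject=False))  # all three are shaded
print(count_shaded(front_to_back, early_reject=True))   # only the nearest one
```

In this model the saving depends on draw order: drawn back to front, every fragment passes the test when it arrives and nothing is skipped.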
Lastly on the LMA II unit you'll find Quad Cache, NVIDIA's name for 4 optimised cache memory areas that again increase effective memory bandwidth, much like the L1 and L2 caches on your main processor. They keep the later stages of the render pipeline fed with data and therefore operating as efficiently as possible.
nFinite FX II Engine
When the initial speculation about the NV25 first appeared, many guessed correctly that the nFinite FX engine on the GeForce3 would see an update for inclusion on the NV25. This is exactly what has happened. The engine now sports two upgraded vertex shader units versus the single unit on the GeForce3 and also an upgraded pixel shader engine that supports new pixel shader modes as defined by Microsoft. However, the pixel shader unit isn't a full Pixel Shader 1.4 part like the R200 from ATi. In real world practice, the fact it doesn't support the entire 1.4 spec won't be a problem since the spec isn't targeted by much software in development.
The combined power of the 2 vertex shaders is estimated by NVIDIA to give more than 3 times the vertex processing power of the NV20, and the upgraded pixel shader unit more than 2 times the power of the old pixel shader unit.
NVIDIA have licensed their shader technology to Evans and Sutherland as part of a broad technology sharing agreement, so the tech behind the new nFinite FX II engine should show up in products from the visualisation specialists in the future, and NVIDIA hope it will be put to good use on the NV25.
So not the 2 new pixel and 2 new vertex shaders as some expected (including myself) but rather an upgraded pixel shader and a pair of upgraded vertex shaders and a new moniker for the whole thing.
Accuview is the name of the new anti aliasing logic on the new GPU and is a patent pending engine that implements a range of anti aliased modes for the pixels. It's a multi sample anti aliasing engine meaning that it samples the frame buffer multiple times and combines the samples in such a way as to provide an anti aliased output. There are the old favourite 2x, 4x and NVIDIA's Quincunx antialiasing and a new mode called 4XS. NVIDIA are claiming 3x the performance of competing AA implementations and are very confident that AA in some form can be enabled by default on ANY application you might run without running into performance difficulties.
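The "combine the samples" step can be illustrated with the simplest possible resolve, a plain average of the samples taken for each pixel. This is a generic box filter written for illustration only; Accuview's actual sample positions and filter weights are not described in this review, and the function name is hypothetical.

```python
# Resolve one anti-aliased pixel by averaging its samples (box filter).
# ASSUMPTION: each sample is an (r, g, b) tuple with channels from 0 to 255.

def resolve_pixel(samples):
    """Average a list of (r, g, b) samples into one output colour."""
    n = len(samples)
    return tuple(sum(channel) / n for channel in zip(*samples))

white = (255, 255, 255)
black = (0, 0, 0)

# A pixel on a polygon edge in 4x mode: two samples hit the white polygon,
# two fall on the black background, so the output is a mid grey.
print(resolve_pixel([white, white, black, black]))

# A pixel fully inside the polygon is unchanged by the resolve.
print(resolve_pixel([white] * 4))
```

Edge pixels end up as a blend of the surfaces they straddle, which is what softens the jagged stair-step look.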
nView is the name that NVIDIA has given to the technology behind their multi display implementation (previously called TwinView) and matches HydraVision on the ATi range of cards in terms of features. Some Ti44xx models will feature dual-DVI outputs and some, like the test card, will feature D-SUB, DVI and also S-Video/Composite TV-Out.
There you have it, the 4 features that NVIDIA are pushing hard to the public with the NV25 and the features you'll be hearing a lot more about as the days and weeks roll by and the cards themselves start to appear in the various retail channels. In a nutshell we have a new pixel and vertex shader unit, upgraded memory controller, a new multi display implementation that builds on TwinView and the new anti aliasing logic. Combine that with increased clocks in the higher end versions and the GF4 starts to look formidable.
The Card Itself
The card itself is a reference design from NVIDIA themselves, not a retail card from one of NVIDIA's partners. It's outfitted with 8 memory chips, 4 on the front and 4 on the back, each 16MB in size and rated at 2.8ns, and they account for by far the largest share of the card's cost.
It's a fair bit larger than a regular GeForce3 card as you can see in the following shots. The top card is a Gainward GeForce3 Ti550 Golden Sample. The card uses a Conexant CX25871 for the TV-Output and its own onboard TMDS for the DVI and D-SUB outputs.
As you'd expect it's a 1.5V AGP device which means it won't work on older 3.3V boards.
Installation and Driver
The card arrived bare in an antistatic bag so it was a matter of getting hold of the 27.30 Detonator drivers for Windows XP and getting on with benchmarking. Actual physical installation was no different to installing any other card bar the excitement of installing the latest and greatest from NVIDIA and getting the first look at its speed. Watching 128.0Mb flash by on the card BIOS as the system powered up was another first being the first 128Mb card I've used.
The driver looks no different from the usual Detonator with the exception of the new nView tab for controlling the display devices attached to the card. In this case I was only able to test TV-Out and the regular D-SUB analogue output to a Sony G400 since we didn't have a DVI compatible display to hand during testing.
Here are a few shots of the driver property pages.
Main Property Sheet
nView Property Sheet
Anti Aliasing Property Sheet
We'll investigate performance the way we have in the past, both stock and overclocked, with both sets of results on the same graph. Where possible, we'll also show you a comparable GF3 Ti500 score from the same or similar system (using the 23.11 drivers).
As always, a quick run down on the test machine before we start so you have something to compare your own system with.
NVIDIA Reference GeForce4 Ti4600 128MB w/2.8ns memory
Intel Pentium 4 'Northwood' 1.8GHz 512KB
Viarama PE11-SE/RAMA VIA P4X266A Socket 478 Motherboard
256MB Crucial PC2100 DDR SDRAM CAS2.5 @ CAS2
Adaptec 39160 64-bit U160 SCSI Controller
2 x 73GB Seagate Cheetah U160 10,000rpm Disks
Pioneer SCSI DVD 6x
Plextor 12/10/32S SCSI CDRW
Windows XP Professional Build 2600.xpclient.010817-1148
DetonatorXP 27.30 NVIDIA drivers
3DMark 2001 Professional
To start with we'll use 3DMark 2001 Professional. This is a DirectX 8 powered application benchmark and is influenced by the whole system. In this case however we'll just be testing card performance. You'll notice a Ti500 reference result that I measured a couple of days previously on the same system, alongside the Ti4600 results both stock and overclocked.
The GeForce4 results are the first 9000+ results we've seen at Hexus and are comfortably more than 1000 points clear of any stock result we've seen on any system to date. While it's not the 10,000 point score you'll see elsewhere, it's correct for the hardware. We've been preaching for a long time that a score in the post 6000 range equals good performance in current games. What the 9000+ point score shows is that the system has plenty of headroom at these lower resolutions. With cards like this, it's about time we started tweaking for a 6000 point score at high resolution, even 1600x1200, something this card is easily capable of. When the power is there, like with the GeForce4, don't waste it on a low resolution.
Next up we have Quake3, the first of our OpenGL benchmarks. You'll see 6 scores here, 2 for each resolution, stock and overclocked. Bear in mind this game engine is fairly old and anything post-GF2 can crank out a good Quake3 score.
The series correspond to our test resolutions of 1600x1200, 1280x1024 and 1024x768 respectively. As you can see at the lower two resolutions, the card's clocks make very little difference. At those resolutions, the engine is CPU limited and the only way to see extra performance is to increase the CPU clock. In other words, the card is too fast for the 1.8GHz Northwood. You also see the same behaviour and similar scores with the Ti500 in this test. Current accelerators have outgrown Quake3, even at high resolution. Excellent performance with all rendering features enabled is the order of the day in this benchmark with the engine doing nothing to cause the cards concern.
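The CPU limited behaviour can be pictured with a very crude bottleneck model in which the frame rate is capped by whichever of the CPU or the graphics card is slower. Every number below is a made-up illustrative value, not a measurement from this review, and the model ignores everything except pixel count.

```python
# Crude bottleneck model: the frame rate is the lower of what the CPU can
# feed and what the card can draw. ALL NUMBERS ARE HYPOTHETICAL.

def frame_rate(cpu_fps_cap, gpu_pixels_per_sec, width, height):
    """Return the modelled frames per second at a given resolution."""
    gpu_fps = gpu_pixels_per_sec / (width * height)
    return min(cpu_fps_cap, gpu_fps)

CPU_CAP = 200          # hypothetical frames/sec the CPU can prepare
STOCK = 300e6          # hypothetical pixels/sec the card draws at stock clocks
OVERCLOCKED = 330e6    # the same card with a hypothetical 10% overclock

for width, height in [(1024, 768), (1280, 1024), (1600, 1200)]:
    print(f"{width}x{height}:",
          round(frame_rate(CPU_CAP, STOCK, width, height), 1), "stock,",
          round(frame_rate(CPU_CAP, OVERCLOCKED, width, height), 1), "overclocked")
```

With these made-up values the two lower resolutions sit at the CPU cap whatever the card clocks are, and only 1600x1200 responds to the overclock, which is the pattern described for Quake3.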
Next up, a favourite of mine and a better indication of performance on current and upcoming games software. Aquamark is a DX8-heavy benchmark based on the Aquanox shooter from Massive Development. The game is incredibly heavy on the DirectX Pixel and Vertex shaders, the Vertex shader in particular from what I remember. The high end scores we've been seeing up to now at Hexus have been in the low 50s and we've commented before that a system that can do 60 frames per second out of the box will be something a bit out of the ordinary.
It didn't take anything out of the ordinary or any tweaking of the base system at all. Out of the box, the lowly 1.8GHz Northwood gave the card enough data to take it comfortably into the 60fps range for the first time here at Hexus. Increasing the card clocks gives a minimal boost in performance and betrays Aquamark's love of raw CPU horsepower for the extra boost. With a 2.2GHz Northwood or XP2000 on a fast base system you should see close to the 80fps range. We should see 100fps broken in Aquamark out of the box by the middle of the year. You can see the new nFinite FX II engine going to work here. The GeForce3 Ti500 in the same base system scores a full 10 frames per second slower here. That's a full 20% increase at 1024x768 and the gap would be much higher at higher resolution. Here you can see the beginnings of a trend forming which we'll discuss at the end in the conclusion.
Next up, our final benchmark in the review, Serious Sam. A benchmark we haven't used before at Hexus so if the numbers look slightly off, you'll know why! We used the Karnak Peaceful Night Coop Demo which features in the game, the /dem_bprofile=1 console command to force demo stat generation and the Quality rendering setting. All that was changed from then on was the resolution and we ran at 1024x768, 1280x1024 and 1600x1200 as usual. It's OpenGL based and makes use of more advanced rendering features than Quake3.
Working our way down the graph we see the results starting with the heaviest resolution first, 1600x1200. As we can see, overclocking the card didn't do very much for the scores, suggesting we are fairly CPU limited at all resolutions. The card easily outpaces a Ti500 at the higher resolutions due to the increased clocks. Something interesting we noted during the Serious Sam testing was that declocking the card to Ti500 clocks (240/500) gave absolutely identical scores to a Ti500 on the same CPU. The declocked card was essentially a Ti500 to all intents and purposes. I can't decide whether the new NV25 core features weren't being used or whether we were completely CPU limited, but something wasn't quite right. Increasing card clocks did increase the scores in Serious Sam, if only slightly, so the CPU limited theory doesn't fully hold up. It's not quite explainable just yet, but it does concur with some OpenGL weirdness we've been seeing with the card all day.
It's quite clear that the card raises the bar in the performance stakes on current systems. We quickly mentioned a trend occurring earlier and said we'd develop it here. What we noticed today during testing was that at the lower resolution of 1024x768, and to some extent 1280x1024 depending on benchmark, the Ti500 wasn't far off the scores of the stock Ti4600. This is down to a few reasons. Driver immaturity is the first. Remember that NVIDIA for the past few generations of cards have made a habit of releasing a driver that dramatically improves performance on a new card at a later date. Be sure that this will happen with the GeForce4.
We've seen OpenGL performance issues (not quite problems) all day during testing and it's a near certainty that a newer driver will increase performance significantly on this card in the future. The next reason is that the cards are too powerful for systems at the lower resolutions. They aren't being made to work as hard as they are at the higher resolutions and there is always a degree of CPU limitation at lower resolution.
At 1600x1200 and beyond the card pretty much demolishes the opposition. The higher clocks and core improvements (especially in the LMA II logic and the massive memory bandwidth savings) give the GPU huge muscle and let it play anything at 1600x1200x32 comfortably, something a Ti500 might stretch to do. This card was made for pushing polygons at massive speed at high resolution. If you want high res gaming, this is the card to aim for. High res means massive amounts of texture data so the 128MB frame buffer begins to make a lot of sense.
A lot of people will dismiss the 128MB framebuffer as a gimmick, and to a certain extent on slower cards it is. But the Ti4600, with its huge processing power that begs to be used at 1600x1200 and above, will get on famously with the double-size framebuffer. Don't scoff until you've thought about it!
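Some rough arithmetic shows why card memory fills up quickly at high resolution. The sketch assumes 32-bit colour, a 32-bit depth buffer, double buffering, and that with anti-aliasing enabled the back and depth buffers hold every sample at full size; real drivers may lay things out differently, so treat the output as an order-of-magnitude guide rather than a statement about this card.

```python
# Estimate the memory taken by the frame buffers alone at a given resolution.
# ASSUMPTIONS: 4 bytes per pixel for colour and for depth, one front buffer,
# one back buffer and one depth buffer, AA samples stored at full size.

def buffer_mb(width, height, bytes_per_pixel=4, samples=1):
    """Size of one buffer in MB (2^20 bytes)."""
    return width * height * bytes_per_pixel * samples / 2**20

def frame_buffers_mb(width, height, aa_samples=1):
    """Front buffer plus (possibly multisampled) back and depth buffers."""
    front = buffer_mb(width, height)
    back = buffer_mb(width, height, samples=aa_samples)
    depth = buffer_mb(width, height, samples=aa_samples)
    return front + back + depth

for aa in (1, 4):
    used = frame_buffers_mb(1600, 1200, aa_samples=aa)
    print(f"1600x1200, {aa} sample(s) per pixel: {used:.1f}MB for buffers, "
          f"{128 - used:.1f}MB left of 128MB for textures")
```

Under these assumptions, 4 samples per pixel at 1600x1200 needs more than 64MB for the buffers alone, before a single texture is loaded.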
Performance is astonishing and NVIDIA raise the bar yet again. I can't wait for a driver update or two to fix my niggling issues with the card and for manufacturers to start releasing them in earnest. Gainward in particular have a couple of monstrous sounding versions that I'd love to look at.
Makes you wonder what ATi will do next that's for sure. Price is the only barrier as always with the top end cards but it looks like, for once, the cards will be keenly priced. For the midrange the Ti4200 will be the card to look out for in 8-10 weeks. The Ti500 and normal GeForce3 are being phased out with the Ti4400 and Ti4200 taking their places with the Ti4600 the flagship product. The Ti200 GeForce3 remains for mid range duties.
It's been a pleasure testing and NVIDIA will have a hard time getting it back =) Lastly, a big thanks to Spymaster for discussing the GeForce4 with me over the past couple of days and for furnishing me with Ti500 numbers and help with Serious Sam.
NVIDIA get an Editor's Choice for the technology used to create this beast.