Pentium 4 Northwood 2.4GHz Review
Another month passes and we hear another announcement from Intel with respect to their Pentium 4 processors. It seems like only yesterday when the new 0.13u 'Northwood' processors were officially announced. Now we have before us the latest and greatest Northwood in the form of the 2.40GHz Northwood Pentium 4 processor.
For those of you who need to brush up a little on your Pentium 4 knowledge, or for those of you who are potential P4 owners, let's take a moment and reflect on the Pentium 4's origins. This should give us a valuable insight into how Intel have managed to release an x86 processor capable of running at 2.4GHz at a scant 1.5v.
The Pentium 4 was born from the need to replace the ageing Pentium 3 architecture, one that was quickly coming to the end of its scalability at 1GHz. Remember the debacle surrounding the ill-fated 1.13GHz P3?
Intel quickly realised that another approach was needed if the P3 replacement was to scale effectively. Thus, the P4 employs a group of features known under the umbrella term Intel® NetBurst™ Micro-architecture.
One of the major constituents of NetBurst is hyper-pipelined technology. The P3 and Athlon processors have a 10-stage pipeline; the P4, on the other hand, is blessed with an extra-deep 20-stage pipeline. What this effectively means is that the P4 is able to ramp up clock speeds rather easily, which is one reason we see a desktop 2.4GHz processor today.
The downside of a 20-stage pipeline is that it inherently accomplishes less work per clock cycle when compared with a shorter pipeline. This is due to the narrowness and length of the pipeline, as any given instruction will take longer to reach completion. For argument's sake, if we say that the Athlon XP and P4 can push an instruction through 4 stages of their respective pipelines per clock cycle, it would take the P4 twice as long to process the same number of instructions as the Athlon XP (due to its double-length pipeline), or it would need to be clocked at twice the clock speed of the Athlon XP. This is purely for illustrative purposes; our benchmarks show that the Athlon accomplishes roughly 40% more work than the P4 on a clock-for-clock basis.
Other benefits under the heading of NetBurst include the use of a 100FSB quad-pumped to 400FSB between CPU and memory controller, giving a potential 3.2GB/s of bandwidth, the highest of any desktop PC. A Level 1 Execution Trace Cache is also included, holding 12k decoded micro-ops. This increases performance by removing the decoder from the main execution loop, and also helps reduce the time required to recover from branch mis-prediction.
Branch mis-prediction occurs when the CPU makes a mistake with which way a branch is supposed to go. It then needs to 'flush' the instructions from its pipeline before it can start processing them again on the correct program branch. As you can no doubt gather, branch mis-prediction is especially costly in a deep pipeline as it takes 20 clock cycles to recover from any error. This is why Intel's engineers have spent a vast amount of time, effort and transistors in ensuring that instructions are mis-predicted as little as possible.
The P4 also boasts two arithmetic logic units (ALUs) that run at twice the speed of the core processor, so 4.8GHz in this case. These ALUs handle the basic arithmetic duties such as add and subtract. These units are kept busy by the tremendous flow of data afforded by the quad-pumped FSB.
The Northwood Pentium 4s are differentiated from their Willamette counterparts firstly by being manufactured on a smaller process, 0.13u compared to 0.18u for the Willamette. The smaller process allows smaller transistors, which in turn produce less heat. This culminates in Intel being able to lower the operating voltage from 1.75v to 1.5v.
The extra room allowed by smaller transistors has been put to good use, as Intel have wisely raised the on-die advanced transfer cache (L2) from 256KB to 512KB. The extra cache helps keep the processor saturated with data, and should help in ensuring that the massive bandwidth on offer is effectively utilised by the CPU. For better conductivity, Intel have finally switched over to using copper interconnects, replacing the less efficient aluminium interconnects found on Willamettes. We must note that AMD incorporated this manufacturing change some time ago.
The deep pipeline and 0.13u manufacturing process have ensured that Intel feel confident about releasing a processor at a clock speed of 2.4GHz, operating at only 1.5v, an impressive feat of engineering. Now that we have gleaned some idea of how a Pentium 4 operates, and how it has been able to hit such high speeds with relative ease, let's now focus on the particular processor for review.
The 2.4GHz Northwood Pentium 4 is the latest in the line of Northwood processors, the current range spanning from 1.6GHz to 2.4GHz in 200MHz increments. It shares the same microPGA form factor as its brethren, indeed the same form factor as any S478 P4. The only method of determining the exact speed of the processor is to look at the marking on the slug. Our particular sample was an engineering model, one whose actual clock speed was not displayed on the integrated heat spreader. Our processor's speed is derived from the fact that it sports a mammoth 24x multiplier coupled with the as yet standard 100FSB. How long before we see the Northwood 'B' incarnations, running on a 133FSB (533FSB QDR) bus?
CPU Specifications in detail
The Pentium 4 is currently blessed with at least five viable motherboard platforms. For performance reasons, we can rule out Intel's own I845 platform, which leaves us with a choice of Intel's flagship I850, their promising I845D, SiS' excellent 645 chip set and Via's enhanced P4X266A. After preliminary testing, we decided to conduct our suite of benchmarks with the Intel I850 platform in the guise of the Abit TH7II-RAID motherboard. Intel have naturally touted the RAMBUS-equipped I850 Tehama chip set as their preferred performance solution. It's amongst the fastest performing at stock speeds, and its memory bandwidth scales well when the CPU is pushed beyond its rated speed.
We've also limited the benchmark comparison to AMD's current flagship processor, the XP2100, running on a Via KT333 chip set motherboard. The reasoning is simple. Potential owners of either CPU are looking for nothing other than maximum performance, and these two setups are best placed to give them what they desire. On a side issue, motherboards may need a BIOS update to correctly identify the new processor.
All benchmarks were conducted at 1024x768x32 100Hz with vertical sync disabled. Each benchmark was run three times consecutively and an average score was taken. Both systems were configured for maximum performance. A fresh installation of Windows XP was used in both instances.
We were very intrigued to see just how efficient Intel's 0.13u manufacturing process and yields had recently become. The very fact that they were confident enough to release a CPU at a speed of 200MHz greater than its immediate predecessor, the 2.2GHz NW P4, gave us initial hope.
Our expectations were duly fulfilled when we started raising the FSB (remember, all P4s are multiplier-locked). We eventually managed a rock-solid 2891MHz / 120.46 FSB with an under-load voltage of 1.67v (set to 1.75v in BIOS). Our FSB was actually set at 120 in BIOS; the TH7II slightly overclocks the FSB at any given speed. It must be noted that we managed to gain 1.75v by using a simple voltage regulator modification in the form of a removable micro-grabber. Our TH7II BIOS only gave us the option of using a maximum 1.625v. We felt that an increasing number of 'boards offered at least 1.75v, without modification, to P4 NWs, so it was only fair that we applied at least that level of voltage to our test CPU.
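Because the multiplier is locked, every clock figure quoted above follows from the FSB alone. The short Python sketch below is only an illustration of that arithmetic using the numbers from this review (24x multiplier, quad-pumped bus, ALUs at twice core speed); the function name `p4_clocks` is our own invention, not part of any tool.

```python
# Sketch: derive the P4's clock figures from the FSB and the locked multiplier.
# All input numbers are taken from the review; the helper name is hypothetical.
MULTIPLIER = 24  # locked multiplier of the 2.4GHz part


def p4_clocks(fsb_mhz, multiplier=MULTIPLIER):
    """Return core, quad-pumped bus and double-speed ALU clocks in MHz."""
    core = fsb_mhz * multiplier
    return {
        "core_mhz": core,
        "bus_mhz": fsb_mhz * 4,  # the FSB is quad-pumped
        "alu_mhz": core * 2,     # the ALUs run at twice the core clock
    }


stock = p4_clocks(100.0)         # default 100FSB
overclocked = p4_clocks(120.46)  # actual FSB reported at our best overclock

for name, clocks in (("stock", stock), ("overclocked", overclocked)):
    print(
        f"{name}: core {clocks['core_mhz']:.0f}MHz, "
        f"bus {clocks['bus_mhz']:.0f}MHz, ALU {clocks['alu_mhz']:.0f}MHz"
    )
```

With a locked multiplier, the FSB is the only lever, which is why the quad-pumped bus and memory speeds rise along with the core clock in the overclocked results.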
At speeds greater than 2891MHz, we would run the risk of the CPU throttling due to unacceptably high temperatures. We're adamant that extra voltage, coupled with better cooling, would have seen us stable at the magical 3GHz barrier.
Perhaps even more pleasing than the ultimate overclock was the fact that we could reach 2600MHz at default CPU voltage, which speaks volumes for Intel's manufacturing process. The deep pipeline that we alluded to earlier really does show its usefulness at higher clock speeds. Here is a WCPUID shot that shows our overclocked speed.
We'll include the P4's overclocked benchmarks, just like we included the AMD Athlon XP's overclocked benchmarks in its review. Your overclocking mileage may vary, at least you may glean an idea of what kind of numbers you should expect to get at around 2900MHz. For the sake of brevity, we'll refer to our overclocked 2.4GHz P4 as a 2.9GHz CPU.
As is our customary tradition, we'll start the suite of benchmarks with SiSoft's synthetic yet useful benchmarking tool, Sandra. Firstly, let's examine how Sandra sees our new CPU in relation to its immediate competition.
Here is our first surprise. Even the Pentium 4, clocked at 2400MHz, cannot surpass the benchmarks laid down by the Athlon XP2100, one that has a true running frequency of 1733MHz. The explanation for these seemingly erroneous results lies in our discussion regarding pipeline length and work done per cycle.
The Pentium 4's deep pipeline makes it difficult for CPU instructions to pass through the complete pipeline quickly, as they have 20 stages to navigate before completion. The Athlon's shorter pipeline ensures that even if the same amount of work is done per cycle, the instruction will be ultimately processed more quickly.
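To put a rough number on that per-cycle difference, the Python sketch below works out how much more work per clock the Athlon must do simply to draw level with a 2400MHz P4 while running at 1733MHz. It assumes the two chips post equal scores, which is a simplification on our part; since the Athlon actually edges ahead here, the true per-clock gap is at least this large.

```python
# Sketch: per-clock advantage needed for a 1733MHz Athlon XP to match a 2400MHz P4.
# Clock speeds are from the review; equal overall scores are an assumption.
P4_MHZ = 2400
ATHLON_MHZ = 1733


def per_clock_advantage(fast_clock_mhz, slow_clock_mhz):
    """Extra work per cycle (as a fraction) the slower-clocked chip needs
    in order to deliver the same overall throughput as the faster-clocked one."""
    return fast_clock_mhz / slow_clock_mhz - 1.0


advantage = per_clock_advantage(P4_MHZ, ATHLON_MHZ)
print(f"Athlon needs roughly {advantage:.0%} more work per clock to draw level")
```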
The sheer, mammoth clock speed of the P4 at 2.9GHz overcomes its pipeline deficiency, and it outdistances the Athlon XP comfortably. The P4's Dhrystone benchmark is impressive, however.
Let's now focus on memory bandwidth, something that is crucial in keeping the CPU fed with data. The greater the memory bandwidth, the more efficient a processor can become.
If you cast your mind back to our little P4 discussion, you'll remember us talking about the Pentium 4's unique quad-pumped FSB. The bus effectively gives the P4 a 400MHz-capable path, or 3.2GB/s, from CPU to memory controller. Contrast this with the double-pumped Athlon FSB, one that can only deliver 266MHz, or 2.1GB/s. The fruits of the quad-pumped bus are evident from our memory benchmarks.
The P4 takes a commanding lead at default speeds and simply extends that lead when overclocked to 2.9GHz. Although the P4's lead looks impressive, it is actually quite inefficient at extracting potential bandwidth from its RAMBUS memory system. We only manage to extract 78% efficiency with RAMBUS compared to the Athlon's impressive 96% utilisation from DDR, albeit both in a buffered state.
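The peak figures and the efficiency percentages can be tied together with a little arithmetic. The Python sketch below is a rough illustration only: it assumes a 64-bit (8-byte) front side bus on both processors, which the text does not state, and it assumes the 78% and 96% figures are measured against the FSB peaks quoted above.

```python
# Sketch: theoretical FSB bandwidth and the sustained bandwidth implied by
# the efficiency figures in the review.
# Assumptions: 8-byte wide bus on both CPUs; efficiency is relative to FSB peak.
BUS_WIDTH_BYTES = 8


def peak_bandwidth_gbs(effective_mhz, width_bytes=BUS_WIDTH_BYTES):
    """Theoretical peak bandwidth in GB/s (taking 1GB as 1000MB)."""
    return effective_mhz * width_bytes / 1000.0


p4_peak = peak_bandwidth_gbs(400)      # quad-pumped 100FSB
athlon_peak = peak_bandwidth_gbs(266)  # double-pumped 133FSB

p4_sustained = p4_peak * 0.78          # 78% efficiency with RAMBUS
athlon_sustained = athlon_peak * 0.96  # 96% efficiency with DDR

print(f"P4:     peak {p4_peak:.2f}GB/s, implied sustained {p4_sustained:.2f}GB/s")
print(f"Athlon: peak {athlon_peak:.2f}GB/s, implied sustained {athlon_sustained:.2f}GB/s")
```

Under these assumptions the P4 still comes out ahead in absolute terms despite its lower efficiency, which matches the lead seen in the memory benchmark.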
We've seen the Athlon XP defeat the default 2.4GHz NW in the synthetic benchmark and we've seen the NW gain revenge in the memory benchmarks. As pure CPU speed and memory subsystem bandwidth largely dictate performance, we should be in for a close battle. Let's investigate how we fare in practical benchmarks.
As usual, our first practical benchmark is Pifast. For those of you who don't yet know, Pifast calculates the constant Pi to X million decimal places using the fastest method possible. We've previously seen that it's quite responsive to changes in the CPU and memory subsystem employed. Should be interesting.
Our theoretical data predicted a close result, that is exactly what we got. The P4's extra memory bandwidth quite overcome the impressive floating point power of the Athlon XP. The results at stock speeds are within 2% of each other. The sheer muscle of the overclocked P4 puts daylight between itself and the Athlon, impressive.
Let's turn our attention to MP3 encoding. We know that many of you are keen to 'rip' your personal CD collection onto your hard drive to listen to in MP3 format. Here we're encoding a custom 481MB WAV file into 128 kb/s MP3 format. MP3 encoding has historically been a CPU-intensive activity with little regard for memory bandwidth. Let's see if this still holds true.
The Pentium 4's enhanced bandwidth seems to play no part in this test. The Athlon XP, with its superior FPU, comfortably beats out the stock P4. LAME has always been an activity that the XP has excelled in, no change there. Again, the sheer might of the overclocked P4 outpaces the XP by brute MHz. Rather obviously, 110 seconds is the fastest time we've ever seen for this benchmark.
If you've ever seen Intel's processor advertisements, you'll know that they quite strongly focus on its ability at all things media-related. Let's see if we can substantiate their claims of it being a media maestro.
We're using Xmpeg 2.0, a derivative of the popular Flask encoder, coupled with the DivX 3.20 codec. We've found this combination to be the most stable in our stress tests, although we are planning on using the recently released DivX 5.0 codec in the very near future.
Three Kings is the DVD of choice, as its mixture of action and dialogue makes it an excellent benchmarking test. The DVD is encoded in full-screen format into YUV2 format. The black borders are cropped to save unnecessary encoding time. The low-motion codec is used with the bit rate set to 1500 kb/s. We calculate the average FPS after 20,000 frames have been encoded.
It seems as if Intel's claims are not totally without justification. The P4, at its native 2.4GHz, manages to outpace the XP 2100 by just a shade under 5 frames per second, a difference that would shave just over 14 minutes off the encoding time of a 2 hour movie. The Pentium 4, at 2.9GHz, simply blazes a trail that no single-CPU system can compare with. If DVD encoding is your hobby, and you don't have access to a dual-CPU machine, the Intel Pentium 4 is the next best thing.
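For anyone wanting to translate an FPS gap into time saved on their own encodes, the Python sketch below shows the calculation. The two encode rates are hypothetical placeholders, not our measured results, chosen only so that they sit 5 FPS apart; the 25 FPS source frame rate (PAL) is likewise an assumption.

```python
# Sketch: convert an encoding-speed difference into minutes saved per movie.
# XP_ENCODE_FPS and P4_ENCODE_FPS are HYPOTHETICAL placeholders, not measurements.
MOVIE_HOURS = 2
SOURCE_FPS = 25       # assumed PAL source material
XP_ENCODE_FPS = 30.0  # hypothetical encode rate
P4_ENCODE_FPS = 35.0  # hypothetical encode rate, 5 FPS quicker


def encode_minutes(total_frames, encode_fps):
    """Minutes needed to encode total_frames at a given encode rate."""
    return total_frames / encode_fps / 60.0


total_frames = MOVIE_HOURS * 3600 * SOURCE_FPS
saved = encode_minutes(total_frames, XP_ENCODE_FPS) - encode_minutes(total_frames, P4_ENCODE_FPS)
print(f"{total_frames} frames, {saved:.1f} minutes saved")
```

Note that the saving depends on the absolute encode rates as well as the gap between them: the same 5 FPS advantage is worth fewer minutes when both processors are encoding faster.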
We next ran the Ocuk SETI (Search for Extra Terrestrial Intelligence) benchmark, a rather tough work-unit with an angle ratio (AR) of 0.417. This one takes a while to complete, as it sifts through huge chunks of data in the hope of finding some inkling of Extra Terrestrial existence. One advantage of this benchmark is its ability to display results to within 1/10000th of a second; we've rounded the results up to the nearest second for the sake of brevity.
We knew it was going to be close, but we didn't realise just how close. Imagine our surprise when we found that the P4 2.4GHz trailed the Athlon XP 2100 by 33 seconds over the course of a benchmark that lasts over 3 hours. Talk about close! SETI is an activity that truly revels in bandwidth. It appears as if the Athlon's superior pure CPU power is almost perfectly offset by the superior memory throughput of the P4.
We're finding it increasingly difficult to distinguish these two processors when run at their native speeds. It seems that each brings something to the table that is slightly lacking in the other. The awesome power of the 2.9GHz P4, benefiting from both an increased clock and memory speed, is there for all to see, quite simply the fastest time we've ever seen for this particular benchmark.
A new addition to the benchmarking suite is the recently released PCMark 2002 from MadOnion. It consists of a series of tests that represent common tasks in home and office programs. The benchmark is split into three constituent parts which focus on the CPU, memory and hard drive(s) respectively.
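To show just how small that gap is, the Python sketch below expresses 33 seconds as a fraction of the run. We only state that the benchmark lasts "over 3 hours", so 3 hours is used as a lower bound and the result is an upper limit on the margin.

```python
# Sketch: the SETI gap as a fraction of total run time.
# RUN_SECONDS is a lower bound ("over 3 hours"), so margin is an upper bound.
GAP_SECONDS = 33
RUN_SECONDS = 3 * 3600

margin = GAP_SECONDS / RUN_SECONDS
print(f"The gap is at most {margin:.2%} of the run")
```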
PCMark 2002 seeks to do for home and office applications benchmarking what 3DMark does for video card benchmarking. You're given a final score for each part mentioned above.
It's quite evident that PCMark shows a particular liking for Pentium 4 processors. We ran the benchmark a number of times and came up with very similar results each time. The only consistent factor between the benchmarks is the hard drive scores; this is to be expected, as hard drives naturally aren't very responsive to changes in CPU and / or memory.
We're somewhat mystified by the Athlon XP's relatively poor showing in this benchmark, it seems to contradict our other benchmarks. We'd be interested to hear how other Athlon XP owners got on with this particular test. This benchmark involves everyday activities such as JPEG compression, MPEG encoding, text analysis etc, something the Athlon should be proficient at. We'd have fully expected the Athlon XP to keep some sort of parity with the P4 2.4GHz, puzzling to say the least.
Our benchmarks thus far have highlighted the Athlon XP's impressive work per clock cycle; it certainly packs a lot of punch per MHz. The P4, on the other hand, shows that at 2.4GHz it can more or less stand toe-to-toe with the Athlon XP. Its slightly inferior work per clock, due in part to its narrow and deep pipeline, is adequately offset by its superior memory throughput (courtesy of effectively a 400MHz bus). Let's now focus on gaming and see if our theories hold true.
We'll start off with 3DMark 2000, a synthetic benchmark from the folk over at MadOnion. Although it is a DirectX 7 driven benchmark, we feel that its importance in gauging gaming performance is just as relevant today; after all, many popular titles utilise DX7. The benchmark was conducted at its default resolution setting of 1024x768x16.
Here's a slight surprise. The Athlon XP manages to pull away from the P4 2.4GHz by almost 500 marks, a not inconsiderable amount. We'd have expected it to be closer. This was tempered by the prior knowledge that the P4 has historically been relatively poor at 3DMark 2000; indeed, only since the inception of the cache-enhanced Northwood P4 have we seen it get close to the scores posted by AMD's XP processors. We can hazard that 3DMark isn't too dependent on memory bandwidth, relying more on pure CPU speed. By now it should come as no surprise to see the P4 2.9GHz heading the charts once again.
Let's now move on to the DX8.1 compliant 3DMark 2001SE, a benchmark characterised by its reliance on both CPU speed and memory bandwidth. Its more complex tests place a burden on the memory subsystem to a greater degree.
Our results from 3DMark 2000 seem to be almost mirrored in 3DMark 2001SE. This time the P4 manages to close the gap somewhat, but can't quite bridge it. We're slightly perplexed as to why the P4 isn't closer to the Athlon in this of all benchmarks, we feel that the Detonator XP 23.12 drivers used for the Ti500 are perhaps more partial towards the Athlon XP. You can, by now, guess who is going to be leading our benchmarks.
Let's now turn our attention to a game that has been grabbing our attention of late, Serious Sam 2, The Second Encounter. The superb visuals, coupled with an excellent benchmarking mode, mean that it is a joy to use. We're using the highly CPU / memory dependent Valley of the Jaguar timedemo. This should show us how well each subsystem is able to service the graphics card. Even our benchmarking settings of 1024x768x32 (Normal preferences) can be considered to be reasonably CPU / memory limited.
This came as something of a surprise. The previously dominant Athlon XP is upstaged by a default clocked P4 for the first time in our tests. The margin is literally negligible but consistent. The greater memory bandwidth of the P4 is put to practical use here, and the 512KB of on-board L2 cache ensures that the available bandwidth is translated into tangible performance by the CPU.
Although, by now, it doesn't need reiterating, we can't help but be impressed by the spectacular performance of the overclocked Pentium 4. The debilitating effects of a deep pipeline are negated by sheer MHz.
Could we ever complete a CPU review without visiting the venerable Quake 3? Surely not. Quake 3, even today, remains one of the most consistent benchmarks available. Very few benchmarks have stood the test of time as well as id's excellent first-person shooter. We'll run the benchmark modes in 512 fastest and 1024 quality. Point release v1.30 was used in both instances.
And now 1024 Quality setting, one that is more likely to be used in real gameplay.
If you know your Quake 3 benchmarks, you'll already know that the Pentium 4 is the king-of-the-hill at this test. The fastest P4 ever released simply reinforces that view. The 2.9GHz P4 demolishes any previous records. It must be noted that the P4's lead over the Athlon XP is largely immaterial, I'd challenge anyone to discern the difference between 232 and 242 FPS respectively. Today's advanced graphics cards and powerful processors simply relegate Quake 3 to almost synthetic benchmark status, such is their prowess in this test. 384 FPS at 512 fastest is slightly mind-numbing, though.
The Pentium 4 2.4GHz Northwood has proved itself to be an excellent processor. We all knew that it would only be truly effective against the current line up of Athlon XP processors once clock speeds were raised to 2GHz and beyond. Intel have deliberately designed a processor that will ensure no radical architectural modifications are needed in the near future. We also knew that the extra-deep pipeline would scale well, and the reduction to a 0.13 micron manufacturing process has further ensured that relatively high clock speeds are attainable without undue difficulty. That is how we today have an x86 processor capable of 2.4GHz with effectively 1.4v under load.
The fact that our sample, one manufactured over a month ago, could hit almost 2.6GHz with default voltage speaks volumes for Intel's fabrication efficiency. The Athlon XP, in its present format, is close to exhausting its frequency headroom. AMD's last iteration of XPs, the 1733MHz XP2100, only increased its immediate predecessor's clock speed by a shade under 4%. The Pentium 4 2.4GHz is over 9% faster than the 2.2GHz CPU that it displaces as Intel's flagship desktop processor. In recent days we've heard talk of Intel reducing the core size from the present 146mm² to 131mm²; this should help to reduce manufacturing costs and perhaps improve yields further.
Our benchmarks have shown that the 2.4GHz CPU's performance is roughly comparable to AMD's XP2100. The slight lack of comparative CPU throughput is mostly offset by the increased memory bandwidth on offer, courtesy of its quad-pumped front side bus. Although we've previously hinted at the fact that its quad-pumped FSB is one of its major assets, we still feel that the imminent move from a 100 to a 133FSB (533 QDR) will really show the true worth of the Pentium 4, especially if paired with the upcoming I850E / PC1066 RAMBUS solutions.
Can we whole-heartedly recommend the Pentium 4 2.4GHz processor? The answer is probably no. The adoption of the newest technology is an inherently expensive business. We expect the 2.4GHz CPU to retail at around £550, a figure that is prohibitively high for the majority of people. We've illustrated that the XP2100 from AMD offers comparable performance at stock speeds, and it currently retails for around 50% of the proposed P4 2.4GHz's price. We've mentioned that the 133FSB equipped Northwood 'B's are just around the corner, the provisional date for launching of the 2.53GHz / 133 FSB P4 being brought forward to May 6. That's the one we're really looking forward to.
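The two speed-bump percentages are easy to check. The Python sketch below uses the clock speeds quoted above, plus one figure of our own: we take the XP2100's immediate predecessor to be the 1667MHz XP2000, a clock speed that is not stated elsewhere in this review.

```python
# Sketch: percentage clock increase of each new flagship over its predecessor.
# The 1667MHz predecessor clock (XP2000) is an assumption not quoted in the review.
def pct_increase(old_mhz, new_mhz):
    """Percentage increase going from old_mhz to new_mhz."""
    return (new_mhz / old_mhz - 1.0) * 100.0


p4_step = pct_increase(2200, 2400)      # 2.2GHz -> 2.4GHz Northwood
athlon_step = pct_increase(1667, 1733)  # assumed XP2000 -> XP2100

print(f"P4 step: {p4_step:.1f}%, Athlon XP step: {athlon_step:.1f}%")
```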
We can't help but feel that the introduction of the 2.4GHz P4 is merely a stop-gap between Northwood revisions, and simply launched to counteract the excellent performance of AMD's XP2100. Yields appear to have sufficiently improved for Intel to manufacture 2.4GHz CPUs with the minimum of fuss. We couldn't let our 2.4GHz P4 off without finding out just how far it could go. We placed it on a DDR motherboard, raised the voltage to 1.85v, and were mildly shocked to see it comfortably surpass the 3GHz barrier. The following WCPUID shot isn't one of a stable system, sure looks impressive, though.
Both Intel and AMD have still to play their trump cards. Intel will shortly be moving on to an official 133 FSB platform for their P4 processors. AMD will be hoping to dramatically increase their processor's clock speed by moving down to a 0.13 micron manufacturing process. We should see the first of these Athlon 'Thoroughbreds' in the very near future. We'll call the present fight a draw, ding ding, onto round 2.