Introduction
Intel Pentium 4 3.4GHz Prescott
Intel's performance champion was supposed to be the all-new Pentium 4 Prescott core, right? That was the idea, but ideas often aren't realised. Intel released the long-awaited Prescott core around two months ago to a muted press response. The chip that was meant to show the Northwood core a clean pair of performance heels often lagged behind it in a variety of benchmarks. It's not often that an incumbent is considered faster than the new arrival, but that's how it was and is.
A brief look at why the Prescott may not be a current performance goliath is necessary, I feel. The Prescott core is geared towards achieving high clock rates. That really does seem to be its primary aim. Intel once stated that the Pentium 4 architecture could scale to ~10GHz, yet enough has changed between Northwood and Prescott cores to cast doubt on the validity of that claim. For one thing, the new Prescott is the beneficiary, and I say that in the loosest possible sense, of a 31-stage integer pipeline; the Northwood's is 20. Increasing the pipeline length implies that each stage does less work per clock cycle and, with more stages to refill after a mispredicted branch, that the core tends towards a lower IPC (instructions per clock). That should also mean, ceteris paribus, that the pipeline is more amenable to clock speed increases. When evaluating the Prescott against a similarly clocked 20-stage Northwood, the former, without due consideration of other benefits, will undoubtedly appear slower.
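The clock-versus-IPC trade-off can be put into rough numbers. The sketch below assumes the usual first-order model in which throughput is simply IPC multiplied by clock speed. The function names and the IPC figures are hypothetical, chosen only for illustration; they are not measurements of either core.

```python
def relative_performance(ipc, clock_ghz):
    """First-order model: throughput is proportional to IPC x clock speed."""
    return ipc * clock_ghz


def break_even_clock(ipc_old, clock_old_ghz, ipc_new):
    """Clock the new core needs to match the old core under the same model."""
    return ipc_old * clock_old_ghz / ipc_new


# Hypothetical values for illustration only, not measured figures.
NORTHWOOD_IPC = 1.00  # normalised baseline
PRESCOTT_IPC = 0.90   # assumed 10% IPC deficit from the longer pipeline

if __name__ == "__main__":
    clock = 3.4  # both chips at the same clock speed, in GHz
    northwood = relative_performance(NORTHWOOD_IPC, clock)
    prescott = relative_performance(PRESCOTT_IPC, clock)
    print(f"At {clock}GHz: Northwood {northwood:.2f}, Prescott {prescott:.2f}")

    needed = break_even_clock(NORTHWOOD_IPC, clock, PRESCOTT_IPC)
    print(f"Clock needed for Prescott to draw level: {needed:.2f}GHz")
```

Under these assumed figures, the longer-pipelined chip only pulls ahead once its clock rises past the break-even point, which is the bet a deep pipeline makes.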
Not all is lost, though. Intel's also realised that the Prescott cannot just be about pipeline length. In order to bolster the CPU's performance, Intel's decided to use an old performance trick: add more on-chip cache. Prescott doubles the Northwood's 512KB of L2 cache to 1MB for more on-die storage (read: better performance). Just look at what happened to the Northwood when 2MB of L3 cache was added: it rose in price by about 250% and in performance by up to 30%. The L1 cache is bumped up in size and associativity too, both of which help in keeping the processor busy executing instructions.
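Why a larger cache helps can be sketched with the textbook average-access-time formula: hit time plus miss rate multiplied by miss penalty. The latencies and miss rates below are made-up illustrative values, not figures for either core, and the function name is hypothetical.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average access time in cycles: hit time + miss rate x miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical values for illustration only.
HIT_TIME_CYCLES = 20       # assumed cache hit latency
MISS_PENALTY_CYCLES = 300  # assumed cost of going out to main memory
MISS_RATE_SMALL = 0.10     # assumed miss rate with the smaller cache
MISS_RATE_LARGE = 0.07     # assumed miss rate with the cache doubled

if __name__ == "__main__":
    small = average_access_time(HIT_TIME_CYCLES, MISS_RATE_SMALL, MISS_PENALTY_CYCLES)
    large = average_access_time(HIT_TIME_CYCLES, MISS_RATE_LARGE, MISS_PENALTY_CYCLES)
    print(f"Smaller cache: {small:.1f} cycles on average")
    print(f"Larger cache:  {large:.1f} cycles on average")
```

The model also shows the catch: if the bigger cache takes longer to hit, some of the gain from fewer misses is handed straight back.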
To somewhat counteract the longer pipeline and the greater performance hit from cache misses, Intel's also improved general prefetching (loading the right data ahead of time) and enhanced that lovely Hyper-Threading. A further effort to ensure the Prescott, under certain circumstances, does more than the Northwood is the addition of SSE3 support. SIMD (Single Instruction, Multiple Data) optimisation can reap enormous benefits if coded for correctly. Put simply, having one instruction operate concurrently on multiple pieces of data saves time and raises efficiency. Streaming SIMD Extensions 3 (SSE3) adds 13 more instructions to SSE2's set, which is extremely useful if exploited correctly.
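To make the SIMD idea concrete, here is a pure-Python emulation of what the hardware does. It only models the semantics of a four-lane packed add and of HADDPS, one of the SSE3 horizontal-add instructions; it is not real SIMD code and says nothing about actual speed. The function names are hypothetical, and lanes are listed from lowest to highest.

```python
def scalar_add(a, b):
    """Scalar approach: one addition per element, so four adds for four values."""
    result = []
    for x, y in zip(a, b):
        result.append(x + y)
    return result


def packed_add(a, b):
    """Models an SSE packed add (ADDPS): one instruction covers all four lanes."""
    assert len(a) == len(b) == 4
    return [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]


def horizontal_add(a, b):
    """Models SSE3's HADDPS: adds adjacent pairs within each input register."""
    assert len(a) == len(b) == 4
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    b = [10.0, 20.0, 30.0, 40.0]
    print("scalar adds:    ", scalar_add(a, b))
    print("packed add:     ", packed_add(a, b))
    print("horizontal add: ", horizontal_add(a, b))
```

The scalar and packed versions give the same answer; the difference on real hardware is that the packed form is issued as a single instruction. The horizontal add is handy for sums within a register, such as the tail end of a dot product.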
The Prescott gives up some IPC performance in the name of clock speed, but a whole host of other core improvements, including the jump down to 90-nanometre manufacturing, attempt to salvage what could have been a lost cause. It gives with one hand and takes with the other.