Review: Intel Core 2 Extreme QX9650: it's clobberin' time!

by Tarinder Sandhu on 29 October 2007, 10:12

Tags: Core 2 Extreme QX9650, Intel (NASDAQ:INTC)

Quick Link: HEXUS.net/qakaa

Penryn, Yorkfield, 45nm


Intel's microprocessor strategy currently revolves around what it terms a 'tick-tock' model. The model centres on a two-year cycle: a shrink/derivative of the existing architecture arrives on a new manufacturing process in one year, and a brand-new architecture built on that same process follows the year after. We know that 2007 brings Penryn: a 45nm shrink/derivative of Conroe/Kentsfield. 2008 will see the introduction of Nehalem - a brand-new architecture built on the same 45nm process that Penryn is currently churned out on.

Penryn - a bit of background on why 45nm is lovely

Segueing nicely, HEXUS reported on Intel's progression to 45nm technology at the back end of February 2007. Head on over to this link to give yourself a refresher.

45nm - why the need?

As a quick recap, Intel announced that it was basing its evolution of the Core 2 microarchitecture - codenamed Penryn - on a 45nm manufacturing process that took advantage of a breakthrough in transistor design.

In a nutshell, the 45nm process replaces the traditional silicon-dioxide gate insulator, the thin layer that separates a transistor's gate from the channel beneath it, with a hafnium-based high-k gate oxide that allows a thicker (and better-insulating) layer to be used. This in turn leads to lower electrical leakage: a crucial requirement with ultra-small manufacturing processes. Additionally, a new metal gate replaces the traditional polysilicon variant and provides a stronger electric field across the gate, helping switching times - the process is therefore known as high-k metal gate.

The upshot? More CPUs per wafer, considerably less current leakage and a faster switching time, which translate to a more energy-efficient design that will have an innate propensity to clock higher.
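To put a rough number on 'more CPUs per wafer', here is a back-of-the-envelope sketch. It assumes idealised scaling, where every feature shrinks linearly with the process node; real designs never shrink this cleanly, and Penryn adds cache and logic on top, so treat the result as an upper bound rather than an Intel figure. The function name is our own.

```python
def ideal_area_ratio(old_nm: float, new_nm: float) -> float:
    """Return new_area / old_area, assuming every feature shrinks linearly.

    This is an idealisation: area scales with the square of the linear shrink.
    """
    return (new_nm / old_nm) ** 2


# 65nm (Conroe/Kentsfield) to 45nm (Penryn)
ratio = ideal_area_ratio(65, 45)

# If the same design occupied 'ratio' of its old area, this is how many
# times more dies would fit into the same wafer area (ignoring edge losses).
dies_gain = 1 / ratio

print(f"ideal area ratio: {ratio:.2f}, ideal dies-per-wafer gain: {dies_gain:.2f}x")
```

Under those assumptions an unchanged design would need a little under half the area, which is where the headroom for Penryn's larger caches comes from.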

Penryn additions to Intel Core microarchitecture

A smaller manufacturing process isn't all that's new to Penryn, though.

Let's trot out the old PDF foil and go through what's being bolted on to an already decent architecture.

First off, it's important to note that Penryn is a complete family of processors, encompassing workstation/server, desktop, and mobile parts, so what's applicable to one sub-family is generally applicable across the board.

Penryn, as applicable to the desktop, refers to the next evolution of dual- and quad-core CPUs that are run off the present LGA775 form-factor - codenamed Wolfdale and Yorkfield, respectively.

Images courtesy of Intel.

Now, the Core microarchitecture's key performance-defining benefits are shown on the left-hand side. We've covered them in some detail previously.

Penryn's additions are shown on the right, so let's go through them and attempt to delineate their usefulness in improving performance.

Fast Radix-16 Divider

Penryn incorporates a new Radix-16 divider, which resolves four bits of the result per iteration, compared to two bits for the incumbent Conroe/Kentsfield. Division is pervasive across applications, used in both floating-point and integer calculations, so a divider that is roughly twice as fast adds some more juice to the CPU's computational speed.
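The effect of retiring more bits per step can be illustrated with a toy model. The sketch below is not Intel's divider: real hardware picks each quotient digit with dedicated selection logic, whereas this version simply leans on Python's integer division for the digit. The function name, the 32-bit width and the example operands are all our own assumptions; the point is only that four bits per iteration needs half as many iterations as two.

```python
def divide(dividend: int, divisor: int, width: int = 32, bits_per_step: int = 4):
    """Unsigned long division that retires bits_per_step quotient bits per
    iteration. Returns (quotient, remainder, iterations)."""
    if divisor <= 0:
        raise ZeroDivisionError("divisor must be a positive integer")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in the given width")

    radix = 1 << bits_per_step              # 4 bits per step means radix 16
    steps = -(-width // bits_per_step)      # ceiling division
    quotient = 0
    remainder = 0
    iterations = 0

    for step in reversed(range(steps)):
        # Bring down the next group of dividend bits, most significant first.
        chunk = (dividend >> (step * bits_per_step)) & (radix - 1)
        remainder = (remainder << bits_per_step) | chunk
        # Toy digit selection; the digit is always in the range 0..radix-1.
        digit = remainder // divisor
        remainder -= digit * divisor
        quotient = (quotient << bits_per_step) | digit
        iterations += 1

    return quotient, remainder, iterations


q4, r4, steps4 = divide(1000, 7, bits_per_step=4)   # radix-16 style
q2, r2, steps2 = divide(1000, 7, bits_per_step=2)   # radix-4 style
print(q4, r4, steps4)
print(q2, r2, steps2)
```

Both calls produce the same quotient and remainder; only the iteration count differs, which is the whole appeal of the higher radix.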

Enhanced Intel Virtualisation Technology

More prevalent in the workstation/server community, Intel VT - where multiple, hardware-isolated partitions can run on the same machine - is boosted by a reduction, purely at the hardware level, in the time taken to transition between virtual machines. Intel quotes an improvement in transition times of up to 75 per cent.

Larger and smarter cache

A large amount of on-chip cache is a good thing. The ability to load and locally store an application's working set is an effective, if transistor-costly, method of increasing performance, as reading from on-chip cache is an order of magnitude faster than repeatedly going out to external memory.

In particular, dual-core desktop Penryns - code-named Wolfdale - will pack either 3MiB or 6MiB of L2 cache, and quad-core models - which comprise two distinct Wolfdale dies on a single package and are code-named Yorkfield - up to 12MiB. In transistor terms that's around 820m for a quad-core part; it's just as well Intel is packing them into a space-saving 45nm process, then.

Penryn's L2 cache is now endowed with 24-way associativity, compared to the 16-way associativity present in the Kentsfield core, so not only is it bigger, it should be more efficient at leveraging the benefits of greater L2 cache.
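A quick sum shows how capacity and associativity relate. This sketch assumes 64-byte cache lines and a 4MiB, 16-way L2 for the older dual-core die; neither the line size nor the 4MiB figure is quoted above, so treat both as our assumptions. The helper name is ours, too.

```python
MIB = 1024 * 1024


def cache_sets(size_bytes: int, ways: int, line_bytes: int = 64) -> int:
    """Number of sets in a set-associative cache.

    Each set holds 'ways' lines of 'line_bytes' bytes, so
    sets = size / (ways * line size).
    """
    return size_bytes // (ways * line_bytes)


older_sets = cache_sets(4 * MIB, ways=16)    # assumed 65nm dual-core L2
penryn_sets = cache_sets(6 * MIB, ways=24)   # 6MiB, 24-way, per the text

print(older_sets, penryn_sets)
```

Under those assumptions both caches come out at 4,096 sets, which suggests the extra capacity arrives as additional ways per set rather than as more sets.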

Split-load cache enhancement

Cache is cache, right? Not quite: its effectiveness is directly related to how well data lines up with it. Cache is organised into fixed-size lines, and a load whose data straddles the boundary between two lines has to be split into two separate accesses, making transfers to the execution core an inefficient process. Penryn's split-load cache enhancement, as the name suggests, speeds up loads that are split across cache lines in this way.
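For illustration, a load is usually described as 'split' when its bytes straddle two cache lines. The check below is a minimal sketch of that idea, assuming 64-byte lines; the function name and the example addresses are hypothetical, and nothing here models how Penryn actually handles such loads.

```python
LINE_BYTES = 64  # assumed cache-line size


def is_split_load(address: int, size: int, line_bytes: int = LINE_BYTES) -> bool:
    """Return True if a load of 'size' bytes starting at 'address' touches
    more than one cache line."""
    if size <= 0:
        raise ValueError("size must be positive")
    first_line = address // line_bytes
    last_line = (address + size - 1) // line_bytes
    return first_line != last_line


# A 16-byte (128-bit SSE-sized) load at two hypothetical addresses:
aligned = is_split_load(48, 16)     # bytes 48..63 stay within line 0
straddling = is_split_load(56, 16)  # bytes 56..71 cross into line 1
print(aligned, straddling)
```

Unaligned 16-byte loads are the obvious candidates for this penalty, which is why media code tends to care about it.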

SSE4 and Super Shuffle Engine

SSE4 (Streaming SIMD Extensions 4) adds a bunch of multimedia-related optimisations that will be manifested in a desktop environment by better media-encoding performance. SSE4 contains 54 new instructions, and 47 of these - termed SSE4.1 - will be available on the Penryn core. The remaining seven will debut with the upcoming Nehalem core.

Super Shuffle Engine sounds like a Japanese-esque nomenclature for an advancement that adds a 128-bit-wide shuffle unit. In plain English it's useful for a number of imaging and video programs that use what are termed shuffle-like operations such as pack, shift and unpack. It'll be interesting to put this to the test.
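For readers unfamiliar with shuffle-like operations, here is a small model of one of them: an unpack that interleaves the low eight bytes of two 128-bit vectors, in the manner of the x86 PUNPCKLBW instruction. It's plain Python over byte strings, written purely to show the data movement; it says nothing about how Penryn's shuffle unit is built, and the function name is our own.

```python
def unpack_low_bytes(a: bytes, b: bytes) -> bytes:
    """Interleave the low 8 bytes of two 16-byte vectors: a0, b0, a1, b1, ...

    Index 0 is treated as the least significant byte of each vector.
    """
    if len(a) != 16 or len(b) != 16:
        raise ValueError("both inputs must be 16 bytes (128 bits)")
    out = bytearray()
    for x, y in zip(a[:8], b[:8]):
        out.append(x)
        out.append(y)
    return bytes(out)


vec_a = bytes(range(0, 16))     # 0, 1, 2, ... 15
vec_b = bytes(range(16, 32))    # 16, 17, 18, ... 31
mixed = unpack_low_bytes(vec_a, vec_b)
print(list(mixed))
```

A scalar loop like this touches one byte at a time; a 128-bit-wide shuffle unit does the whole rearrangement in one go, which is where the gain for imaging and video code would come from.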

Deep Power-Down Technology and Enhanced Intel Dynamic Acceleration

The Intel Core 2 microarchitecture introduced enhanced power-saving states that gated the CPU down during idle periods. The DPDT is an extension that further pushes down energy requirements during, you guessed it, idle periods.

EIDA is an interesting inclusion. Should the current application be single-threaded, where there's no intrinsic advantage in having multiple cores working concurrently, EIDA pushes the frequency of the active core above its rated specification. That could mean a 2.93GHz part auto-overclocking to, say, 3.2GHz. Sounds like a good bet for isolated gaming, where the majority of titles are still single-threaded. We note that both of these technologies primarily apply to the mobile space, where thermals and battery life are more pressing concerns.


We've trotted out a number of enhancements that Penryn possesses over and above current dual- and quad-core Core 2-based CPUs, but, really, they're architectural bolt-ons that, on a clock-for-clock basis, will provide somewhere in the region of five to 15 per cent extra performance. There's nothing radically new here, just as we suspected, and Penryn constitutes a natural progression for Core 2. The main talking point, we guess, has been the move from the 65nm polysilicon-gate process used for Conroe/Kentsfield processors to the 45nm high-k metal-gate process used for Wolfdale/Yorkfield CPUs.