
Review: AMD's Dual-Core x75 Opterons

by Ryszard Sommefeldt on 21 April 2005, 00:00

Tags: AMD (NYSE:AMD)

Quick Link: HEXUS.net/qabdj


AMD's dual-core Opteron

Sharing a memory controller

I mentioned on the previous page that AMD's dual-core approach puts both cores on the same die. They've done so to allow both cores to share a memory controller, which resides on the CPU itself in AMD's K8 generation of processors, of which dual-core Opteron is a member. Each core therefore enjoys the same benefits from AMD's on-die memory controller that a single-core processor does. So while available memory bandwidth doesn't increase when you add the second core, since there's only one memory controller, both cores get the full benefit of the controller's low latency, keeping performance as high as possible.
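As a back-of-envelope illustration of that trade-off, here's a toy Python calculation. The 6.4GB/s figure is illustrative (roughly dual-channel DDR400), not an AMD specification:

```python
# Toy sketch: one shared on-die memory controller means peak bandwidth is
# fixed, so per-core bandwidth under full contention halves with two cores,
# while both cores still see the controller's low latency.
PEAK_BW_GBS = 6.4  # illustrative figure, roughly dual-channel DDR400

def per_core_bandwidth(cores):
    """Worst-case bandwidth per core when all cores stream memory at once."""
    return PEAK_BW_GBS / cores

for cores in (1, 2):
    print(f"{cores} core(s): {per_core_bandwidth(cores):.1f} GB/s per core")
```

The latency figure, by contrast, doesn't divide: each core talks to the controller as directly as a single core would.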

Sharing HyperTransport links to the system and other CPUs

Every Opteron processor has one HyperTransport link that allows it to connect to devices in the system. Then, depending on the Opteron model, there may be one or two further HyperTransport links for communicating with other processors. A 1-series Opteron has no extra links for talking to other Opterons, since it's the single-processor version. 2-series Opterons have one extra link, allowing them to connect to one other processor for a maximum of two in a system. 8-series Opterons have two extra links and, depending on the topology employed by the mainboard they sit in, that allows you to connect up to eight Opterons together.

Dual-core Opteron doesn't change any of that, with the cores sharing those links via glue logic on the die. You can still place up to eight physical processor packages in a system with a dual-core 8-series Opteron, but the dual-core nature of the processors gives you sixteen processor cores to do work on. So it's not quite as optimal as having sixteen physical CPUs, each with its own memory controller, but it allows you to double processing power in any existing system that supports existing Opteron CPUs.
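Those scaling rules can be captured in a tiny Python sketch; the series names and lookup table here are my own shorthand for the description above, not anything AMD publishes:

```python
# Maximum physical sockets per Opteron series, per the description above:
# 1-series has no coherent links to other CPUs, 2-series has one, and
# 8-series has two (enough, with the right mainboard topology, for eight).
MAX_SOCKETS = {"1-series": 1, "2-series": 2, "8-series": 8}

def total_cores(series, dual_core=False):
    """Cores available to the OS in a fully populated system."""
    return MAX_SOCKETS[series] * (2 if dual_core else 1)

print(total_cores("8-series", dual_core=True))  # sixteen cores from eight packages
```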

Cache coherency with MOESI

In any multi-processor system, the caches for each processor core need to be able to talk to each other to maintain coherency, should any processor in the system need data from the cache of any other CPU.

Back in the days of the Athlon MP, AMD implemented the MOESI cache coherency protocol. MOESI stands for Modified, Owned, Exclusive, Shared, Invalid. Each of those is a state a cache line can occupy, depending on what the CPU cores are doing with it. For example, say core one updates some memory in its cache before writing it back out to main memory. Core two is always snooping core one's traffic, and as it spots the write it invalidates its own copy of that line, while core one's copy is marked Modified to indicate it's newer than main memory. In a MESI cache coherency scheme, without the Owned state, if core two then wanted to read that memory it would have to ask core one for it, and core one would tell it to hang on while it wrote the data back out to main memory first.

However, Athlon MP, single-core SMP Opteron and now dual-core Opteron all implement the Owned state. In the case above, the Owned state allows core one to pass the data core two wanted directly over the core-to-core interconnect, updating the other cache without writing it back out to main memory; core one's copy is then marked Owned and core two's Shared. You can see how that increases performance.

There's less latency when cache data needs to be updated, since you don't need two trips out to main memory, one per core, for a read and write to get the caches back in sync. It's worth noting that Intel's multi-processor Xeon systems currently implement the MESI protocol, so they do have to go out to main memory if cache data is marked Invalid or Modified.
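The difference can be sketched as a toy state-transition function in Python. This models only the single case discussed above, a remote read of a Modified line, and is my own simplification, not AMD's implementation:

```python
def remote_read_of_modified(protocol):
    """Core two reads a line that core one holds in state 'M' (Modified).

    Returns (core one's new state, core two's new state, main-memory trips).
    """
    if protocol == "MESI":
        # Core one must write the dirty line back, then core two reads it
        # from main memory: two memory trips, and both lines end up Shared.
        return ("S", "S", 2)
    elif protocol == "MOESI":
        # Core one supplies the data cache-to-cache, keeping its dirty copy
        # in the Owned state; core two's copy is Shared. No memory trips.
        return ("O", "S", 0)
    raise ValueError(protocol)

print(remote_read_of_modified("MESI"))   # ('S', 'S', 2)
print(remote_read_of_modified("MOESI"))  # ('O', 'S', 0)
```

The zero-memory-trip path in the MOESI branch is exactly the latency saving that dual-core Opteron's fast core-to-core link exploits.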

So there's a fast core-to-core link that allows the cores in any dual-core Opteron system, even one with multiple processors, to update each other's caches with as little latency as possible. If the caches to be updated reside on separate physical packages, the updates are conducted over HyperTransport. The important thing to keep in mind is that they don't need to hit main memory to do so, unlike current Xeon and dual-core Pentium D and Extreme Edition processors.

HyperThreading

If you've kept an eye on Intel's Pentium 4 or Xeon processors since late 2002, when Intel launched the 3.06GHz Pentium 4, you'll know about HyperThreading. HyperThreading is the ability of Intel's NetBurst architecture to run two threads of execution concurrently on one processor core. By duplicating NetBurst's front-end logic, two threads can run on the CPU in tandem, boosting performance provided certain caveats aren't hit. Since the two threads share the processor's execution units and cache, if either thread wants a resource or cache line the other is using, the CPU's pipeline stalls and performance drops off.

However, HyperThreading laid the groundwork for application and OS vendors to support multi-threading. Dual-core Opteron exploits that by advertising itself as a HyperThreading processor, so any HT-aware OS or application uses the two cores on a dual-core Opteron just as it would a HyperThreaded Intel processor. Since there are no shared execution units or caches between the cores, performance need never drop off. Properly HT-aware applications split their threads to minimise contention for shared execution resources, but with Opteron there's a speedup to be had even when the threads aren't tuned that way: the simple fact that there are two full cores speeds things up regardless.

Summary

So it's two full Opteron cores, each with its own L1 and L2 caches, sharing a memory controller and HyperTransport links to the rest of the system and other CPUs, while the processor advertises HT technology support in order to take advantage of the software investment made for Intel's multi-threading-on-a-single-core technology. Pretty simple. Let's see how that works out in a physical sense.