
Review: AMD Athlon 64 X2 4800+

by Ryszard Sommefeldt on 9 May 2005, 00:00

Tags: AMD (NYSE:AMD)



How does it work?

Good question. AMD do it a bit differently from Intel, and that's all down to the basic architecture they created for Opteron and Athlon 64 when those chips were first designed. (Incidentally, Intel's dual-core x86 processors apparently don't advertise themselves as having HTT, unless it's the $1000 Pentium 4 840 Extreme Edition. If the HTT feature flag is exposed, that can negate a possible performance improvement in applications that only run pairs of threads.)

The two main things to consider are the integrated memory controller of Athlon 64 and Opteron processors, and the HyperTransport bus that AMD use to connect pretty much every part of a full Athlon 64 or Opteron system together.

Integrated memory controller

While there are two cores in an Athlon 64 X2 or dual-core Opteron, they share a single memory controller on the die. With Intel's dual-core processors, the memory controller is off-die in the northbridge, at the far end of a bus that both cores share to talk to anything external to them. If one core issues a bus request for a memory access, the other has to wait for it to finish before it can issue one of its own.

The Athlon 64 X2 is different in that each core has its own connection to the memory controller (it's not quite a bus in the CPU sense). If both cores want to access memory, even the same memory location (as long as both are reads), they can do so at the same time. There's no waiting or bus contention, just regular use of the memory controller, as there would be for the single core of a non-X2 Athlon 64. The only sticking point is a simultaneous update of the same memory location, where arbitration must occur.
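To make the difference concrete, here's a deliberately simplified toy model, not AMD's or Intel's actual logic. It assumes every request takes one time slot, ignores DRAM timing entirely, and treats two same-slot requests as clashing only when they hit the same address and at least one is a write. The fixed priority given to core 0 on a clash is a hypothetical arbitration rule chosen for illustration.

```python
from collections import deque
from typing import List, Optional, Tuple

# A request is (core_id, address, op), where op is "R" (read) or "W" (write).
Request = Tuple[int, int, str]


def shared_bus_slots(requests: List[Request]) -> int:
    """Shared front-side bus: only one request can be on the bus at a time,
    so every request costs a slot no matter which core issued it."""
    return len(requests)


def conflicts(a: Request, b: Request) -> bool:
    """Assumed rule: same-slot requests clash only if they target the same
    address and at least one of them is a write (an update)."""
    return a[1] == b[1] and "W" in (a[2], b[2])


def per_core_port_slots(requests: List[Request]) -> int:
    """Per-core connections to an on-die controller: each core may issue one
    request per slot. On a clash, core 0 wins (hypothetical fixed priority)
    and core 1 retries in the next slot."""
    queues = {0: deque(), 1: deque()}
    for req in requests:
        queues[req[0]].append(req)

    slots = 0
    while queues[0] or queues[1]:
        slots += 1
        head0: Optional[Request] = queues[0][0] if queues[0] else None
        head1: Optional[Request] = queues[1][0] if queues[1] else None
        if head0 is not None and head1 is not None and conflicts(head0, head1):
            queues[0].popleft()  # only the arbitration winner proceeds
        else:
            if head0 is not None:
                queues[0].popleft()
            if head1 is not None:
                queues[1].popleft()
    return slots


if __name__ == "__main__":
    # Both cores read 0x100, then both write 0x200.
    reqs = [(0, 0x100, "R"), (1, 0x100, "R"), (0, 0x200, "W"), (1, 0x200, "W")]
    print(shared_bus_slots(reqs), per_core_port_slots(reqs))
```

With that request mix, the shared bus needs four slots, while the per-core model needs three: the simultaneous reads of 0x100 proceed together, and only the clashing writes to 0x200 are serialized.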

HyperTransport

The two cores share a single HyperTransport link to the outside world, via a crossbar switch that interleaves the traffic from each core, so that requests go to the right HT device the core is communicating with, and any returning data is delivered to the right core. It's one of the reasons you can just drop a Socket 939 X2 into your current mainboard (with the caveats outlined on the previous page).
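One way to picture the crossbar's bookkeeping is with tags: each outgoing request is stamped with a tag that remembers which core issued it, so the reply can be steered back to that core. The sketch below assumes a simple tag-to-core table; `ToyCrossbar`, its methods, and the device names are hypothetical and don't describe the real HyperTransport packet format.

```python
import itertools
from typing import Dict, List, Tuple


class ToyCrossbar:
    """Toy model of two cores sharing one HyperTransport link.

    Requests from both cores are interleaved onto the link in issue order,
    each stamped with a tag. The tag-to-core table lets returning data be
    delivered to whichever core asked for it."""

    def __init__(self) -> None:
        self._tags = itertools.count()
        self._pending: Dict[int, int] = {}             # tag -> issuing core
        self.link: List[Tuple[int, str, str]] = []     # (tag, device, command)
        self.inbox: Dict[int, List[str]] = {0: [], 1: []}

    def send(self, core_id: int, device: str, command: str) -> int:
        """Put a request on the shared link and remember who sent it."""
        tag = next(self._tags)
        self._pending[tag] = core_id
        self.link.append((tag, device, command))
        return tag

    def receive(self, tag: int, data: str) -> None:
        """Route a reply back to the core that issued the matching request."""
        core_id = self._pending.pop(tag)
        self.inbox[core_id].append(data)


if __name__ == "__main__":
    xbar = ToyCrossbar()
    t0 = xbar.send(0, "graphics", "read")
    t1 = xbar.send(1, "disk", "read")
    # Replies may come back in any order; tags keep them straight.
    xbar.receive(t1, "disk data")
    xbar.receive(t0, "graphics data")
    print(xbar.inbox)
```

Because the replies are matched by tag rather than by arrival order, core 1's disk data lands in core 1's inbox even though it returned first.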

Other things that glue everything together, including Cool 'n' Quiet and cache coherency

Looking at the cores at a high level, there's not much more you need to know about how they work. Given that the first X2s are likely to share the same ~110W TDP, AMD have had to make sure Cool 'n' Quiet works properly with the X2. With two cores to adjust voltage and multiplier for, there's some logic on the X2 that keeps both cores in clock lockstep with each other as the speeds change. So if software requests a FID (frequency ID, or multiplier) change, the X2 ensures both its cores change FID at the same time, in sync.
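The lockstep behaviour can be sketched as a single request that updates both cores under one lock, so nothing ever observes the cores at different multipliers. This assumes the Athlon 64's 200MHz reference clock (core clock = multiplier × 200MHz); `ToyX2PowerControl` is a hypothetical name, and the voltages used below are illustrative values, not AMD's specifications. Real frequency and voltage transitions are stepped in a defined sequence, which this ignores.

```python
import threading
from dataclasses import dataclass
from typing import List

REF_CLOCK_MHZ = 200  # Athlon 64 reference clock; core clock = multiplier x 200MHz


@dataclass
class CoreState:
    multiplier: float  # what the FID selects
    vcore: float       # core voltage (illustrative values only)


class ToyX2PowerControl:
    """Hypothetical model: one FID/voltage request always applies to both
    cores together, so the two cores never run at different clocks."""

    def __init__(self, multiplier: float, vcore: float) -> None:
        self._lock = threading.Lock()
        self.cores = [CoreState(multiplier, vcore), CoreState(multiplier, vcore)]

    def request_fid_change(self, multiplier: float, vcore: float) -> None:
        # Both cores are updated while the lock is held, so no reader can
        # see one core at the old FID and the other at the new one.
        with self._lock:
            for core in self.cores:
                core.multiplier = multiplier
                core.vcore = vcore

    def clocks_mhz(self) -> List[float]:
        with self._lock:
            return [core.multiplier * REF_CLOCK_MHZ for core in self.cores]


if __name__ == "__main__":
    ctl = ToyX2PowerControl(multiplier=12, vcore=1.35)  # 12 x 200MHz = 2.4GHz
    ctl.request_fid_change(multiplier=5, vcore=1.1)     # drop both cores together
    print(ctl.clocks_mhz())
```

The design choice here is that there's no per-core FID setter at all: software can only ask for one FID, and the model applies it to both cores at once.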

I had a chat with someone at AMD last year about power saving on dual-core chips, and the option of turning off the entire main pipeline, and possibly the cache memory, of one core when it's not in use (keeping the front end alive to wake the rest of the core up when needed). It seems AMD might go in that direction in the future, but on the first desktop X2s, Cool 'n' Quiet never shuts either core down to save power.

There's also cache coherency to talk about. Back in the days of the Athlon MP, AMD implemented the MOESI cache coherency protocol. MOESI stands for Modified, Owned, Exclusive, Shared, Invalid. Each of those is a state that a line in each cache can occupy, depending on what the CPU cores are doing with it. For example, say that core one updates some memory in its cache, before writing it back out to main memory. Core one's copy of that line is marked Modified, and core two, which is always snooping core one's traffic, spots that happening and marks its own copy Invalid, since the two caches are no longer coherent. In a MESI cache coherency scheme, without the Owned state, if core two wanted to read that memory, it would have to ask core one for it, and core one would tell core two to hang on a short while as it writes the data back out to main memory.

However, Athlon MP, single-core SMP Opteron and now dual-core Opteron and Athlon 64 X2 have all used the Owned state. In the case above, the Owned state allows core one to pass the data core two wants over the core-to-core interconnect and update the other core's cache directly, without writing it back out to main memory. Core one's copy is then marked Owned, since it stays responsible for eventually writing the data back, and core two's copy is marked Shared. You can see how that would increase performance.
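The scenario above can be replayed with a small two-core, single-line model that runs under either MESI or MOESI and counts trips to main memory. It's a sketch under simplifying assumptions: it follows the article's description of MESI (write-back, then a fresh read from memory), writes are assumed to hit a line the core already holds so their fetch cost isn't counted, and clean data is always fetched from memory. `TwoCoreLine` and its counters are hypothetical names.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    OWNED = "O"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class TwoCoreLine:
    """One cache line held in two cores' caches, under MESI or MOESI."""

    def __init__(self, use_owned: bool) -> None:
        self.use_owned = use_owned  # True = MOESI, False = MESI
        self.state = [State.INVALID, State.INVALID]
        self.memory_trips = 0               # reads from or writes to DRAM
        self.cache_to_cache_transfers = 0

    def write(self, core: int) -> None:
        # Writer's copy becomes dirty; the snooping core drops its stale copy.
        # (Assumed to hit a line the core holds; no fetch cost counted.)
        self.state[core] = State.MODIFIED
        self.state[1 - core] = State.INVALID

    def read(self, core: int) -> None:
        other = 1 - core
        if self.state[core] != State.INVALID:
            return  # cache hit, nothing to do
        holder = self.state[other]
        if self.use_owned and holder in (State.MODIFIED, State.OWNED):
            # MOESI: forward dirty data directly; holder keeps write-back duty.
            self.state[other] = State.OWNED
            self.cache_to_cache_transfers += 1
        elif holder == State.MODIFIED:
            # MESI: holder writes back first, then the reader fetches from DRAM.
            self.memory_trips += 2
            self.state[other] = State.SHARED
        elif holder in (State.EXCLUSIVE, State.SHARED):
            self.memory_trips += 1  # clean copy fetched from memory
            self.state[other] = State.SHARED
        else:
            self.memory_trips += 1  # no other copy: fetch and hold Exclusive
            self.state[core] = State.EXCLUSIVE
            return
        self.state[core] = State.SHARED


def replay(use_owned: bool) -> TwoCoreLine:
    line = TwoCoreLine(use_owned)
    line.read(0)   # core one loads the line
    line.write(0)  # core one updates it in its cache
    line.read(1)   # core two then wants to read it
    return line


if __name__ == "__main__":
    for name, flag in (("MESI", False), ("MOESI", True)):
        line = replay(flag)
        print(name, line.memory_trips, [s.value for s in line.state])
```

In this model, MESI ends with three trips to memory and both copies Shared, while MOESI needs only the initial load: core two's read is served by one cache-to-cache transfer, leaving core one Owned and core two Shared.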

There's less latency when cache data needs to be updated, since you don't need two trips out to main memory, a write from one core and a read from the other, to get the caches back in sync. It's worth noting that Intel's multi-processor Xeon systems currently implement the MESI protocol, so they do have to go out to main memory when a processor reads data that is marked Invalid in its own cache and Modified in another. I'm not sure how Intel's dual-core processors handle cache coherency.

In a nutshell, if you don't want to wrap your head around cache coherency protocols, the X2 allows the individual caches of each core to be updated without a costly round trip of data into and out of main memory.

Cache coherency is one of the main problems to work around when building multi-processor architectures, and it only gets harder as caches get bigger and you add more processing units to a multi-processor system. It's good to see AMD carry on the work they did with Athlon MP in that respect.

Before I look at the X2's performance, a little bit on the physical parts of how the CPU is architected.