Review: AMD Opteron

by Ryszard Sommefeldt on 23 September 2003, 00:00

Tags: AMD (NYSE:AMD)

x86-64 II

Many technology commentators and software engineers believe the main limitation of the current x86 ISA is the small number of general purpose registers available to software. With the CPU's general purpose registers responsible for holding in-flight data (memory addresses, operands for the calculation) that the CPU is processing at any given time, it's easy to envisage situations where more registers would be beneficial.

The current x86 ISA provides eight 32-bit general purpose registers (GPRs), and AMD have extended them in the logical fashion, making each of the eight existing GPRs 64 bits wide. It's worth pointing out now that current x86 implementations don't have only eight registers through which all code must pass while executing. Current processors often have many more 'hidden' internal registers of the same width as the eight GPRs, which they use internally to store and process the data they are operating on. For example, according to an article on Ars Technica, the Pentium 4 has 128 such internal 32-bit registers, but under the ISA specification only eight are visible to the compiler or software program.

The CPU does its own internal mapping and translating to make use of the extra GPRs, but they are never exposed to the programmer or compiler. Think of it as subtle hardware optimisation of your code, on the fly, post compile, as it's executed.

In the move to x86-64, AMD have taken the opportunity not only to extend the eight GPRs to 64 bits wide, but also to give the compiler or software programmer direct access to eight more 64-bit GPRs. The extra registers are only available in 64-bit operating mode (more on the operating modes of Opteron later); while executing 32-bit code, they aren't visible. It's possible that the processor uses the extra GPRs internally in 32-bit mode, in the same way that the Pentium 4 uses its own internal GPRs transparently to the user, but that's undocumented as far as I can tell.

Of course, the Opteron also uses the same register renaming and translation tricks to perform the same "on the fly, post compile" optimisation of your code as it executes, using its non-visible internal GPRs.
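
To make the renaming idea a little more concrete, here is a toy model in Python. It is only a sketch of the general technique, not a description of how the Pentium 4 or Opteron actually implement it: the function name `rename`, the `p0`, `p1`, ... physical register names and the three-instruction program are all invented for illustration, and the model pretends the supply of internal registers is endless, whereas a real CPU has a fixed pool that it recycles.

```python
from itertools import count

def rename(instructions, visible_regs=("eax", "ebx", "ecx", "edx")):
    """Toy register renamer (illustrative only).

    instructions: list of (dest, src1, src2) tuples of visible register names.
    Returns the same instructions rewritten onto hypothetical physical
    registers named p0, p1, p2, ...
    """
    fresh = count()  # assumption: an endless supply of physical register ids
    # Each visible register starts out mapped to its own physical register.
    mapping = {reg: f"p{next(fresh)}" for reg in visible_regs}
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources read whichever physical register currently holds the value.
        s1, s2 = mapping[src1], mapping[src2]
        # Every write gets a brand-new physical register, so a later reuse of
        # the same visible name no longer has to wait on earlier instructions.
        mapping[dest] = f"p{next(fresh)}"
        renamed.append((mapping[dest], s1, s2))
    return renamed

# Hypothetical three-instruction program; "op" stands for any ALU operation.
program = [
    ("eax", "ebx", "ecx"),  # eax = ebx op ecx
    ("edx", "eax", "ebx"),  # edx = eax op ebx  (genuinely needs the first result)
    ("eax", "ecx", "ecx"),  # eax = ecx op ecx  (only reuses the *name* eax)
]
renamed = rename(program)
```

In this sketch the first and third instructions both write to `eax` as far as the programmer is concerned, but internally they land in different physical registers, so the third no longer has to queue up behind the first two just because it shares a register name with them.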

As far as SSE/SSE2 code is concerned, AMD have also increased the number of software-visible SIMD (single instruction, multiple data) registers from 8 to 16, but again, only in 64-bit mode. No real performance increase can be gained from the extra registers in 32-bit mode; Opteron's performance increase in that mode comes from elsewhere.

So in the extension to the x86 ISA, AMD have given software engineers and compilers access to double-width general purpose registers and have made twice as many registers visible: 16 GPRs and 16 SIMD registers in 64-bit mode.
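
A rough way to see why the visible register count matters is to count how often a value has to be fetched from memory because no register currently holds it. The Python sketch below does that under loudly simplified assumptions: the function `count_reloads` is invented for this illustration, it evicts the least recently used value when registers run out (real compilers use proper register allocation algorithms, not this policy), it treats every register as freely usable, and the access pattern is a made-up loop touching 12 values.

```python
from collections import OrderedDict

def count_reloads(accesses, num_regs):
    """Count loads from memory for a stream of value accesses.

    A value already sitting in a register costs nothing; otherwise it is
    loaded, evicting the least recently used value if all registers are full.
    Crude illustrative model, not a real register allocator.
    """
    regs = OrderedDict()  # insertion order doubles as least-recently-used order
    reloads = 0
    for name in accesses:
        if name in regs:
            regs.move_to_end(name)    # already in a register: mark as recently used
            continue
        reloads += 1                  # not in a register: fetch it from memory
        if len(regs) == num_regs:
            regs.popitem(last=False)  # evict the least recently used value
        regs[name] = True
    return reloads

# Hypothetical loop body that touches 12 values in order, run 10 times.
accesses = [f"v{i}" for i in range(12)] * 10

reloads_8 = count_reloads(accesses, 8)    # eight visible registers
reloads_16 = count_reloads(accesses, 16)  # sixteen visible registers
```

Under this model the eight-register case reloads a value on every one of the 120 accesses, because the loop's working set never fits, while the sixteen-register case loads each of the 12 values once and then stays out of memory entirely. Real code and real compilers won't show a gap that stark, but it gives a feel for what the extra visible registers are for.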

That leads on nicely to the role that software will play in the performance of the new ISA. With the visible register count increasing, it's up to the compiler of your code, or the coder themselves writing directly to the CPU's registers, to make efficient use of them.

An unoptimised compiler may make inefficient use of the new execution resources on the processor, or simply ignore them altogether. To extract the most performance from the new processor and the ISA it implements, the compiler and software engineer must be fully aware of what exists for them to use. AMD themselves have been active in making sure that the software development environments targeting the new CPUs are fully optimised and aware of the new execution resources available to them. Indeed, a full x86-64 hardware simulator was available long before functional silicon, so that software developers could test and optimise code for x86-64 implementations like Opteron.

AMD were very keen that software support for their new processors was as comprehensive as possible for the Opteron launch. This doesn't mean that the consumer launch of x86-64 was put on the back burner in terms of software support, but it's clear that an x86-64 aware version of the most popular consumer operating system, Windows, won't be available for the consumer x86-64 chip launch.

However, it's also clear that software support for the new ISA and its implementations is what will eventually drive the success of the processor, especially in the enterprise arena that Opteron will compete in. In the consumer space it's less important: buyers just want the fastest 32-bit performer on the market, with any 64-bit advantage a welcome bonus rather than a prerequisite for success. A real requirement for 64-bit software on the desktop, in the sense of most everyday software needing a 64-bit processor to run at any kind of speed, is a few years away at best.

But in the server space, where 64-bit applications are much more pervasive and which is the real reason Opteron was created, software support is paramount. I'll talk more about that later. Let's finish up the talk about the hardware implementation of x86-64 in Opteron with a look at the other chip features.