March 2, 2017, is an important day for the PC industry. It marks the re-emergence of AMD as a manufacturer of high-performance CPUs. This renaissance is down to a processor that you have heard much about during febrile rumours and activity in the preceding weeks. That processor is Ryzen.
Such an opening statement is not hyperbole. Intel, through its excellent Core microarchitecture, has had almost unfettered access to the premium desktop and lucrative workstation and server spaces for more than half a decade. Once strong in both areas, if your memory is long enough, AMD, arguably through a lack of focus and therefore poor architectures, has seen its performance parity eroded and market share decimated. This remained the status quo at the turn of this year.
Ryzen is the consumer brand name given to the first Zen-based microprocessor. Other, related processors will follow, such as Naples for the workstation and as-yet-unnamed offerings for laptops and fanless computers. AMD is betting a large part of the house on Zen being able to stand toe-to-toe with Intel's present and upcoming CPU architectures in every space, so it simply has to succeed in order to give AMD the much-needed platform on which to bring enhanced iterations of Zen in years to come. Zen, or Ryzen, is therefore an understandably big deal.
The brains behind Ryzen
We'll focus on the architecture behind Ryzen first. It is important to do so because this 'clean-sheet' design will be around in some form for a number of years. AMD hopes Zen will do for it what the impressive Core architecture has done for Intel.
There's very good reason for a ground-up design that borrows very little from the previous generation; the incumbent wasn't nearly good enough, and spending significant resource redesigning it wasn't deemed the correct way forward. Intel's Core was, and remains, faster, more efficient, more elegant, and just about better in any metric that truly mattered, other than price, than Bulldozer/Excavator - the architecture brains behind the cores used by AMD until Zen came along.
AMD says it has spent the best part of five years and over two million engineering man-hours on Zen, hoping it would meet the moving Intel target years from inception. Let's examine just how well AMD has done.
Front-end: Branch Prediction
Starting at the top of the core engine, the branch-prediction unit (BPU) is always a key area for a CPU architect. Spending time and resource on this section improves the processor's ability to streamline the flow of instructions going into the fetch, decode and dispatch areas, charactertised by moving down the right-hand side of the below graphic.
Any stalls in this area inhibit a processor's ability to run efficiently - the pipeline needs to be flushed out - and AMD has taken a number of steps to improve it from the previous generation. The BPU needs to guess which way a branch is likely to go so AMD has implemented a self-training perceptron algorithm that assigns a set of weights based on branch outcome. In short, the chip pre-loads instructions based on recent behaviour and optimises their flow through the processor. This is effectively more accurate speculation made possible through machine learning. Zen also features improved prediction by increasing the capability and size of the branch target buffer, which can be considered a pool of cache.
Going hand in hand is what AMD calls Smart Prefetch, where Zen has made key improvements in how it identifies data streams, strides and regionality of data. What this means in practise is that required data can be loaded into the chip's large caches more quickly than before.
Per-core improvements and SMT
The holy grail of modern CPU architecture is to fundamentally increase the IPC (instructions per clock) whilst, in the interests of employing it in a diverse range of environments, ensure that it is also energy efficient. These goals appear to be mutually exclusive on first glance - more throughput and performance without shunting lots more power through it - yet AMD has found a way through modern improvements to the core, caches and power. Let's focus on the performance side first.
Zen can fetch and decode four instructions per clock cycle. Intelligent design dictates that the same work is not done again - for example, decoding and dispatching the same instructions over and over again, as was the case with Bulldozer. This is where the new op-cache comes in. It is able to store eight instructions, in flight, and push out six (Bulldozer had four) to the micro-op queue and then out to the dispatch unit. The relatively simple addition of a buffer, as Intel has implemented in recent designs, enables Zen to keep a higher throughput efficiency.
AMD is clearly taking a different approach to previous generations. Mike Clark, principal engineer for the Zen core, said that a primary goal was to have a balanced architecture, removing the bottlenecks that appeared you know where. AMD identified that wider is better for Zen in this context, thus increasing the register files, scheduler and reorganising the balance of arithmetic logic units (ALUs) and address generation units (AGUs) from two each to a 4:2 combination. It is thought that limited ALU throughput contributed heavily to Bulldozer's inability to squeeze more out of the design.