Efficiency from efficient architecture
Adding in the process and frequency, as one should, leaves Cortex-A73 with a 30 per cent sustained performance lead and a 30 per cent improvement in energy efficiency. What's more, from a size point of view, Cortex-A73 takes up 46 per cent less die area. A single core and its associated L1 caches measure just 0.65mm squared.
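As a rough back-of-the-envelope check on those area figures, the short Python sketch below works out the footprint of the comparison core implied by the two numbers quoted above. It assumes, and this is our assumption rather than anything ARM has stated here, that the 46 per cent saving and the 0.65mm squared figure describe the same like-for-like comparison; the function name is purely illustrative.

```python
def implied_baseline_area(new_area_mm2, reduction_fraction):
    """Area of the older core implied by a quoted fractional area saving.

    Assumes new_area = baseline * (1 - reduction_fraction).
    """
    return new_area_mm2 / (1.0 - reduction_fraction)


a73_area_mm2 = 0.65    # core plus L1 caches, as quoted in the article
area_reduction = 0.46  # "46 per cent less die area", as quoted in the article

baseline_mm2 = implied_baseline_area(a73_area_mm2, area_reduction)
print(f"Implied comparison-core area: {baseline_mm2:.2f}mm squared")  # about 1.20
```

If that assumption holds, the comparison core would occupy roughly 1.2mm squared, but treat it as an estimate rather than a published figure.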
So what special sauce have ARM's engineers cooked up to enable the new chip to do more with less, even once the Cortex-A73's debut 10nm process has been stripped out? The answer, unsurprisingly, is well-balanced architecture.
In the ARM world, performance CPUs that take a relatively large die size aren't as energy efficient per square millimetre as those designed for smaller spaces. Opting for performance dictates certain design choices - pipeline length, instruction buffer, caches, etc. - that make this assertion fact. This is why ARM is keen to push the virtues of its big.LITTLE topology, where each core - either big or little - does the work it is best and most efficient at.
However, building super-energy-efficient chips - the little ones - enables one CPU design team to take the learnings from another and reapply them elsewhere. Cortex-A73 is a big chip that likes to behave, energy-wise, as a little one.
Gains in efficiency are harnessed across the silicon, but we can make a few call-outs. A very efficient branch predictor is a must. ARM says it has reworked the branch predictor to enable very high flow through the instruction pipeline. Keeping the pipeline - which in this case is a few stages shorter than Cortex-A72's - appropriately fed gives rise to high efficiency and performance.
Sounds easy? Yet improving an already-decent branch predictor is hard work. ARM installs a larger branch target address cache and a deeper multiple-entry BTAC to guess where a branch is going to go. And if there is a misprediction, which is inevitable, the Cortex-A73 is able to flush the pipeline and get back to an efficient state while wasting less power.
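To see why a bigger address cache helps, here is a minimal Python sketch of a direct-mapped target cache. It is a toy model under our own assumptions, not ARM's design: the class name, the table sizes, the indexing scheme and the two-branch trace are all invented for illustration. It shows two branches that evict each other in a small table but sit side by side in a larger one.

```python
class ToyBTAC:
    """Direct-mapped toy branch target address cache (illustrative only)."""

    def __init__(self, entries):
        self.entries = entries
        self.table = [None] * entries  # each slot holds (branch_pc, target) or None

    def _index(self, pc):
        # Assumes 4-byte instructions: drop the low two bits, then wrap.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        """Return the cached target for this branch, or None on a miss."""
        slot = self.table[self._index(pc)]
        if slot is not None and slot[0] == pc:
            return slot[1]
        return None

    def update(self, pc, target):
        """Record the resolved target, evicting whatever shared the slot."""
        self.table[self._index(pc)] = (pc, target)


def hit_rate(btac, trace):
    """Fraction of branches in the trace whose target was predicted correctly."""
    hits = 0
    for pc, target in trace:
        if btac.predict(pc) == target:
            hits += 1
        btac.update(pc, target)
    return hits / len(trace)


# Hypothetical trace: two branches alternating, 100 branch executions in all.
trace = [(0x1000, 0x2000), (0x1010, 0x3000)] * 50

small = hit_rate(ToyBTAC(4), trace)   # both branches map to the same slot
large = hit_rate(ToyBTAC(16), trace)  # each branch gets its own slot
print(small, large)  # 0.0 0.98
```

In the four-entry table the two branches keep evicting one another, so every lookup misses; with 16 entries only the first encounter of each branch misses. Real predictors are far more sophisticated, but the capacity effect is the same in spirit.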
The effective width of the decode engine has a key impact upon both performance and efficiency. Cortex-A73 reduces this to dual-decode, down from three on the Cortex-A72, ostensibly for power-saving reasons, but mitigates performance loss by utilising that improved branching, a better caching system, clever prefetching and higher internal memory bandwidth.
Could Cortex-A73 have been faster if it were, say, 3-decode or even 4-decode on the front end? Yes, clearly, though it certainly wouldn't have been as energy efficient.
We guess the design team's remit was to exceed the incumbent Cortex-A72's iso-comparable performance by a small margin, with the rest of the focus on driving down the power consumption. Energy efficiency actually plays out in a couple of ways. Firstly, it enables other parts of the SoC to strut their stuff without running into obvious power issues. Secondly, it offers the Cortex-A73 a wider play in the premium market.
Energy-efficient processing offers SoC manufacturers an opportunity to consolidate older cores and gain more performance without adding to the die area or power consumption.
Here's an example of four mid-range Cortex-A53 cores being replaced by a couple of Cortex-A73 cores, enabling a big.LITTLE combination, most likely on an older manufacturing node. The relatively small size of Cortex-A73 doesn't cause the silicon footprint to rise - a key metric for any SoC-maker catering for the mainstream - yet single- and multi-core performance rises dramatically due to the new chip's performance credentials.
The Cortex-A73 will be seen in a number of premium handsets from the start of 2017. ARM has made very deliberate choices in the engineering behind Cortex-A73, tuning it for excellent energy efficiency rather than all-out performance.
This enhanced efficiency allows the processor, the company says, to run applications at a sustained speed that is very close to peak. If you take away one message from today's launch, it should be that efficiency is the driver of long-term performance.