Raising the bar for mainstream Cortex
This is a guest blog by Govind Wathan, CPU Product Manager at ARM. The views expressed in this blog are his and his alone. We invited Govind to share some of his thoughts on the all-new Cortex-A55 processor.
We live in an increasingly always-connected, intelligence-everywhere digital world. Whether it’s self-driving cars on the road, augmented reality (AR) on our smartphones or artificial intelligence (AI) in our homes, powerful devices are tirelessly consuming data, and with it power and battery life. So how do we sustain this level of constant, advanced technology in a power-efficient way?
Introducing the ARM Cortex-A55: efficient performance, scaling from edge to cloud
The ARM Cortex-A55 processor redefines how high performance can be delivered efficiently for scalable solutions, from small IoT edge gateways to large 5G networks. Based on the latest ARMv8.2 architecture, it brings significantly more performance, increased power efficiency, enhanced scalability and advanced machine learning (ML) and safety features for future applications, from the edge to the cloud.
Compared with its predecessor, the Cortex-A53 CPU, it pushes the boundaries even further, with gains such as:
- Up to 2x more memory performance than Cortex-A53 at the same frequency and process
- Up to 15% better power efficiency than Cortex-A53 at the same frequency and process
- More than 10x more scalable than Cortex-A53
The Cortex-A55 processor raises the bar
Arriving at the right balance of performance and efficiency gains required challenging existing concepts in the design of the Cortex-A53. How did we do it? Here are the highlights:
- Reduced idle time between instructions: We overhauled the branch predictor by incorporating neural network elements in its algorithm to improve prediction. We also added zero-cycle branch predictors to further reduce bubbles in the pipeline.
- Lower memory access latency: We made the L2 cache private to each CPU, which reduced L2 cache access latency by more than 50% compared with the Cortex-A53. This improved performance across the board.
- Enhanced memory capacity for improved performance and efficiency: We introduced an L3 cache, which is shared across all the Cortex-A55 CPUs within the cluster. The L3 cache is part of a new functional unit in ARM DynamIQ processors called the DynamIQ Shared Unit (DSU).
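To see why a lower L2 latency and an extra L3 level can both help, a rough model is the textbook average memory access time (AMAT) recurrence. The sketch below is illustrative only: the `amat` helper and every latency and miss rate in it are assumptions made up for the example, not measured Cortex-A53 or Cortex-A55 figures.

```python
def amat(levels, memory_latency):
    """Average memory access time, in cycles.

    levels: list of (hit_latency_cycles, local_miss_rate) tuples,
            ordered from the L1 cache outward.
    memory_latency: cycles to reach main memory after the last level misses.
    """
    total = float(memory_latency)
    # Fold from the outermost level inward:
    # cost(level) = hit_latency + miss_rate * cost(next level)
    for hit_latency, miss_rate in reversed(levels):
        total = hit_latency + miss_rate * total
    return total


# All numbers below are hypothetical and chosen only to show the trend.
L1 = (2, 0.10)            # 2-cycle hit, 10% of accesses miss
SHARED_L2 = (12, 0.40)    # slower, shared L2
PRIVATE_L2 = (6, 0.40)    # same miss rate, half the latency
SHARED_L3 = (30, 0.50)    # extra level between L2 and memory
DRAM_CYCLES = 150

baseline = amat([L1, SHARED_L2], DRAM_CYCLES)
private_l2 = amat([L1, PRIVATE_L2], DRAM_CYCLES)
with_l3 = amat([L1, PRIVATE_L2, SHARED_L3], DRAM_CYCLES)

print(f"shared L2 only:        {baseline:.1f} cycles")
print(f"private, faster L2:    {private_l2:.1f} cycles")
print(f"private L2 + shared L3: {with_l3:.1f} cycles")
```

In this toy model the faster L2 trims the cost of every L1 miss, while the L3 catches a share of the L2 misses that would otherwise go all the way to memory. A real private L2 may have a different miss rate from a shared one, so actual gains depend on the workload and the chosen cache sizes.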
Improved sustained performance
The Cortex-A55 delivers sustained performance for a significantly longer duration than today’s Cortex-A53 solutions. This is critical for user experiences in markets such as AR, Virtual Reality (VR) and Mixed Reality (MR), which are expected to dominate the future mobile landscape and have high performance requirements. Thermal limits constrain performance in mobile devices, which means they cannot sustain the required levels of performance for very long. The Cortex-A55 delivers 2.5x more power efficiency than Cortex-A53-based devices in the market today. With the Cortex-A55, we have the solution for sustained performance over longer periods in future mobile devices.
The Cortex-A55’s market-leading efficiency makes it a competitive solution for infrastructure processing. Applications such as Power over Ethernet (PoE) wireless access points and thermally constrained automotive solutions mounted on the rear-view mirror can take advantage of the thermally efficient Cortex-A55 and deliver the most performance within a given thermal budget.
Advanced features and higher performance for infrastructure markets
Scalable from the edge to the cloud and everything in between
In addition to high performance and high efficiency, the Cortex-A55 has also been designed to be highly scalable in physical die area and compute performance. To that end, it includes multiple RTL configuration options that make it 10x more configurable than the Cortex-A53. In fact, it has over 3,000 unique configurations, making it the most scalable Cortex-A CPU ever designed.
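A configuration count in the thousands is what you would expect once a handful of independent options multiply together. The sketch below is a minimal illustration of that arithmetic. The option names, their values and the validity rule are hypothetical, not the Cortex-A55’s actual RTL parameters, and the result is not meant to reproduce the figure quoted above.

```python
from itertools import product

# Hypothetical options, for illustration only (not the real RTL parameters).
OPTIONS = {
    "l1_icache_kb": [16, 32, 64],
    "l1_dcache_kb": [16, 32, 64],
    "l2_cache_kb": [0, 64, 128, 256],   # 0 means "no private L2"
    "simd_fp": [False, True],
    "crypto": [False, True],
    "cores": [1, 2, 3, 4],
}


def is_valid(cfg):
    # Assumed rule: the crypto extension only makes sense with SIMD/FP present.
    return cfg["simd_fp"] or not cfg["crypto"]


def enumerate_configs(options, valid):
    """Yield every combination of option values that passes `valid`."""
    names = list(options)
    for values in product(*(options[name] for name in names)):
        cfg = dict(zip(names, values))
        if valid(cfg):
            yield cfg


config_count = sum(1 for _ in enumerate_configs(OPTIONS, is_valid))
print(config_count)
```

Adding one more three-way option to this toy table would triple the total, which is how a modest list of build-time choices grows into thousands of distinct designs.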
High-level view of the new features in the DynamIQ Shared Unit
An Accelerator Coherency Port (ACP) and a low-latency peripheral port (PP) are integrated into the DSU so that closely coupled accelerators can connect to the Cortex-A55 for general compute. These features, alongside the ML capabilities of the Cortex-A55, enable more compute to happen closer to the ‘edge’ in IoT gateway applications, for increased performance and security.