This is a guest blog by Kinjal Dave, CPU Product Manager at ARM. The views expressed in this blog are his and his alone. We invited Kinjal to share some of his thoughts on Cortex-A32, the smallest, lowest-power chip based on the ARMv8-A architecture.
Five years ago, could you have imagined high-end smartphone-class processing capabilities in low-cost embedded designs? Likely not, but designers are quickly ratcheting up performance in their embedded designs, delivering systems and applications that few would have predicted back then.
Three main forces underlie this transformation:
- Richer, more sophisticated operating systems have made their way into embedded applications.
- Internet of Things (IoT) design is equipping every device with a radio and an IP address to connect to the Internet.
- Embedded applications are expanding in scope and performance, enabled by access to low-cost processing power, a common software architecture and a vibrant ecosystem.
The embedded market, therefore, has broadened into a spectrum of applications from traditional control to rich-embedded applications.
At the same time, silicon providers have helped accelerate this broadening of the market by bringing more processing capability and connectivity into embedded without sacrificing power and size.
ARM this week (23rd Feb, 2016) put another stake in the ground in the rich-embedded segment of the market with the announcement of the Cortex-A32, a new 32-bit ARMv8-A processor. The latest in the Cortex-A family combines ARMv8 architectural enhancements with higher efficiency, performance and scalability, at a power and area footprint that fits a diverse set of embedded applications. It’s the latest ultra-high-efficiency Cortex-A processor, following the 2015 rollout of the Cortex-A35, and is targeted at embedded and IoT.
So if ARM’s Cortex-M and Cortex-R families have traditionally enabled a multitude of embedded applications, why Cortex-A32? In many ways, it boils down to two key factors: the operating systems used in rich-embedded applications and the performance requirements of these new applications.
Supporting rich operating systems like Linux requires virtual memory and a memory management unit. The majority of embedded products based on Cortex-A processors run full virtual memory-based operating systems like Linux, Android, and Windows 10 IoT Core. The requirements span a wide range, from products that need to be very low power to designs that developers are pushing aggressively toward the performance of smartphones and laptops. Cortex-A has been used extensively in embedded applications, where the wide portfolio of processors allows developers to select the most appropriate cost, power and performance point for their application.
Detailing the enhancements
Let’s walk through some of the Cortex-A32’s enhancements mentioned earlier.
Cortex-A32 is the only ARMv8-A processor optimized for 32-bit-only compute. As such, Cortex-A32 offers an ARMv8-A upgrade path for applications that today use ARMv7-A processors such as Cortex-A5 and Cortex-A7, or classic ARM processors such as ARM926 and ARM1176.
The ARMv8-A architecture supports both 32-bit and 64-bit compute capabilities in the AArch32 and AArch64 execution states. Cortex-A32 supports AArch32 only, which is sufficient for 32-bit rich-embedded applications that need the lowest cost and power. Even in AArch32, ARMv8-A adds more than 100 new instructions, and the Cortex-A32 benefits from all of them.
Cortex-A32 is 25 percent more efficient (more performance per mW) than Cortex-A7 in the same process node. Cortex-A32 delivers this efficiency through performance improvements and power reduction, two often-conflicting design goals that the Cortex-A32 team managed to deliver in tandem.
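To make the efficiency claim concrete, here is a minimal Python sketch of what "25 percent more performance per mW" implies at the two extremes: holding power constant or holding performance constant. Only the 25 percent figure comes from the text above; the function and variable names are illustrative assumptions, and a real design would land somewhere between the two cases.

```python
# Illustrative arithmetic only. The 25 percent figure is quoted in the text;
# the names below are hypothetical and not part of any ARM tool or API.
EFFICIENCY_GAIN = 0.25  # Cortex-A32 vs Cortex-A7 on the same process node


def relative_performance_at_equal_power(efficiency_gain: float) -> float:
    """Performance relative to the baseline when both draw the same power."""
    return 1.0 + efficiency_gain


def relative_power_at_equal_performance(efficiency_gain: float) -> float:
    """Power relative to the baseline when both deliver the same performance."""
    return 1.0 / (1.0 + efficiency_gain)


perf_ratio = relative_performance_at_equal_power(EFFICIENCY_GAIN)
power_fraction = relative_power_at_equal_performance(EFFICIENCY_GAIN)

print(f"Equal power: {perf_ratio:.2f}x the baseline performance")
print(f"Equal performance: {power_fraction:.0%} of the baseline power")
```

In other words, a 25 percent gain in performance per mW is equivalent to roughly a 20 percent power saving when the workload is held fixed.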
The Cortex-A32 offers performance improvements of up to 25 percent on various benchmarks in comparison to the Cortex-A7 and up to 40 percent compared to the Cortex-A5. On certain workloads like memory streaming and crypto, the Cortex-A32 offers massive improvements: 375 percent and 1,100 percent, respectively, compared to the Cortex-A7. To put things in perspective, the Cortex-A32 delivers similar performance to Cortex-A9, which was the premium smartphone standard just a few years ago, but with significantly reduced power. That performance is coming to the lowest-cost rich-embedded devices now. Let that sink in for a second.
Cortex-A32 is a highly scalable processor with a wide range of configuration options. The diagram shows two configurations of Cortex-A32, but there is a range of possibilities in between.
The configuration on the left is a typical performance-optimized multi-core configuration: quad-core, with larger cache sizes and optional features like NEON and Crypto engines. This configuration provides optimal performance for most rich-embedded applications but retains ARM’s low power leadership, consuming less than 75mW per processor core when running at 1.0GHz on a 28nm process node. At the other extreme, the smallest configuration of the Cortex-A32 processor, with a physical implementation optimized for area, occupies less than a quarter of a square millimeter (0.25mm²) and consumes less than 4mW at 100MHz on the same 28nm process node. With this scalability, the Cortex-A32 is suitable for a wide range of rich-embedded applications.
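To put the two quoted data points side by side, the following Python sketch turns them into comparable numbers. The power and frequency values are the upper bounds quoted above. The dictionary layout and function names are hypothetical. Multiplying the per-core figure by four to bound a quad-core cluster is a simplifying assumption that ignores any shared logic outside the cores.

```python
# Upper bounds quoted in the text for a 28nm process node.
# The structure and names here are illustrative assumptions.
PERFORMANCE_CONFIG = {"power_mw_per_core": 75.0, "freq_mhz": 1000.0, "cores": 4}
SMALLEST_CONFIG = {"power_mw_per_core": 4.0, "freq_mhz": 100.0}


def core_power_bound_mw(config: dict) -> float:
    """Naive bound on total core power: per-core bound times core count.

    Assumes linear scaling across cores and excludes anything outside the
    cores themselves, so treat it as a rough estimate only.
    """
    return config["power_mw_per_core"] * config.get("cores", 1)


def mw_per_mhz(config: dict) -> float:
    """Per-core power bound normalized by clock frequency."""
    return config["power_mw_per_core"] / config["freq_mhz"]


quad_core_bound = core_power_bound_mw(PERFORMANCE_CONFIG)
perf_density = mw_per_mhz(PERFORMANCE_CONFIG)
small_density = mw_per_mhz(SMALLEST_CONFIG)

print(f"Quad-core core-power bound: under {quad_core_bound:.0f} mW at 1.0GHz")
print(f"Performance configuration: under {perf_density:.3f} mW/MHz per core")
print(f"Smallest configuration: under {small_density:.3f} mW/MHz")
```

The two configurations differ in cache sizes, optional features and physical implementation, so the mW/MHz values are not a like-for-like comparison. They only illustrate how far apart the two ends of the configuration range sit.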
Sometimes, it’s hard to appreciate the pace of change in our world. It comes faster than we realize. The user experiences that astonished the consumer world just a few years ago are prompting customers to demand similar innovation in embedded. We can ignore that at our own peril, or we can deliver on it today.
For more information, please visit our Cortex-A series landing page.