Designing for the Future: The I6400 MIPS CPU Core

By Tirias Research (Sponsored Whitepaper)

A Changing World

Streaming media, cloud services, wearables, the Internet of Things (IoT), Software Defined Networks (SDN), Network Functions Virtualization (NFV), and Big Data are all relatively new terms in the high-tech industry. Together, these terms represent changes in the way data is collected, transmitted, and processed. In addition, the cumulative impact of the technologies behind these terms represents an accelerating rate of change, which makes it increasingly difficult to design solutions that keep pace.

For semiconductor providers, this rate of change means not only designing for the next generation of devices and networks, but also planning ahead for the next two or even three generations. Designs need to anticipate requirements such as greater interoperability and security, broader power and performance ranges, shorter time-to-market, and lower system cost.

Imagination is addressing this challenge with the new MIPS Warrior family of CPU IP cores, including the latest I-class I6400 family. The Warrior family provides a platform capable of scaling from low-power 32-bit applications like smartwatches through 64-bit High-Performance Computing (HPC) server applications with a common instruction set and common CPU attributes. In addition, Imagination offers other key intellectual property (IP), such as the PowerVR family of multimedia cores and Ensigma connectivity cores, to provide the critical building blocks for flexible and scalable System-on-Chip (SoC) solutions.

The Problem

The creation of a world of connected electronic systems has been a revolution over 35 years in the making. Key technical advances such as lithography shrinks (Moore’s Law¹), high-speed networking, pervasive wireless communications, and the Internet have enabled intelligent, connected, and mobile devices. It is these solutions, combined with new uses and business models around the collection and use of information, that are driving the connectivity of everything from simple sensors to the most advanced supercomputers and from electromechanical systems to humans. As a result, forecasts for the number of connected devices or nodes reach into the hundreds of billions.

Figure 1. Forecasts for Connected Devices/Source Nodes in 2020

This shift to a connected world has significant implications for all electronic systems. A connected world means that more systems, and more types of systems, will be connected and exchanging information, yet these systems must still support the same applications and/or information securely. In addition, electronic solutions must be more flexible to scale with power, form factor, and performance requirements. This requires support for:

Various MCU/CPU configurations, especially as the ability to monitor, analyze, and act on data increases. Many of today’s embedded applications are moving from 8- and 16-bit to 32-bit microcontrollers. Meanwhile, communications and computing applications continue to shift to 64-bit.
Heterogeneous computing which can leverage other processing resources, like GPUs, DSPs and programmable logic, to improve the performance and efficiency of the SoC when executing highly parallel workloads.
Code and data portability between different systems. While some of this can be achieved through Instruction Set Architecture (ISA) portability, there are still likely to be different system requirements and configurations at different points throughout the network.
Higher levels of security to protect the data and the entire network from malicious intrusions.

Building the Right Solution

Using a single hardware or software platform for every application is neither practical nor feasible. Beyond the plethora of new devices and applications, the IoT era represents a build-out of networks to reach anything that can provide data, which could be as simple as a sensor or as complex as a factory or car. As a result, there are size and power constraints on many of the endpoints (also referred to as source nodes), as well as at other points throughout the network. In addition, not every point in the network requires the same level of performance. Some nodes will require more performance to collect, analyze, and act on data. These points may be local, like the control system for a home or a car, or part of the broader Internet through cloud services. Future semiconductor solutions must be targeted towards specific applications or classes of applications to meet the needs of tomorrow’s electronic platforms.

Figure 2. The Network Expansion from Sensors to the Cloud

Even within a specific class of products, the performance requirements may vary. For example, a smart router or gateway may perform a similar function whether it is located within an industrial environment, a vehicle, or a home. However, the performance and functionality vary greatly depending on the setting. The intelligent router in the home of the future will be responsible for communications with various computing and communications platforms, as well as with other household appliances and service providers, such as the white goods (refrigerator, washer, dryer, and air conditioning), security systems, and home automation.

Much of the data, particularly for consumer devices, will be rich multimedia content that requires a high quality of service (QoS) for low latency and smooth operation. In an industrial environment, such as a manufacturing facility, a smart router may be required to monitor, analyze, and act on information from thousands of equipment sensors in real-time. Such an environment places a high priority on high-throughput and parallel processing to monitor, analyze, and act on local data. In a vehicle, a smart router will be required to handle rich multimedia content for entertainment, navigation, and communications similar to the home, while monitoring and reacting to hundreds of vehicle sensors in real-time, not unlike a manufacturing environment. While all three smart routers are performing similar tasks, the performance requirements will require different memory, CPU and I/O configurations as well as the potential for heterogeneous processing.

Table 1. Varying Performance Requirements for Smart Gateways

As a result of semiconductor manufacturing and integration trends, most electronic solutions today are powered by an SoC, a single semiconductor chip that combines multiple functions. However, with continued advances in software, connectivity, and new features and functions, SoCs can become outdated very rapidly. SoCs must therefore be designed to support the latest technology and be flexible enough to accommodate the further advancements that applications may require. The four most pressing issues are:

Power Efficiency
Performance Scalability
Security
Hardware and Software Reuse

Power Efficiency

From battery-powered devices to the largest server farms, power has become the most limiting factor in scaling electronic platforms. Power consumption also touches every aspect of the use of electronics, from the user experience on mobile devices to the scaling, operating cost, and financial return of communications networks and server farms. As a result, semiconductor manufacturing has shifted its focus to low-power processes, but manufacturing is only half the battle. The other half is semiconductor design.

Designing an SoC requires many decisions, such as how to perform each function. Functions can be allocated to dedicated logic or executed in software on general-purpose logic like the CPU. Because of the ever-changing nature of software, applications, and standards, the CPU is still the heavy-lifting engine within most computing solutions today. As a result, the efficiency of the CPU cores is often a key factor in the overall efficiency of the SoC.

It may be counterintuitive, but SoC power can be reduced by increasing performance. Faster processing reduces the time required to execute instructions, enabling the entire device to enter lower-power or sleep states more frequently and for longer periods. In addition, power can be reduced by finely controlling the power to each functional block within the SoC. Fine-grained power islands and clock gating allow only the parts of an SoC needed for a given task to be powered up.
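The race-to-idle argument above can be checked with simple arithmetic. The sketch below compares the energy used over a fixed 100 ms window by a slower core and a faster core that each finish the same task and then sleep. All power and timing figures are hypothetical, chosen only for illustration, and are not I6400 measurements.

```python
# Race-to-idle illustration. All figures are hypothetical, not I6400 data.

def energy_uj(active_mw, active_ms, sleep_mw, window_ms):
    """Energy in microjoules (mW x ms = uJ) to run a task for `active_ms`
    and then sleep for the rest of a fixed `window_ms` period."""
    if active_ms > window_ms:
        raise ValueError("task does not finish within the window")
    return active_mw * active_ms + sleep_mw * (window_ms - active_ms)


# Slower core: 200 mW active for 80 ms, then 5 mW asleep for 20 ms.
slow = energy_uj(active_mw=200, active_ms=80, sleep_mw=5, window_ms=100)

# Faster core: draws more while active (300 mW) but finishes in 40 ms,
# so it spends 60 ms in the 5 mW sleep state.
fast = energy_uj(active_mw=300, active_ms=40, sleep_mw=5, window_ms=100)

print(f"slower core: {slow} uJ, faster core: {fast} uJ")
```

With these assumed numbers the faster core uses about 24% less energy (12,300 µJ versus 16,100 µJ). The benefit shrinks if sleep power is high or if the faster operating point costs disproportionately more active power, so the trade-off has to be evaluated for each design.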

Performance Scalability

Overall processor performance was once the driving factor of the semiconductor and electronics industry, especially for computing devices. But the market has shifted to connected mobile platforms, and this move has enabled completely new device usage models and intelligent solutions. As a result, raw performance has become a lower priority. However, meeting the needs of an ever-increasing range of devices requires performance flexibility, and there are several ways to enable scalability.

The first and most common way to provide scalability is through multi-CPU configurations. Using multiple CPU cores can enable everything from multi-tasking on a smartphone to parallel processing across a network. While market differentiation has pushed SoC core counts for some consumer devices beyond the point of practical usefulness, there are real benefits to using more cores in server and networking applications.
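A common way to reason about the diminishing returns of adding cores is Amdahl’s law, a standard model that is not part of the original text: if only a fraction p of a workload can run in parallel, n cores give a speedup of at most 1 / ((1 − p) + p / n). The sketch below evaluates it for a half-parallel workload (p = 0.5) and a highly parallel one (p = 0.95); both fractions are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup on `cores` cores when only `parallel_fraction` of the
    work can be spread across them (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0 or cores < 1:
        raise ValueError("need 0 <= parallel_fraction <= 1 and cores >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Illustrative workloads: half parallel vs. 95% parallel.
for p in (0.5, 0.95):
    speedups = [round(amdahl_speedup(p, n), 2) for n in (1, 2, 4, 8, 16)]
    print(f"p = {p}: speedup for 1/2/4/8/16 cores = {speedups}")
```

With p = 0.5, going from 8 to 16 cores raises the ideal speedup only from about 1.78× to 1.88×, whereas with p = 0.95 it rises from about 5.9× to 9.1×. This is consistent with the point above that very high core counts pay off mainly for highly parallel server and networking workloads.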

Performance scalability can also be enabled through the design of the CPU microarchitecture and related functions. Factors such as the number of pipelines, co-processors/accelerators, cache memory sizes, on-chip fabrics, and I/O interfaces all influence SoC performance.

Security

The evolution of the connected world has resulted in an environment where applications and content can originate, be modified, and be stored in multiple locations through multiple networks. Similarly, even the most basic mobile devices are capable of running multiple applications simultaneously that may share compute (CPU), memory, and I/O resources. As a result, there are potential threats to data at every level, from the CPU and SoC to the networks and servers, and these threats are increasing. According to Symantec’s 2014 Internet Security Threat Report², 38% of mobile users experienced cybercrime in 2013 and the number of data breaches increased 62% over 2012.

Addressing the growing security threats requires encrypting data and creating trusted execution environments for each application and in each segment of the network to ensure all applications, data, and I/O are secure. Unfortunately, the lack of consistent security standards and poor hardware and software security support leave most platforms susceptible to threats ranging from malicious attacks to data theft. Future platforms must address the increasing complexity of the operating environments and be able to provide a separate security environment for each application.

Hardware and Software Reuse

Designing custom chips for each new application was once the norm for many devices and applications. However, the number of potential devices/applications is increasing exponentially while the design and life cycles continue to shrink with the rapid pace of innovation throughout the electronics value chain. As a result, both hardware and software must now be designed with reuse in mind. Designing hardware and software for reuse allows for the same technology to be used for other applications, as a baseline for future solutions, and for compatibility between future technology generations. Designing for reuse also enables the creation of common libraries, tool sets, and other resources, which reduce the time-to-market for SoCs and the devices utilizing them.

Therefore, OEMs are now looking for ways to leverage development efforts in one segment across others, including seemingly unrelated applications, such as consumer wearable electronics and medical monitoring devices; consumer white goods, automotive, and industrial automation; and tablets, auto infotainment, and point-of-sale systems. While some system specifications may differ, the power, performance, and most functional requirements are often very similar. As a result, the concept of reuse is being applied not only to similar applications, but across applications and market segments.

The MIPS I6400

With these requirements in mind, Imagination has developed a new generation of MIPS CPU cores to scale across the range of applications from embedded systems and wearables to networking and high-performance computing. The I6400 is the third product in the MIPS Warrior core family. It bridges the gap between the low-power 32-bit M-class MCU cores and the high-performance P-class CPU cores. Aimed at applications ranging from what TIRIAS Research calls intelligent devices up through networking and datacenter applications, the I6400 is designed to scale across multiple segments, supporting both 32-bit and 64-bit applications.

Figure 3. Classes of IoT Devices

Like other Warrior cores, the I6400 is a family of very power-efficient, fully synthesizable cores that can be used on a wide range of processes, from mature nodes such as 65nm through advanced nodes, including the upcoming 16/14nm FinFET processes. In a 28nm process, each individual CPU core can be as small as 1 mm² while reaching 1 GHz under worst-case conditions.

The core features a dual-issue, in-order, 9-stage pipeline with up to 64KB of L1 instruction and data cache, and an optional 128-bit single instruction multiple data (SIMD) engine. The SIMD engine supports a variety of integer (8-, 16-, 32-, and 64-bit) and single- and double-precision floating-point (32- and 64-bit) data types. The MIPS SIMD Architecture is a software-programmable architecture that can be targeted from high-level programming languages, such as C and OpenCL, for fast and simple development of new code as well as reuse of existing code. The SIMD instruction set increases performance for multimedia applications while minimizing die area and power consumption. Additionally, the MIPS SIMD Architecture is highly extensible and able to accommodate future requirements. This flexible solution can support applications ranging from high-throughput enterprise data workloads like scientific simulation and data mining to demanding mobile and home consumer electronics applications, such as high-quality audio, video, image, and graphics processing.
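To show what one 128-bit SIMD operation does, the sketch below emulates lane-wise addition in plain Python: a 128-bit value is split into 16 × 8-bit, 8 × 16-bit, 4 × 32-bit, or 2 × 64-bit lanes, and a single vector add updates every lane at once. It is a conceptual model only, not MSA intrinsics or compiler output; the helper names are hypothetical, and the sketch assumes modular (wrap-around) integer arithmetic.

```python
VECTOR_BITS = 128  # width of one SIMD vector register, as stated in the text


def lanes(element_bits):
    """Number of elements of `element_bits` width that fit in one vector."""
    return VECTOR_BITS // element_bits


def pack(values, element_bits):
    """Pack integers into one 128-bit value, element 0 in the lowest bits."""
    if len(values) != lanes(element_bits):
        raise ValueError("wrong number of elements for this lane width")
    mask = (1 << element_bits) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * element_bits)
    return word


def unpack(word, element_bits):
    """Split a 128-bit value back into its unsigned lane values."""
    mask = (1 << element_bits) - 1
    return [(word >> (i * element_bits)) & mask for i in range(lanes(element_bits))]


def vector_add(a, b, element_bits):
    """Lane-wise add with wrap-around: every lane is updated by one 'instruction'."""
    mask = (1 << element_bits) - 1
    sums = [(x + y) & mask
            for x, y in zip(unpack(a, element_bits), unpack(b, element_bits))]
    return pack(sums, element_bits)


print({bits: lanes(bits) for bits in (8, 16, 32, 64)})

a = pack(list(range(16)), 8)           # 16 byte lanes: 0, 1, ..., 15
b = pack([250] * 16, 8)                # 250 in every byte lane
print(unpack(vector_add(a, b, 8), 8))  # lanes past 255 wrap around to 0, 1, ...
```

The first line prints {8: 16, 16: 8, 32: 4, 64: 2}, the lane counts for the integer widths listed above. A hardware SIMD unit performs all lanes in parallel in one instruction; the Python loop only models the result.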

Figure 4. Block Diagram of the I6400 MIPS CPU Core

Each I6400-based SoC can be configured with up to two I/O coherency units and cluster-coherent L2 cache sizes ranging from 512KB to 8MB. In addition, the I6400 CPU cores support simultaneous multi-threading and can be configured with up to four threads per physical core. The I6400 is also designed to be configured into clusters of up to six cores each, with up to 64 clusters in total. The result is up to 24 threads per cluster (six cores × four threads), extending to over 1,500 threads in total (24 threads × 64 clusters = 1,536). This flexible hierarchy provides a more effective scaling strategy than simply adding more cores within a cluster, and allows the architecture to scale from low-power embedded applications through high-performance networking in an area- and power-efficient footprint.
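The thread counts above follow from multiplying the three levels of the hierarchy. The sketch below encodes the maximums quoted in the text and computes the thread count for a given configuration; the function name and the mid-range example configuration are hypothetical.

```python
# Upper limits quoted in the text for an I6400-based design.
MAX_THREADS_PER_CORE = 4
MAX_CORES_PER_CLUSTER = 6
MAX_CLUSTERS = 64


def total_threads(clusters, cores_per_cluster, threads_per_core):
    """Hardware thread count for a hypothetical configuration within those limits."""
    if not 1 <= clusters <= MAX_CLUSTERS:
        raise ValueError("clusters out of range")
    if not 1 <= cores_per_cluster <= MAX_CORES_PER_CLUSTER:
        raise ValueError("cores_per_cluster out of range")
    if not 1 <= threads_per_core <= MAX_THREADS_PER_CORE:
        raise ValueError("threads_per_core out of range")
    return clusters * cores_per_cluster * threads_per_core


print(total_threads(1, 6, 4))    # one fully populated cluster: 24
print(total_threads(64, 6, 4))   # maximum configuration: 1536
print(total_threads(2, 4, 2))    # a mid-range example: 16
```

Because each level is configured independently, a design can trade cores for threads, or clusters for cores, to hit a thread count within a given area and power budget.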

Figure 5. Block Diagram of an I6400 Multi-core Cluster Configuration

The I6400 also offers fine-grained power control, referred to as PowerGearing. With PowerGearing, each CPU core within a cluster can operate independently at different voltages and frequencies, including being placed in deep-sleep states. Similarly, segments of the L2 cache and other chip functions can be powered on or off as needed. This fine level of power control minimizes overall power consumption by allowing only the necessary CPU cores and other SoC functions to operate. In addition, it allows individual cores to be raised to higher voltages and frequencies when specific tasks require maximum performance.
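Per-core voltage and frequency control pays off because dynamic CMOS power scales roughly as P ≈ C·V²·f, so lowering the voltage of lightly loaded cores saves more than lowering frequency alone. The sketch below compares a six-core cluster with every core at one fixed operating point against a per-core configuration. The operating points are hypothetical, the model ignores leakage, and it is not a description of how PowerGearing itself is programmed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    volts: float
    ghz: float  # 0.0 GHz models a core that is power-gated in deep sleep


def dynamic_power(op, c_eff=1.0):
    """Relative dynamic power using P ~ C_eff * V^2 * f; leakage is ignored."""
    return c_eff * op.volts ** 2 * op.ghz


def cluster_power(ops):
    return sum(dynamic_power(op) for op in ops)


FULL = OperatingPoint(volts=1.0, ghz=1.5)   # hypothetical top operating point
LOW = OperatingPoint(volts=0.8, ghz=0.8)    # hypothetical low operating point
SLEEP = OperatingPoint(volts=0.0, ghz=0.0)  # deep sleep

# All six cores held at the top operating point.
uniform = [FULL] * 6
# Per-core control: two busy cores at full speed, two lightly loaded cores at
# the low point, and two idle cores in deep sleep.
per_core = [FULL] * 2 + [LOW] * 2 + [SLEEP] * 2

print(f"uniform: {cluster_power(uniform):.3f}, per-core: {cluster_power(per_core):.3f}")
```

Under these assumptions the per-core configuration draws about 45% of the uniform cluster’s dynamic power (4.024 versus 9.000 relative units), while the two busy cores still run at full speed. Power-gating idle cores would also remove their leakage, which this simple model does not count.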

For security, the I6400 incorporates Imagination’s extensible security framework, including hardware virtualization. Virtualization separates SoC resources by creating multiple virtual domains that extend through all hardware and software layers. This allows resources to be separated by application, not just by CPU and operating system. As a result, all applications and data can be secured against intrusions, allowing each resident application to operate autonomously and securely. In total, the I6400 can support up to 31 secure/non-secure application domains.
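The resource-separation idea can be pictured as an ownership table in which each resource belongs to exactly one domain and every access is checked against that owner. The sketch below is a toy software model of that idea, not Imagination’s virtualization interface; the class, method, and resource names are hypothetical, the secure/non-secure flag is only a label here, and real systems enforce separation in hardware rather than in a Python dictionary.

```python
MAX_DOMAINS = 31  # domain count quoted in the text


class DomainTable:
    """Toy model: each resource is owned by exactly one domain, and a domain
    may only touch the resources it owns. Names here are hypothetical."""

    def __init__(self):
        self.secure = {}   # domain id -> True for secure, False for non-secure
        self.owner = {}    # resource name -> owning domain id

    def create_domain(self, secure):
        if len(self.secure) >= MAX_DOMAINS:
            raise RuntimeError("domain limit reached")
        dom_id = len(self.secure) + 1
        self.secure[dom_id] = secure
        return dom_id

    def assign(self, resource, dom_id):
        if dom_id not in self.secure:
            raise KeyError(f"unknown domain {dom_id}")
        if resource in self.owner:
            raise ValueError(f"{resource} is already owned by domain {self.owner[resource]}")
        self.owner[resource] = dom_id

    def can_access(self, dom_id, resource):
        return self.owner.get(resource) == dom_id


table = DomainTable()
payments = table.create_domain(secure=True)   # e.g. a payment application
media = table.create_domain(secure=False)     # e.g. a media player
table.assign("keystore", payments)
table.assign("video_buffer", media)

print(table.can_access(payments, "keystore"))  # True
print(table.can_access(media, "keystore"))     # False: cross-domain access denied
```

The key design point the sketch captures is that isolation is decided per application domain, so a compromised media player cannot reach the payment application’s keys even though both run on the same CPU and operating environment.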

Figure 6. Secure Execution Domains through Hardware Virtualization

Utilizing the MIPS64 ISA, a superset of the MIPS32 ISA, the I6400 supports the reuse of 32-bit applications alongside 64-bit applications. The I6400 also supports many of the same features as the existing Warrior M-class and P-class cores, while providing a baseline for all future Warrior MIPS CPU cores.
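Part of what lets MIPS64 act as a superset of MIPS32 is how 32-bit operations behave on 64-bit registers: on MIPS64, 32-bit arithmetic instructions such as ADDU sign-extend their 32-bit result to fill the 64-bit register, while 64-bit counterparts such as DADDU operate on the full register. The Python sketch below is a simplified model of that behavior, not an emulator; the function names are hypothetical, and it ignores traps and all other ISA details.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def sign_extend_32(value):
    """Take the low 32 bits of `value` and sign-extend them to a 64-bit pattern."""
    low = value & MASK32
    if low & 0x8000_0000:       # bit 31 set: negative in two's complement
        low -= 1 << 32
    return low & MASK64


def addu_32(rs, rt):
    """Simplified model of a 32-bit add (like ADDU) on a 64-bit register:
    wrap at 32 bits, then sign-extend into the full register."""
    return sign_extend_32(rs + rt)


def daddu_64(rs, rt):
    """Simplified model of a 64-bit add (like DADDU): wrap at 64 bits."""
    return (rs + rt) & MASK64


print(hex(addu_32(0x7FFF_FFFF, 1)))   # 0xffffffff80000000
print(hex(daddu_64(0x7FFF_FFFF, 1)))  # 0x80000000
```

The 32-bit add overflows to the sign-extended pattern of −2³¹, exactly the value a MIPS32 program expects, while the 64-bit add produces 2³¹. Keeping 32-bit results consistently sign-extended is what allows unmodified 32-bit code to run on the 64-bit core.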

Warrior cores such as the I6400, and many future MIPS CPU cores, will support heterogeneous computing, working within a coherent fabric alongside GPUs, such as PowerVR Rogue cores, and other processing elements. Note that Imagination is a founding member of the Heterogeneous System Architecture (HSA) Foundation. In addition, future Warrior MIPS CPU cores will support intermixing different M-class, I-class, and P-class CPU cores on the same die.

In addition to the features of the Warrior cores, Imagination is one of the founders of the prpl Foundation³, an open source community built around the concepts of application portability from device to data center on MIPS architectures. The prpl Foundation intends to leverage the growing range of MIPS-powered SoCs to ensure that a broad ecosystem of developers can easily make full use of all the latest features in MIPS Warrior cores.

Summary

The transition to a connected world requires greater security, flexibility in power and performance, and greater reuse of hardware and software. The Warrior family of CPU cores meets these needs with integrated security and virtualization, the ability to scale from sensor-based devices through high-performance computing applications, and compatibility with future generations of CPU and GPU cores.

The MIPS I6400 family of synthesizable CPU cores provides unmatched flexibility, with support for both 32-bit and 64-bit applications and configuration options within each CPU core, in the number of CPU cores per cluster, and in the number of clusters, for over 1,500 total threads. This is all accomplished using one of the industry’s most power-efficient CPU architectures.

The result is a platform capable of supporting applications ranging from wearable electronics to mobile and personal computing, networking, and servers. The I6400 family of MIPS CPU cores also offers room for future growth, with the promise of heterogeneous computing and the complementary IP and tools needed to build competitive SoCs.

1 The observation, central to semiconductor manufacturing, that the number of transistors per mm² of die area doubles roughly every 18 to 24 months.

2 Internet Security Threat Report, Symantec, April 2014, http://www.symantec.com/security_response/publications/threatreport.jsp

3 Additional information is available at http://www.prplfoundation.org