Multi-core: The Move from Proprietary Solutions to Open Standards

by Markus Levy, Founder and President, EEMBC

Abstract: To help the embedded industry accelerate the adoption of multi-core devices, several key ingredients must be developed, including open standards that enable vendors' products to work together and standard benchmarks that validate performance differences between devices.

Embedded processor vendors and system developers have been developing or working with multicore devices for many years. Nothing new here. These devices have been available in a variety of flavors including, at the highest level, symmetric multiprocessors and asymmetric multiprocessors. For example, the Apple iPod features dual 90MHz ARM7TDMI processors. And how many mobile phones contain an ARM processor combined with some sort of DSP? Furthermore, ARM has licensed its MPCore technology to a variety of vendors, including NEC and Nvidia. The synthesizable ARM11 MPCore implements the ARM11 processor and can be configured with between one and four cores.

To help the embedded industry accelerate the adoption of multicore devices and speed time to market, several key ingredients must be developed. One of those ingredients is open standards that will enable vendors' products (including processors, RTOSs, and development tools) to work together. Also important will be the availability of standard benchmarks that will validate the performance differences between complex multi-core devices.

The Current Situation

So, what is new here? It seems as if the introduction of mainstream dual-core processors by AMD and Intel has sanctioned the worldwide trend toward multicore platforms, even in the embedded market. The MHz race has ended, and the move is now toward building smarter platforms that are more efficient in both power and run time. Or at least companies are now more readily admitting it.

Previously, almost all embedded software could be written with the assumption that a single processor was the execution vehicle; where multiple processors were involved, they were either relatively loosely coupled and could be considered separately, or were collaborating in easily parallelized computations. While dual-core machines will change this model somewhat, we can expect the number of cores to grow exponentially, roughly doubling with each processor generation. Furthermore, chips of the future can be expected to exhibit increasingly higher degrees of heterogeneity in terms of cores, interconnect, hardware acceleration, and memory hierarchies. The industry's challenge will be figuring out how to efficiently harness this processing capability.

Currently in the embedded industry, most, if not all, multi-core hardware and software implementations are based on proprietary solutions. The necessity to move beyond parallel computing and SMP paradigms and toward heterogeneous embedded distributed systems will likely drive changes in how embedded software is created, and thus in development tools, run-time software, and languages. Programming such systems effectively will require new approaches. Given that software is a large investment for many companies, it is natural to want software portability across a range of multi-core systems. A number of barriers must be addressed to enable a better software paradigm.
Industry Partnership

To cope with this impending change, it will be helpful for the industry to agree on common, simple, and efficient abstractions for such concurrent systems, allowing us to describe key aspects of concurrency in ways that can be simply and directly represented as a set of APIs. In other words, the multi-core ecosystem (comprised of chip vendors, semiconductor IP providers, RTOS, compiler, and development tool vendors, as well as application developers) must agree on the appropriate interfaces to support interoperability and, therefore, quicker time-to-market.

Dealing with Mixed Operating Systems

Specific areas of programming multicore systems that must be addressed are task and resource management, along with the communication and synchronization required for embedded distributed systems. This need stems from the reality that such systems cannot rely on a single operating system -- or even an SMP operating system -- for such services. It can be expected that heterogeneous multicore systems will employ a range of operating systems across multiple cores, and therefore will have resources that cannot be managed by any single operating system. This situation is exacerbated further by the presence of hardware accelerators, which do not run any form of operating system but must interact with processes that are potentially running on multiple operating systems on different cores.

A new industry organization, the Multicore Association, has been formed to serve as an umbrella organization for multicore-related discussions, standards, and support for participants. To help overcome the challenges described in the preceding text, the Multicore Association is working on three separate, but somewhat related, interface standards: the Resource Management API (RAPI), the Communication API (CAPI), and the Transparent Inter-Process Communication (TIPC) protocol, specially designed for intra-cluster communication.

The Rap on RAPI

The primary goal of the RAPI is to provide a standardized API for the management, scheduling, and synchronization of processing resources. The Multicore Association generically refers to these processing resources as 'work entities' because they can include many different types of functions (e.g., processors, hardware accelerators, DMA engines) and memory resources. In a sense, the RAPI is similar to pre-existing standards, notably POSIX Pthreads. However, RAPI differs in key areas, most notably in its support for heterogeneous multi-core and memory architectures (Table 1).

Table 1. High-level comparison of the Multicore Association RAPI and POSIX Pthreads.
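As a point of reference, the short example below shows the style of standard Pthreads calls -- thread creation, joining, and mutex-based synchronization -- that a resource-management API in the RAPI mold would distill and extend for heterogeneous targets. This is ordinary POSIX threads code, not RAPI; the actual RAPI calls had not been published at the time of writing.

```c
/* Plain POSIX Pthreads: the style of thread-management and
 * synchronization calls that RAPI aims to subset and extend.
 * Build with: cc -pthread workers.c */
#include <pthread.h>
#include <stdio.h>

#define NUM_WORKERS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long total = 0;

static void *worker(void *arg)
{
    long id = (long)arg;
    long partial = 0;

    for (long i = 0; i < 1000000; i++)      /* stand-in for per-thread work */
        partial += (i ^ id) & 1;

    pthread_mutex_lock(&lock);              /* basic synchronization */
    total += partial;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t tid[NUM_WORKERS];

    for (long i = 0; i < NUM_WORKERS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);   /* task creation */

    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(tid[i], NULL);                         /* task completion */

    printf("total = %ld\n", total);
    return 0;
}
```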
To engender rapid understanding and adoption, RAPI will use a highly simplified subset of the calls found in real and de facto standards, such as Pthreads, with extensions where necessary to support heterogeneous multi-core architectures. The RAPI embodiment should support features for state management, scheduling (including pre-emption where allowed by the task and processing-resource types), context management (stack creation/allocation, destruction/deallocation, save, and restore), and basic synchronization. A further challenge for RAPI is that it should be complementary to the CAPI and to existing operating systems (either as a virtualization layer or as part of the kernel).

Messaging and Synchronization in Concurrent Embedded Systems

The CAPI specifies an API, not an implementation, for the purposes of messaging and synchronization in concurrent embedded software systems. As such, the CAPI must support many of the well-known qualities described for distributed systems. However, due to certain assumptions about embedded systems, the CAPI is only required to support a subset of the distributed-systems qualities defined by noted authors such as Tanenbaum, and also by standards such as CORBA. This subset of qualities is necessary because of the specific needs of embedded systems, such as tighter memory constraints, tighter task execution time constraints, and high system throughput.
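To make the idea concrete, the sketch below shows the kind of blocking send/receive semantics such a messaging API might expose, modeled here as a toy single-slot mailbox shared between two threads. The names (mbox_t, mbox_send, mbox_recv) are illustrative assumptions, not CAPI definitions; on a real heterogeneous target the same interface could be backed by shared memory, a DMA engine, or an on-chip interconnect rather than by Pthreads primitives.

```c
/* Toy illustration of blocking send/receive messaging semantics.
 * All names here are hypothetical; this is not CAPI code. */
#include <pthread.h>
#include <stdio.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  not_empty, not_full;
    int             value;
    int             full;                    /* 1 if a message is waiting */
} mbox_t;

static mbox_t mbox = {
    PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_COND_INITIALIZER, PTHREAD_COND_INITIALIZER,
    0, 0
};

static void mbox_send(mbox_t *m, int msg)    /* blocks until the slot is free */
{
    pthread_mutex_lock(&m->lock);
    while (m->full)
        pthread_cond_wait(&m->not_full, &m->lock);
    m->value = msg;
    m->full = 1;
    pthread_cond_signal(&m->not_empty);
    pthread_mutex_unlock(&m->lock);
}

static int mbox_recv(mbox_t *m)              /* blocks until a message arrives */
{
    pthread_mutex_lock(&m->lock);
    while (!m->full)
        pthread_cond_wait(&m->not_empty, &m->lock);
    int msg = m->value;
    m->full = 0;
    pthread_cond_signal(&m->not_full);
    pthread_mutex_unlock(&m->lock);
    return msg;
}

static void *consumer(void *arg)
{
    (void)arg;
    for (int i = 0; i < 4; i++)
        printf("received %d\n", mbox_recv(&mbox));
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, consumer, NULL);
    for (int i = 0; i < 4; i++)              /* producer side */
        mbox_send(&mbox, i * 10);
    pthread_join(t, NULL);
    return 0;
}
```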
The target systems for CAPI will span multiple dimensions of heterogeneity (e.g., core heterogeneity, interconnect heterogeneity, memory heterogeneity, operating system heterogeneity, software tool chain heterogeneity, and programming language heterogeneity). While many industry standards already exist for distributed-systems programming, they have primarily been focused on the needs of (1) distributed systems in the large, (2) SMP systems, or (3) specific application domains (for example, scientific computing). Thus, the CAPI has goals similar to, but more highly constrained than, those of these existing standards with respect to scalability and fault tolerance, yet offers more generality with respect to application domains.

Besides the interface challenges described in this article, the Multicore Association is also working on improving hardware debug for multicore platforms. Specifically, the Debug Working Group is pursuing initiatives that will identify and tie high-level requirements for multi-core debugging to specific requirements on underlying infrastructures, such as protocols, runtime layers, and operating systems. The group will seek to identify appropriate interfaces and drive standardization to enable a richer and more uniform debugging experience on multi-core systems. The group will also extend existing debug interfaces in a standardized way to meet the needs of multicore debugging, as well as standardize the connection between JTAG interfaces and debuggers, enabling third-party debuggers to control systems with multiple cores that have different JTAG interfaces from multiple vendors.

But this is just the beginning. Multi-core designers will be faced with challenges of code partitioning and with system-level benchmarks that go beyond the standard SMP benchmarks available today. Along these lines, EEMBC is working on a new benchmark suite that will allow vendors to demonstrate the performance gains (or not) from employing two or more cores in a device or system.
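At its simplest, the kind of question such a suite answers can be illustrated by running the same fixed workload with one worker thread and then with several, and comparing wall-clock times. The harness below is an assumed, highly simplified sketch using plain Pthreads -- not EEMBC code -- and it deliberately sidesteps the harder problem of partitioning a real application across cores.

```c
/* Simplified timing harness: run a fixed amount of work with N threads
 * and report elapsed wall-clock time. Illustrative only, not EEMBC code.
 * Build with: cc -pthread speedup.c */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define TOTAL_WORK 8000000L

static void *worker(void *arg)
{
    long iters = *(long *)arg;
    volatile long acc = 0;                   /* keep the loop from being optimized away */
    for (long i = 0; i < iters; i++)         /* stand-in for a real benchmark kernel */
        acc += i & 7;
    return NULL;
}

static double run_with(int nthreads)
{
    pthread_t tid[16];
    long share = TOTAL_WORK / nthreads;      /* split the same total work evenly */
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, worker, &share);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    double t_one  = run_with(1);
    double t_four = run_with(4);
    printf("1 thread: %.3f s, 4 threads: %.3f s, speedup: %.2fx\n",
           t_one, t_four, t_one / t_four);
    return 0;
}
```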