Peter McGuinness, Imagination Technologies
embedded.com (April 26, 2014)
Mobile SoCs have been multicore for some time now, both in the homogeneous sense of an array of identical (or at least similar) CPU cores and in the heterogeneous sense of DSPs, GPUs, and other programmable and configurable processing cores on the die. With this variety of parallel processing resources available, what kinds of applications and use cases are driving the increasing adoption of heterogeneous multicore implementations, and what benefits do they offer users?
Applications fall into two quite different classes, which can be crudely labeled high performance computing (HPC) and consumer. HPC applications may involve long simulations of very large data sets with extreme precision and accuracy requirements; consumer applications have much less stringent accuracy requirements but must run in real time or near real time while still handling relatively large data sets.
The mobile context is dominated by video-rate applications that require low-level manipulation or analysis of visual data, combined with a relatively small amount of higher-level code. These applications are inherently heterogeneous: they contain layers of functions that can be divided between the CPU array and the GPU (which is classed as a single core but in fact consists of a large array in itself). Distributing the work across the available resources delivers the best efficiency, meaning a higher frame rate, lower power, greater responsiveness, or all three.
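As a concrete illustration of that division of labor, here is a minimal host-side sketch using OpenCL (the article does not name an API; OpenCL is simply one common compute API on mobile GPUs of this generation). A simple RGBA-to-luma conversion stands in for a low-level vision kernel and runs on the GPU, leaving the CPU free for the higher-level code. Kernel and buffer names, frame size, and the reduced error handling are all illustrative assumptions, not the author's code.

/*
 * Sketch: offload a per-pixel vision kernel to the GPU via OpenCL
 * while the CPU remains available for application-level work.
 */
#include <CL/cl.h>
#include <stdio.h>
#include <stdlib.h>

static const char *kernel_src =
    "__kernel void rgba_to_luma(__global const uchar4 *rgba,\n"
    "                           __global uchar *luma)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    uchar4 p = rgba[i];\n"
    "    /* integer approximation of BT.601 luma */\n"
    "    luma[i] = (uchar)((p.x * 77 + p.y * 150 + p.z * 29) >> 8);\n"
    "}\n";

#define CHECK(err) do { if ((err) != CL_SUCCESS) { \
        fprintf(stderr, "OpenCL error %d at line %d\n", (err), __LINE__); \
        exit(1); } } while (0)

int main(void)
{
    enum { WIDTH = 1280, HEIGHT = 720 };
    const size_t npix = WIDTH * HEIGHT;
    cl_int err;

    /* Pick the first GPU device the platform exposes. */
    cl_platform_id platform;
    cl_device_id device;
    CHECK(clGetPlatformIDs(1, &platform, NULL));
    CHECK(clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL));

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    CHECK(err);
    cl_command_queue q = clCreateCommandQueue(ctx, device, 0, &err);
    CHECK(err);

    /* Frame buffers: in a real app these would come from the camera/ISP. */
    unsigned char *rgba = calloc(npix * 4, 1);
    unsigned char *luma = malloc(npix);

    cl_mem d_rgba = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                   npix * 4, rgba, &err);
    CHECK(err);
    cl_mem d_luma = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, npix, NULL, &err);
    CHECK(err);

    /* Build the vision kernel for the GPU at run time. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &kernel_src, NULL, &err);
    CHECK(err);
    CHECK(clBuildProgram(prog, 1, &device, NULL, NULL, NULL));
    cl_kernel k = clCreateKernel(prog, "rgba_to_luma", &err);
    CHECK(err);

    CHECK(clSetKernelArg(k, 0, sizeof(cl_mem), &d_rgba));
    CHECK(clSetKernelArg(k, 1, sizeof(cl_mem), &d_luma));

    /* The per-pixel work runs on the GPU... */
    CHECK(clEnqueueNDRangeKernel(q, k, 1, NULL, &npix, NULL, 0, NULL, NULL));

    /* ...while the CPU thread is free for the higher-level application code
     * until this blocking read forces a sync. */
    CHECK(clEnqueueReadBuffer(q, d_luma, CL_TRUE, 0, npix, luma, 0, NULL, NULL));
    printf("first luma value: %u\n", (unsigned)luma[0]);

    /* Cleanup omitted for brevity in this sketch. */
    return 0;
}

On a unified-memory SoC the explicit copies in this sketch could be avoided by mapping the buffers instead, which matters when the goal is lower power as much as higher frame rate.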
One consequence of the emergence of this class of applications is that the purpose and nature of the camera pipeline (ISP) is changing: from a block aimed primarily at image production, it is being redefined as a vision processor, usually working as one of a heterogeneous trio in cooperation with the CPU and GPU. Application examples include video conferencing with face beautification, where the majority of the workload can either be handled by the GPU or shared between the GPU and the ISP. Video encoding can be a CPU task or can be offloaded onto dedicated encoder hardware that we call a Video Processing Unit (VPU). In this scenario, the objectives are to maintain consistent frame rates while keeping to a power budget appropriate for extended use on a mobile device.
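For a sense of what that encoder offload looks like to an application, the sketch below requests an H.264 encoder through Android's MediaCodec API as exposed to native code (available on later Android releases via NdkMediaCodec.h; the article itself names no API). On a typical mobile SoC this request resolves to the dedicated encoder block rather than a CPU codec. The resolution, bitrate, and color-format values are illustrative assumptions.

/*
 * Sketch: hand H.264 encoding to the platform's encoder hardware
 * instead of running it on the CPU (Android NDK media API assumed).
 */
#include <media/NdkMediaCodec.h>
#include <media/NdkMediaFormat.h>
#include <stddef.h>

AMediaCodec *create_hw_encoder(void)
{
    /* Ask the platform for an AVC encoder; on most mobile SoCs this is
     * the hardware video encoder (the "VPU"), not a software codec. */
    AMediaCodec *enc = AMediaCodec_createEncoderByType("video/avc");
    if (enc == NULL)
        return NULL;

    AMediaFormat *fmt = AMediaFormat_new();
    AMediaFormat_setString(fmt, AMEDIAFORMAT_KEY_MIME, "video/avc");
    AMediaFormat_setInt32(fmt, AMEDIAFORMAT_KEY_WIDTH, 1280);
    AMediaFormat_setInt32(fmt, AMEDIAFORMAT_KEY_HEIGHT, 720);
    AMediaFormat_setInt32(fmt, AMEDIAFORMAT_KEY_BIT_RATE, 4000000);
    AMediaFormat_setInt32(fmt, AMEDIAFORMAT_KEY_FRAME_RATE, 30);
    AMediaFormat_setInt32(fmt, AMEDIAFORMAT_KEY_I_FRAME_INTERVAL, 1);
    /* 21 == COLOR_FormatYUV420SemiPlanar, a common camera output layout */
    AMediaFormat_setInt32(fmt, AMEDIAFORMAT_KEY_COLOR_FORMAT, 21);

    AMediaCodec_configure(enc, fmt, NULL, NULL, AMEDIACODEC_CONFIGURE_FLAG_ENCODE);
    AMediaCodec_start(enc);
    AMediaFormat_delete(fmt);
    return enc;
}

Once the encoder block is doing the heavy lifting, sustaining the frame rate within the power budget depends far more on the dedicated hardware than on CPU load.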
A retail analytics application from Vadaro, while using broadly similar low-level tasks, illustrates another requirement: running multiple kernels simultaneously on the GPU in order to detect multiple customers. In another app, Find Exact/Find Similar, delegating the vision-specific tasks to the GPU leaves the CPU free for database searching and results manipulation, as sketched below.
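A minimal sketch of that pattern, again using OpenCL and assuming a context, queue, and kernels set up as in the earlier listing: several vision kernels are enqueued on the GPU without blocking, the CPU carries on with the database search, and the two only synchronize when the detections are actually needed. The names detect_person and search_database are hypothetical stand-ins for the apps described.

/*
 * Sketch: keep the GPU busy with multiple detection kernels while the
 * CPU runs the high-level search, syncing only at the end.
 */
#include <CL/cl.h>

void process_frame(cl_command_queue q, cl_kernel detect_person,
                   cl_mem d_frame, cl_int num_regions,
                   const size_t roi_global[2],
                   void (*search_database)(void))
{
    for (cl_int roi = 0; roi < num_regions; ++roi) {
        /* One detection kernel per region of interest; these calls return
         * immediately and the work stacks up on the GPU. */
        clSetKernelArg(detect_person, 0, sizeof(cl_mem), &d_frame);
        clSetKernelArg(detect_person, 1, sizeof(cl_int), &roi);
        clEnqueueNDRangeKernel(q, detect_person, 2, NULL,
                               roi_global, NULL, 0, NULL, NULL);
    }
    clFlush(q);          /* submit the GPU work now, without waiting */

    search_database();   /* the CPU is left free for the high-level code */

    clFinish(q);         /* block only when the detections are needed */
}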
These three outcomes (higher frame rate, lower power, and free CPU cycles) are the primary benefits sought by mobile developers, and all are available through heterogeneous multicore. But how can these benefits be quantified, and how much of them can every app realistically expect?