System-level tools slash SoC dynamic power

By John McNally, EE Times
January 22, 2004 (4:35 p.m. EST)
URL: http://www.eetimes.com/story/OEG20040122S0025

System-on-chip designs integrate functions as diverse as wireless communications, MP3, digital imaging and video playback. SoC devices are essential technology enablers in multifunction consumer devices such as video-enabled mobile phones, but only if they meet the increasingly exacting power-consumption constraints imposed by battery lifetime.

Systems-on-chip deploy complex multiprocessor architectures, with embedded software implementing more than 50 percent of the functionality. Consequently, memory and processors account for the bulk of SoC power consumption. In fact, the top three determinants of power consumption are, in order: memory architecture, application-software algorithm efficiency and processor architecture.

Traditional register-transfer-level (RTL) optimization generally delivers at most a 1.2x power reduction because its low level of abstraction obscures system-wide behavior. The profusion of unnecessary detail, such as nanosecond-accurate intrablock transitions and bit-accurate bus behavior, makes system-level analysis both laborious and time-consuming. And RTL optimization doesn't apply to memories and software algorithms, anyway.

In this article, we describe how designers use electronic system-level (ESL) design environments to achieve 10x to 20x power reductions by modifying memory and processor architectures as well as software algorithms.

We start with memory architecture because memory and related bus activity can contribute up to 80 percent, and sometimes more, of an SoC's power consumption. Optimizing memory for power consists of minimizing the number of memory accesses and tailoring the memory architecture to the given application.

Direct correlation of the cache hit-miss ratio as a function of cache size is performed with a software Gantt chart to determine optimum memory sizes.
Source: CoWare Inc.

Memory-access reduction techniques rely on the fact that addresses do not possess an equal probability of activation. Clustering the high-probability accesses into a separate optimized cache memory greatly simplifies memory transactions, which reduces power consumption and increases performance, two goals that traditionally trade off against each other.

The key factors that determine cache memory power and performance are cache algorithms, cache memory size and cache memory architecture.

So, how do SoC designers use ESL environments to analyze cache memory attributes to identify candidates for optimization? They use an SoC architectural design and verification environment, such as CoWare's ConvergenSC, that affords a direct and immediate view of memory accesses and the activity associated with them, within an overall system performance model.

Such a tool permits direct correlation of the cache hit-miss ratio as a function of cache size with the software Gantt chart to determine optimum memory sizes (see Fig. 1). Algorithm optimization candidates are identified from the correlation of function calls with the frequency of memory access. Such optimizations may mandate the use of a multicache architecture with L1 cache, which is connected directly to the processor, and L2 cache, which is connected to a high-speed memory bus.
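To make the hit-miss-versus-size sweep concrete, here is a minimal sketch in Python. It is not ConvergenSC output or its API. It assumes a fully associative LRU cache and a synthetic address trace in which a small set of "hot" addresses receives most of the accesses; the function names, trace parameters and cache sizes are all hypothetical.

```python
from collections import OrderedDict
import random


def hit_ratio(trace, cache_lines):
    """Fraction of accesses that hit a fully associative LRU cache
    holding `cache_lines` entries (an assumed, simplified cache model)."""
    cache = OrderedDict()
    hits = 0
    for addr in trace:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(trace)


def synthetic_trace(n_accesses, hot_addrs=64, cold_addrs=4096,
                    hot_fraction=0.9, seed=0):
    """Hypothetical trace: `hot_fraction` of accesses go to a small hot set,
    the rest are spread over a much larger cold region."""
    rng = random.Random(seed)
    trace = []
    for _ in range(n_accesses):
        if rng.random() < hot_fraction:
            trace.append(rng.randrange(hot_addrs))
        else:
            trace.append(hot_addrs + rng.randrange(cold_addrs))
    return trace


if __name__ == "__main__":
    trace = synthetic_trace(50_000)
    for size in (16, 32, 64, 128, 256, 512):
        print(f"{size:4d} lines: hit ratio {hit_ratio(trace, size):.3f}")
```

With a skewed trace like this one, the hit ratio would be expected to rise steeply until the cache covers the hot set and to flatten afterward; that knee is the kind of feature a designer looks for when choosing a cache size.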

Such a high-level environment also offers the fast hardware/software co-verification that is essential for testing substantial amounts of real application software against the hardware architecture. "Fast" means one-hundredth to one-thousandth of real-time chip speed. This is at least 1,000x faster than RTL/C co-verification, whose speed limits its use to pieces of code that are generally too small for system-level optimization.

We now come to the optimization of application-software algorithms. Because these are application-specific, they are best demonstrated by reference to a specific application. We have selected a digital-imaging example from a high-end digital camera.

PowerEscape enables analysis of memory accesses to calculate the power associated with any given operation or task.
Source: CoWare Inc.

A common operation is to rotate a pixel matrix by 180 degrees. A common algorithm to accomplish this is to rotate the matrix through 90 degrees, store it in a different part of the memory, rotate it through 90 degrees again and then store it in the original memory space.

Assuming that the matrix is a digital-camera image of 2,705 by 4,065 pixels and that each pixel is 32 bits wide, then a memory of 22 million words x 32 bits is required. Power efficiency can be improved by a simple memory re-architecture: By storing the rotations in separate memories, each half the size of the original memory, the total power consumption is reduced by 50 percent because smaller memories have lower capacitance and, therefore, lower power consumption.
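The memory figure can be checked with a few lines of arithmetic. The sketch below assumes that the 22 million words cover two full image buffers (the original image plus the rotated copy), which is our reading of the numbers rather than something the example states outright; the variable names are illustrative.

```python
WIDTH_PIXELS = 2705
HEIGHT_PIXELS = 4065
BITS_PER_PIXEL = 32

# One 32-bit word per pixel.
pixels_per_image = WIDTH_PIXELS * HEIGHT_PIXELS      # 10,995,825

# Assumption: two full buffers, the source image and the rotated copy.
words_total = 2 * pixels_per_image                   # 21,991,650
bytes_total = words_total * BITS_PER_PIXEL // 8

print(f"pixels per image:      {pixels_per_image:,}")
print(f"words for two buffers: {words_total:,}")
print(f"bytes for two buffers: {bytes_total:,}")
```

Under that assumption, each of the two half-size memories in the re-architected design holds roughly 11 million words, one image each.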

The algorithm remains the same, however, and therefore is a candidate for optimization. The 180-degree rotation can be effected by simply transposing matrix elements, thereby eliminating the first 90-degree rotation.
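The two approaches can be compared in a short Python sketch. It assumes a flat, row-major pixel buffer and interprets "transposing matrix elements" as swapping each element with its point-mirrored counterpart in place; that interpretation, and the function names, are ours and not taken from any camera firmware.

```python
def rotate_90(src, rows, cols):
    """Return a new flat row-major buffer holding `src` (rows x cols)
    rotated 90 degrees clockwise; the result is cols x rows."""
    dst = [0] * (rows * cols)
    for r in range(rows):
        for c in range(cols):
            # Element (r, c) moves to (c, rows - 1 - r) in the rotated matrix.
            dst[c * rows + (rows - 1 - r)] = src[r * cols + c]
    return dst


def rotate_180_two_pass(image, rows, cols):
    """Original algorithm: two 90-degree rotations through a second buffer.
    Each pass reads and writes every word, about 4 * rows * cols accesses."""
    scratch = rotate_90(image, rows, cols)   # scratch is cols x rows
    return rotate_90(scratch, cols, rows)    # back to rows x cols


def rotate_180_in_place(image):
    """Optimized algorithm: swap element i with element n - 1 - i.
    No second buffer, and about 2 * rows * cols accesses in total."""
    n = len(image)
    for i in range(n // 2):
        j = n - 1 - i
        image[i], image[j] = image[j], image[i]
    return image


if __name__ == "__main__":
    rows, cols = 2, 3
    image = [1, 2, 3,
             4, 5, 6]
    print(rotate_180_two_pass(image, rows, cols))
    print(rotate_180_in_place(list(image)))
```

In a row-major buffer, a 180-degree rotation is simply a reversal of the word order, which is why the in-place version needs neither the row and column counts nor a scratch memory. The access counts in the comments are word counts only; turning them into power figures requires a memory model of the kind discussed below.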

The consequent elimination of the second memory nearly halves the power consumption again, resulting in a total power reduction of 73 percent.
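The two savings compound rather than add. The short calculation below backs out what the second step must contribute on its own for the quoted figures to agree; only the 50 percent and 73 percent values come from the example, and the variable names are illustrative.

```python
# Fraction of the original power remaining after splitting the memory in two.
remaining_after_split = 1.0 - 0.50

# Overall reduction quoted for both optimizations together.
total_reduction = 0.73
remaining_after_both = 1.0 - total_reduction

# Fraction of power the second step (dropping one memory) leaves by itself.
second_step_factor = remaining_after_both / remaining_after_split
second_step_reduction = 1.0 - second_step_factor

print(f"second step alone saves about {second_step_reduction:.0%}")
```

The second step therefore removes a little under half of the remaining power, which matches "nearly halves".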

The challenges of algorithmic optimization are to:

identify optimization candidates, many of which may lie deep within the embedded software and therefore may not be easily visible (an MPEG-4 suite contains about 20,000 lines of C code; where do you start?);
estimate their impact on cache performance and power; and
minimize memory accesses within the software algorithm.

These challenges are overcome with a specialized ESL environment, such as PowerEscape, that enables analysis of memory accesses to calculate the power associated with any given operation or task. This analysis can be repeated for a single memory or multiple memories, with their sizes and organizations, to determine the optimum memory architecture and to identify access minimization candidates in the software. Moreover, such an environment uses realistic memory models to characterize the impact of any given cache on both performance and power.

Now we address the issue of processor power optimization. There is a distinct trend toward deploying on-chip application-specific processors, especially in battery-powered applications, each of which is optimized for minimum energy per application-specific task. This dramatically increases the demand for differentiated, mostly custom, processor designs, a demand that can be met only with an order-of-magnitude increase in design productivity. Such high-productivity design is achieved with an ESL environment, such as CoWare's LISATek, that automates both processor design and processor-specific software development tool creation.

Minimization of energy per task using LISATek is an iterative process of automated architecture design, activity analysis and redesign. Energy per task may be minimized by optimization of the instruction set and the pipelining structure, among others.

Analysis of instruction-set utilization, using the environment's profiler, enables the designer to identify frequently used instructions for combination into more complex instructions that execute in fewer cycles. The consequent reduction in the number of cycles required to execute any given task could permit a lower operating frequency and, therefore, lower power consumption. The profiler is also used to identify pipeline candidates that achieve the requisite performance within the power consumption constraints.
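The idea of mining a profile for instructions worth combining can be sketched as follows. This is not the LISATek profiler or its API: the opcode names and trace are hypothetical, and the sketch assumes every instruction, including a fused one, takes a single cycle.

```python
from collections import Counter


def adjacent_pair_counts(trace):
    """Count how often each ordered pair of opcodes executes back to back."""
    return Counter(zip(trace, trace[1:]))


def cycles_after_fusion(trace, pair, cycles_per_instr=1, fused_cycles=1):
    """Estimate the cycle count if every non-overlapping occurrence of `pair`
    is replaced by one fused instruction (assumed single-cycle by default)."""
    cycles = 0
    i = 0
    while i < len(trace):
        if i + 1 < len(trace) and (trace[i], trace[i + 1]) == pair:
            cycles += fused_cycles
            i += 2                      # both opcodes are consumed by the fused one
        else:
            cycles += cycles_per_instr
            i += 1
    return cycles


if __name__ == "__main__":
    # Hypothetical execution trace of opcodes.
    trace = ["load", "mul", "add", "store", "mul", "add", "mul", "add", "load"]
    pair, count = adjacent_pair_counts(trace).most_common(1)[0]
    before = len(trace)
    after = cycles_after_fusion(trace, pair)
    print(f"most frequent pair: {pair} ({count} times)")
    print(f"cycles: {before} -> {after}")
    print(f"clock could scale to {after / before:.0%} for the same task time")
```

If the task deadline stays fixed, the clock frequency can scale with the cycle count, and because dynamic power is roughly proportional to frequency at a given supply voltage, power falls with it. A real flow would also have to weigh the extra logic the fused instruction adds to the datapath.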

The candidate optimizations may be quickly implemented in the automatic processor design environment and the design and optimization process iterated until the power consumption objectives are met. The optimized processor may then, if desired, be modeled in an SoC architecture and verification environment for final system verification and optimization.

In summary, ESL design environments enable designers to slash SoC power drain by 10x to 20x through the analysis and optimization of the system-level hardware and software attributes that consume more than 80 percent of SoC power. Power optimization at this level of abstraction is not only the most effective methodology; it is the only one fast enough to meet time-to-market goals.

John McNally is a technical director at CoWare Inc. (San Jose, Calif.).

A profiler is used to identify pipeline candidates that achieve the requisite performance within the power consumption constraints.
Source: CoWare Inc.