The fixed processor is dead, long live the battery
By Peter Wells, EE Times
October 28, 2004 (6:44 PM EDT)
URL: http://www.eet.com/article/showArticle.jhtml?articleId=51201334
Today's portable electronics are caught in an ever-increasing spiral of feature enhancement and device convergence. Your digital camera, which once simply took photos, now has four times the resolution; it also captures video and plays MP3 files. Your cell phone has Bluetooth and a color screen, takes pictures and plays those same MP3 files. The catch in all this feature enhancement is that the consumer also expects battery life to be extended on these new devices.
One way in which designers are providing additional functionality without compromising their power budget is by turning from traditional fixed processors to configurable, extendable processors. Fixed processors require designers to adapt their system to the needs of the processor, and the result is often a less-than-optimal system. Configurable, extendable processors, such as those from ARC International, allow the designer to adapt the processor to exactly meet the needs of the application. The result is a system that can provide additional features as well as reduced power consumption.
This article examines the degrees of freedom configurable processors give the designer to reduce system power consumption. These options generally fall into three categories: configurability, or the ability to choose from a menu of standard processor features; extendability, or the ability to add unique, design-specific resources to the processor; and application-tuned memories.
Configurability
Configurability is the most obvious way in which power can be reduced over traditional processor design. Fixed-configuration processors must include a feature set that will meet the needs of 90 percent of their potential customer base. This means that for all but the most taxing applications, the processor will include resources that are not required.
In contrast, configurable processors start with the smallest implementation that can provide general-purpose computing. Then the designer can select additional functionality from a menu to add only those things required by that system. Cache sizes, complex instructions, DSP resources, memory-management units, timers, interrupt features and so on can all be added if needed and left off if not needed. This additional control will always result in a smaller and, thus, lower-power, lower-cost design.
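As a purely hypothetical illustration of the menu idea, the fragment below sketches what a build-time feature selection might look like if expressed as C defines. It does not represent ARC's actual configuration tooling, and every option name here is invented for this example.

```c
/* Hypothetical feature-selection header, for illustration only: the idea is
 * that each resource is either sized for the application or left out entirely.
 * These names do not correspond to ARC's real configuration options. */
#define CFG_ICACHE_BYTES    (8 * 1024)  /* instruction cache sized for the app   */
#define CFG_DCACHE_BYTES    (4 * 1024)  /* data cache sized for the app          */
#define CFG_HAS_MULTIPLIER  1           /* keep the hardware multiply option     */
#define CFG_HAS_DSP_EXT     0           /* DSP extensions not needed: leave off  */
#define CFG_HAS_MMU         0           /* no virtual-memory OS: leave off       */
#define CFG_NUM_TIMERS      1           /* one timer is enough for this design   */
#define CFG_NUM_IRQ_LEVELS  2           /* minimal interrupt priority levels     */
```

Everything set to zero simply never appears in the synthesized core, which is where the area and power savings come from.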
An example of the considerable size and power savings afforded by this configurability can be shown by comparing several popular processor architectures all running the same standard audio codec, such as AAC or MP3.
All three processors provide the exact same audio processing for any of the MPEG audio standards; however, the ARC 600 processor consumes a fraction of the power and area of the other two because it has been configured optimally for its audio-processing application. A fringe benefit is that the ARC 600 processor also contains audio extensions that allow it to perform this standard audio processing at lower frequencies than the fixed-processor cores.
Extendability
The audio extensions mentioned above are a perfect example of a less obvious but perhaps more powerful tool to provide power-optimized systems: extendability. Extendability allows the designer to analyze the software required by the application and provide hardware-accelerated instructions/features for those areas where software is spending most of its time. The result is often a reduced required frequency for the whole system and a direct drop in power consumption.
For example, a networking application may examine the header of an incoming packet 500,000 times a second to ensure data integrity, and classify the type of packet. The processor would be required to compute a cyclic redundancy check, extract several bit fields and compare the extracted data with an array of options. This processing can easily take up to 60 typical RISC instructions per header examined.
With an extendable processor, a single custom instruction can do all of these checks in parallel in a single clock cycle. Therefore, what would take a typical RISC engine approximately 30 million cycles (500k headers x 60 cycles per header) would only require 0.5 million cycles on an extended processor. The result for a power-sensitive application is that the configurable-processor system will get identical performance at 29.5 MHz lower clock frequency. That frequency reduction translates into power savings for the whole system's logic.
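To make the example concrete, here is a minimal C sketch of the per-header work a fixed RISC core would perform in software. The header layout, CRC polynomial and classification table are hypothetical, chosen only to show the kind of bit-level work a single extension instruction can fold into one cycle.

```c
#include <stdint.h>
#include <stddef.h>

#define NUM_CLASSES 8

/* Bit-serial CRC-16 over the header: compact code, but many cycles per byte. */
static uint16_t crc16(const uint8_t *p, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)p[i] << 8;
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Returns a class index, or -1 if the CRC fails or no class matches.
 * This is roughly the ~60-instruction sequence a custom instruction replaces. */
int classify_header(const uint8_t hdr[8], const uint16_t class_keys[NUM_CLASSES])
{
    if (crc16(hdr, 6) != (uint16_t)((hdr[6] << 8) | hdr[7]))
        return -1;                         /* integrity check failed        */

    uint16_t proto = (hdr[0] >> 4) & 0xF;  /* extract a few bit fields      */
    uint16_t flags =  hdr[1] & 0x3F;
    uint16_t key   = (uint16_t)((proto << 6) | flags);

    for (int i = 0; i < NUM_CLASSES; i++)  /* compare against option table  */
        if (class_keys[i] == key)
            return i;
    return -1;
}
```

In the extended processor, the CRC, the field extraction and the table compare happen in parallel hardware, so the whole routine collapses into one issued instruction per header.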
Memory optimization
Accesses to memory clearly dominate the power budgets of most ICs, especially when the memory resource lies off-chip. Because of this dominance, the memory optimization possible with configurable processors deserves special attention in low-power designs. There are several ways in which the configurable, extendable processor reduces these high-power on-chip and off-chip memory accesses.
The most straightforward example is reducing the size of the processor's memories. Each cache within the system can be tuned to exactly the size needed by the application software. The AAC audio codec running on the ARC 600 provides a good case study. The size of the instruction cache can be selected by the customer to be between 2k words and 32k words.
An audio processor comparison shows the size of an ARC 600, ARM946 and MIPS 4k processor, in square millimeters, for a 0.18-micron process, inclusive of required memories. Source: ARC International
Software profiling tools, also provided by ARC, allow the customer to quickly evaluate how the cache-hit ratio varies with the size of the cache. The MetaWare profiler from ARC shows the cache-hit ratio while running the AAC decoder on the ARC 600 processor to be approximately 99.9 percent with a 32-kbyte I-cache. Running the same decoder on an identical ARC 600 processor with only a 2-kbyte I-cache yields a cache-hit ratio of approximately 99.1 percent. The smaller the cache, the lower the power required per cache access; however, the smaller cache also results in more high-power accesses to external memory. The reduced power for the 99.1 percent of accesses that hit in the cache can then be balanced against the 0.8 percent increase in external accesses to arrive at a power-optimized solution for this application.
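A minimal sketch of that trade-off follows. The hit ratios are the profiled numbers above; the energy-per-access figures are purely illustrative assumptions, not ARC or foundry data.

```c
#include <stdio.h>

/* Back-of-envelope balance between per-access cache energy and the extra
 * external-memory traffic caused by a smaller cache.  All energy values
 * are assumed placeholders for illustration only. */
int main(void)
{
    const double e_cache_32k = 50.0;   /* pJ per hit, 32-kbyte I-cache (assumed) */
    const double e_cache_2k  = 15.0;   /* pJ per hit, 2-kbyte I-cache  (assumed) */
    const double e_external  = 2000.0; /* pJ per off-chip access       (assumed) */

    const double hit_32k = 0.999;      /* profiled hit ratio, 32-kbyte cache */
    const double hit_2k  = 0.991;      /* profiled hit ratio, 2-kbyte cache  */

    double avg_32k = hit_32k * e_cache_32k + (1.0 - hit_32k) * e_external;
    double avg_2k  = hit_2k  * e_cache_2k  + (1.0 - hit_2k)  * e_external;

    printf("Average fetch energy, 32-kbyte cache: %.1f pJ\n", avg_32k);
    printf("Average fetch energy,  2-kbyte cache: %.1f pJ\n", avg_2k);
    return 0;
}
```

With these assumed numbers the smaller cache wins, but the same arithmetic can tip the other way if off-chip accesses are more expensive or the hit-ratio penalty is larger, which is exactly why profiling each application matters.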
While custom instructions are primarily thought of as a way to increase performance, they can also be key to reducing memory accesses. The packet-checking example illustrates how a custom instruction reduces traffic to instruction memory. A typical RISC processor has to fetch approximately 60 instructions from the I-cache, each fetch consuming active-access power in both the tag RAM and the instruction RAM. The custom instruction that replaces those 60 instructions requires only one RAM access. Depending on code locality and cache size, those 60 fetches could also spill out to off-chip RAM rather than staying in the I-cache, which further increases the power penalty of a fixed-processor core.
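A small tally makes the fetch-traffic difference explicit. The counts come from the description above (the single RAM access for the extension fetch follows the text); they are illustrative, not measured.

```c
#include <stdio.h>

/* Per-header instruction-memory activity for the packet-check example.
 * Counts are illustrative, taken from the text, not measured data. */
int main(void)
{
    const int risc_fetches    = 60; /* ~60 RISC instructions fetched per header        */
    const int arrays_per_hit  = 2;  /* each fetch activates tag RAM + instruction RAM  */
    const int custom_accesses = 1;  /* one RAM access for the extension instruction    */

    printf("RAM activations per header, fixed RISC core:    %d\n",
           risc_fetches * arrays_per_hit);
    printf("RAM activations per header, extended processor: %d\n",
           custom_accesses);
    return 0;
}
```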
The first two examples have focused on instruction memory power savings, but configurable processors also offer the same power advantages on the data-movement side. Data caches can vary in size at the push of a button just like the I-caches, so they enjoy the same benefits of optimizing the size of the cache after profiling for the application.
Peter Wells (peter.wells@arc.com) is North American solution architect manager for ARC International (San Jose, Calif.).
See related chart: the figure shows an example extension instruction for an ARC 600 processor and the measured instruction-count comparison, with and without custom extensions and interfaces. Source: ARC International