SoCs optimized for power yield better performance
By Ron Wilson, EE Times
October 2, 2002 (4:21 p.m. EST)
URL: http://www.eetimes.com/story/OEG20020927S0029
ROCHESTER, N.Y. -- Academic researchers suggested novel strategies for system-level power reduction at the 15th IEEE International ASIC/SoC Conference here this week, in presentations showing that optimizing a system-on-chip (SoC) design for energy is starting to resemble optimizing one for performance. The approaches differed mainly in where each speaker chose to start looking for efficiency.
A paper by Y. Cao and H. Yasuura, both from Kyushu University (Fukuoka, Japan), suggested starting with the dynamic and static power consumed in data paths. The approach was refreshingly simple: rather than accepting standard-width data paths based on history, an embedded-system designer could profile a C model of the application and determine the actual range of values associated with each variable in a design. Then both data path and memory widths could be scaled to fit the actual range of values.
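To make the profiling step concrete, here is a minimal C sketch of the idea: wrap a variable of the application model in a probe that records the extremes it takes on, then report the smallest width that covers them. The probe type and helper names are our own illustration, since the article does not describe the authors' tooling.

```c
/*
 * A toy version of the width-profiling idea: wrap each traced
 * variable in a probe that records the extreme values it takes on
 * during a representative run. All names here are illustrative;
 * the article does not describe the authors' actual tooling.
 */
#include <stdio.h>
#include <limits.h>

typedef struct {
    const char *name;
    long long   min_seen;
    long long   max_seen;
} range_probe;

/* Record one observed value of the traced variable. */
static void probe_observe(range_probe *p, long long v)
{
    if (v < p->min_seen) p->min_seen = v;
    if (v > p->max_seen) p->max_seen = v;
}

/* Smallest unsigned width covering [0, max_seen]; assumes the data
 * are non-negative, as pixel-style values in media kernels are. */
static int probe_width(const range_probe *p)
{
    int bits = 1;
    long long span = p->max_seen;
    while (span > 1) { span >>= 1; bits++; }
    return bits;
}

int main(void)
{
    range_probe luma = { "luma", LLONG_MAX, LLONG_MIN };

    /* Stand-in for running the real C application model: feed the
     * probe values that stay within an 8-bit pixel range. */
    for (int i = 0; i < 1000; i++)
        probe_observe(&luma, (i * 37) % 220);

    printf("%s: range [%lld, %lld] -> %d bits (vs. a 32-bit default)\n",
           luma.name, luma.min_seen, luma.max_seen, probe_width(&luma));
    return 0;
}
```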
By reducing the number of devices involved, the authors said, both static and dynamic power consumption would be reduced. There would even be cases where a drastic narrowing of a data path that forced an increase in the number of cycles would still yield an overall reduction in energy use: if halving a data path's width roughly halved its switching energy per cycle but stretched an operation to 1.5 times as many cycles, for example, total energy would still fall by about 25 percent.
Demonstrating the technique with a proprietary width-configurable CPU and a range of MPEG-related benchmarks, the authors predicted dynamic power savings of between 14 and 59 percent, and static power savings of between 22 and 66 percent simply by using the width-reduction technique. The results were highly application-dependent, but could create new interest in commercially available configurable-CPU intellectual property.
In the two papers that followed, presenter Ismail Kadayif of Pennsylvania State University (University Park, Pa.) illustrated additional, relatively simple system-level techniques, this time focusing on the frequency of particular operations as well as on the overall device count in processor-based systems.
Clip the tag
In the first paper, co-authored by M. Kandemir and I. Kolcu (the latter of the U.K.'s University of Manchester Institute of Science and Technology), Kadayif zeroed in on data cache operations. He pointed out that in many embedded SoCs, data cache consumes a fair portion of the total energy budget. And a good part of that consumption, he noted, came not from the data memory itself, but from the tag comparison operation.
So the authors suggested a slight modification to both CPU and cache, giving the CPU two classes of load and store instructions: one that performed the tag comparison and one that did not.
This change was complemented by a compiler that identified, from the structure of the application code, when a load or store was provably certain to result in a cache hit. The compiler then simply used the non-tag version of the load or store instruction in those cases. An added step, remapping arrays to maximize the number of such instructions, also proved valuable.
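To see what "provably certain" means, consider a sequential array scan: once the first word of a cache line has been touched, the remaining words on that line are guaranteed hits, provided nothing evicts the line in between. The C sketch below counts the tag comparisons such a scan could skip; the line size and element size are illustrative assumptions, not parameters from the paper.

```c
/*
 * Counting the tag comparisons a sequential array scan could skip,
 * assuming a 32-byte cache line, 4-byte elements and no intervening
 * evictions. Only the first access to each line can miss; the rest
 * are provable hits that the modified ISA would issue as loads
 * without a tag check. The parameters are illustrative assumptions,
 * not figures from the paper.
 */
#include <stdio.h>

#define LINE_BYTES 32
#define ELEM_BYTES 4
#define N 1024

int main(void)
{
    unsigned tag_checked = 0, tag_skipped = 0;

    for (unsigned i = 0; i < N; i++) {
        if ((i * ELEM_BYTES) % LINE_BYTES == 0)
            tag_checked++;  /* first word on a new line: normal load */
        else
            tag_skipped++;  /* same line as the previous access */
    }

    /* A pure sequential scan is the best case (87.5 percent here);
     * the authors' benchmark average, cited below, is about a third. */
    printf("accesses %u, tag checks %u, skipped %u (%.1f%%)\n",
           (unsigned)N, tag_checked, tag_skipped, 100.0 * tag_skipped / N);
    return 0;
}
```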
The authors' analysis demonstrated that in typical array-oriented benchmarks, about a third of data cache accesses fit into the category of certain hits. Depending on the application, the authors estimated, using their technique would result in between 11 and 24 percent energy savings. Performing the array reorganization would add another nearly 9 percent savings.
The second paper presented by Kadayif, this one co-authored by Kandemir, examined instruction compression as a means of energy savings. If the instructions could be effectively compressed, the authors argued, the instruction memory could be made smaller and, potentially more important, the number of bit transitions in the instruction pipeline could be reduced.
The compression technique applied by the team was rather unusual. The object file was broken into basic blocks (that is, blocks of code between context changes) and the blocks scanned for recurring instructions. Surprisingly, in benchmarks about 22 percent of the code was made up of repeated instructions.
These instructions were then checked to see whether they occurred in groups of three, which was determined to be the minimum run that could be effectively compressed. The 256 most eligible candidates were replaced by one-byte codes, selected so as to reduce the number of bit transitions on the bus lines when the codes were issued from instruction memory. Occurrences of the codes in the instruction stream were marked with special marker bytes.
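The article does not spell out how the codes were selected, but the flavor of the scheme can be sketched: tally repeated three-instruction runs within a block, rank the candidates, and assign one-byte codes in an order that keeps successively assigned bit patterns close. The Gray-code assignment in the following C sketch is our own stand-in for the authors' transition-minimizing selection.

```c
/*
 * Compression-side sketch: find the most-repeated three-instruction
 * run in a block, then hand out one-byte codes in binary-reflected
 * Gray order so successively assigned codes differ in a single bit.
 * The Gray-code heuristic is a stand-in for the authors'
 * transition-minimizing selection, which the article does not detail.
 */
#include <stdio.h>
#include <stdint.h>

#define RUN 3   /* minimum compressible run length, per the paper */

int main(void)
{
    /* A toy "basic block" of instruction words with repetition. */
    const uint32_t block[] = { 1, 2, 3, 9, 1, 2, 3, 9, 1, 2, 3, 7 };
    const int n = (int)(sizeof block / sizeof block[0]);

    /* Tally how often each length-3 window recurs later on. */
    int best_at = 0, best_hits = 0;
    for (int i = 0; i + RUN <= n; i++) {
        int hits = 0;
        for (int j = i + 1; j + RUN <= n; j++)
            if (block[i]   == block[j]   &&
                block[i+1] == block[j+1] &&
                block[i+2] == block[j+2])
                hits++;
        if (hits > best_hits) { best_hits = hits; best_at = i; }
    }
    printf("best run starts at word %d, recurs %d time(s)\n",
           best_at, best_hits);

    /* Assign codes to frequency-ranked candidates in Gray order:
     * k ^ (k >> 1) gives 0x00, 0x01, 0x03, 0x02, 0x06, ... */
    for (unsigned k = 0; k < 8; k++)
        printf("candidate %u -> code 0x%02X\n", k, k ^ (k >> 1));
    return 0;
}
```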
When a fetch operation encounters a marker, it sends the following bytes, up to the next marker, not to the CPU but to a decode table that recreates the original instructions from the 8-bit codes. The reconstituted instructions are then passed to the CPU.
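A minimal C model of that fetch-side expansion appears below. The 0xFF marker value, the instruction encodings and the byte framing are all invented for illustration; only the overall structure, marker-delimited one-byte codes that each expand through a table into a three-instruction run, follows the article's description.

```c
/*
 * Fetch-side expansion modeled in C. The 0xFF marker, the table
 * contents and the byte framing are invented for illustration.
 * A real scheme would also reserve or escape the marker value so
 * that no code byte collides with it.
 */
#include <stdio.h>
#include <inttypes.h>

#define MARKER 0xFF

/* Each one-byte code maps back to a run of three instruction words. */
static const uint32_t decode_table[256][3] = {
    [0x00] = { 0xE3A00000, 0xE2800001, 0xE5810000 },  /* hypothetical */
    [0x01] = { 0xE5910000, 0xE0800001, 0xE5810000 },  /* hypothetical */
};

/* Expand one compressed block into the words the CPU executes.
 * Outside code regions, raw instructions sit as 4 little-endian
 * bytes; inside, each byte is a decode-table index. */
static size_t expand(const uint8_t *in, size_t n, uint32_t *out)
{
    size_t i = 0, k = 0;
    while (i < n) {
        if (in[i] == MARKER) {               /* enter a code region */
            for (i++; i < n && in[i] != MARKER; i++)
                for (int w = 0; w < 3; w++)  /* one code -> 3 insns */
                    out[k++] = decode_table[in[i]][w];
            i++;                             /* skip closing marker */
        } else {                             /* raw instruction word */
            out[k++] = (uint32_t)in[i]        | (uint32_t)in[i+1] << 8 |
                       (uint32_t)in[i+2] << 16 | (uint32_t)in[i+3] << 24;
            i += 4;
        }
    }
    return k;
}

int main(void)
{
    /* One raw word, then a region holding codes 0x00 and 0x01. */
    const uint8_t rom[] = { 0x13, 0x05, 0x00, 0x00,
                            MARKER, 0x00, 0x01, MARKER };
    uint32_t insns[16];
    size_t count = expand(rom, sizeof rom, insns);

    for (size_t j = 0; j < count; j++)
        printf("insn[%zu] = 0x%08" PRIX32 "\n", j, insns[j]);
    return 0;
}
```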
Before and after estimates
The approach was not compared against commercially available code-reduction instruction sets. But before-and-after estimates showed a potential energy savings in the instruction memory and bus of over 40 percent.
As such techniques continue to emerge, optimizing for energy, and thereby for performance, will rely not on a single analytical technique but on a whole portfolio of relatively simple architectural, compiler and post-processing techniques, each making its own contribution. It is interesting to note that at the system level, techniques such as these, coupled with software reorganization to minimize memory references, can make relatively huge contributions to energy efficiency, potentially much greater than can be achieved at the device or library level. And the gains are to be had primarily not in architecture or hardware design, but in software.