Designers face a continuing struggle in balancing power and performance in leading-edge system-on-chip (SoC) designs. A higher supply voltage can mean faster devices, but at the cost of greater power consumption, a problem compounded by the high leakage current found in advanced nanometer process nodes at 90nm and below. Until recently, deploying an effective low-power design strategy has remained largely beyond the reach of mainstream designers. Through a broad collaborative effort of intellectual property (IP) vendors, EDA providers and independent foundries, however, a new low-power design methodology augments the familiar RTL-to-GDSII flow to enable every designer to optimize their SoC design for power and performance.

In pursuing broad market interest in sophisticated mobile applications, semiconductor designers have leveraged increasingly advanced CMOS technologies to deliver integrated circuits (ICs) that set new milestones for size, performance and complexity with each product generation. Yet, even as new process technologies have enabled transistor density to double every 18 months, battery technology has lagged significantly, taking more than five years to achieve the equivalent doubling of capability. At the same time, power consumption for advanced devices continues to jump significantly with each process generation due to the increased leakage current associated with nanometer technologies. Already, power requirements for advanced microprocessors often exceed 100W, and threaten to grow even higher as designers migrate to more advanced technology nodes and higher clock rates. Consequently, increased power consumption has become a major issue at the system level in both wired and wireless systems, as system manufacturers face new demands for dissipating more heat from smaller packages.
Once a concern primarily for portable consumer products, the need to maximize performance while minimizing power consumption has now become a critical issue in broader market segments, including wired embedded products and high-end computing platforms. In the past, low-power design experts have been able to employ specialized architectural approaches or specific circuit design methods, including clock gating, frequency scaling and special process options. Yet even these methods have largely remained the exclusive domain of the largest semiconductor companies and have typically been applied only to the highest-volume devices.

Effective low-power design requires a compatible set of specialized capabilities that cross the entire design chain, encompassing IP models, libraries, design tools and manufacturing capabilities. In turn, an effective solution to mainstream low-power design requires the combined efforts of IP vendors, library providers, EDA tool developers and foundries. Accordingly, members of the Silicon Design Chain Initiative recently collaborated to create a cross-industry solution for power management and to validate this solution on a test design. This effort culminated in the successful implementation and silicon validation of a chip based on the ARM1136JF-S core module that will be available from ARM in reference boards. The device, aimed at mobile and wireless applications, achieved greater than 40 percent savings in power consumption.

Low-power design

Within a device, power dissipation arises from two main sources: dynamic power consumption, which depends on the device's switching activity, and static power dissipation, which arises from leakage current that increases as advanced nanometer process technology reduces the transistor's threshold voltage.
The Silicon Design Chain team set out to prove that its low-power design system could dramatically reduce both dynamic power consumption and static power consumption due to leakage current on the ARM1136JF-S test design. Complementing alternative methods that propose potential savings through system-level power management, this low-power design approach applies circuit-level methods to achieve these savings without the need for highly specialized core capabilities. To validate the broad applicability of this approach, the test chip development effort used a typical process, the TSMC 90nm G silicon process, and ARM Artisan general-purpose physical IP, including SAGE-X standard cell libraries and memory generators. As described below, the standard cell libraries were augmented with extended voltage range characterization and with cells aimed at enabling power reduction design techniques. Cadence Design Systems developed a low-power design methodology using version 4.1 of the Encounter digital IC design platform.

Dynamic power reduction

In this project, the design team first addressed dynamic power consumption, which can be represented by the equation:

$$P = K \cdot C \cdot V^{2} \cdot F$$

where K is the toggle rate (the fraction of time that transistors are switching), C is the circuit capacitance, including interconnect and transistor capacitance, V is the supply voltage to the transistors, and F is the operating frequency.

As this equation indicates, power is proportional to the square of the supply voltage. Consequently, designers can save a significant amount of dynamic power simply by reducing the voltage, an approach called voltage scaling. On the other hand, lowering the supply voltage slows transistor switching speeds. Because this design needed to run at 350 MHz to meet the requirements of ARM's development partners, the team had to be selective in determining which parts of the design could use the voltage scaling technique.
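The relationship between supply voltage and dynamic power can be made concrete with a small calculation. The following Python sketch assumes the standard form of the dynamic power relation, P = K · C · V² · F, using the variable definitions given above. The function names and the toggle rate and capacitance values are hypothetical and chosen only for illustration; the 1.0 V and 0.8 V supplies are example inputs, and only the 350 MHz frequency is a design requirement stated in the text.

```python
def dynamic_power(k, c, v, f):
    """Dynamic power P = K * C * V^2 * F.

    k: toggle rate (dimensionless), c: switched capacitance in farads,
    v: supply voltage in volts, f: operating frequency in hertz.
    """
    return k * c * v ** 2 * f


def voltage_scaling_saving(v_nominal, v_scaled):
    """Fraction of dynamic power saved by lowering the supply voltage,
    with toggle rate, capacitance and frequency held constant."""
    return 1.0 - (v_scaled / v_nominal) ** 2


if __name__ == "__main__":
    # Assumed, illustrative values for toggle rate and capacitance.
    k, c, f = 0.15, 1e-9, 350e6
    p_nominal = dynamic_power(k, c, 1.0, f)
    p_scaled = dynamic_power(k, c, 0.8, f)
    print(f"P at 1.0 V: {p_nominal * 1e3:.2f} mW")
    print(f"P at 0.8 V: {p_scaled * 1e3:.2f} mW")
    print(f"Saving: {voltage_scaling_saving(1.0, 0.8):.0%}")
```

Because K, C and F cancel in the ratio, the saving depends only on the two voltages, which is why voltage scaling is attractive wherever timing allows it.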
In this case, the team created a multi-supply voltage (MSV) design, partitioning the design into separate "voltage islands" or "voltage domains", where each domain operates at a different supply voltage depending on its timing requirements (Figure 1). Here, the team kept timing-critical blocks in one domain, operating at the standard 90nm supply voltage of 1.0V. Blocks with less critical timing paths were aggregated into a second domain, with the voltage scaled down to 0.8V, a 36% reduction in dynamic power for that portion of the design because dynamic power scales with the square of the supply voltage.

Figure 1: Separate voltage domains optimize power and performance

In the past, the voltage domain approach introduced additional complexity during physical design, particularly for connecting the proper power supply and power network. Designers would typically need to manually insert special translation cells, called voltage level shifters, to convert signals between different voltage domains, along with clamp cells to provide isolation. Implementing these translation cells has posed challenges for insertion, placement and power connections. Furthermore, analyzing an MSV design across different voltage islands has also been a challenge, because traditional hierarchical modeling methods for each island might not be accurate enough for the advanced technology nodes.

The ARM1136 core design had 3400 signals that went from the 0.8V domain to the 1.0V domain, requiring 3400 level shifters. In this flow, the Cadence Encounter design system automatically inserts level shifters into the design, drawing on ARM Artisan libraries that provide voltage level shifters and clamp cells. During this process, the design system connects these cells to the two power rails and optimizes their placement for timing, signal integrity effects on timing, and power routing. Furthermore, Cadence and ARM collaborated to create level shifters optimized for use with the Cadence Encounter NanoRoute routing engine.
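To illustrate what identifying level-shifter locations involves, the following Python sketch walks a toy netlist and lists every signal that is driven from a lower-voltage domain into a higher-voltage one. It is not the Encounter algorithm or any Cadence API. The `Net` class, the domain names and the rule that only low-to-high crossings need a shifter are assumptions made for illustration, the last one mirroring the article's count of one shifter per signal going from the 0.8V domain to the 1.0V domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Net:
    """Hypothetical net record: one driver domain, one or more load domains."""
    name: str
    driver_domain: str
    load_domains: tuple


def crossings_needing_shifters(nets, domain_voltage):
    """Return (net name, from_domain, to_domain) for each low-to-high crossing.

    Assumption: a level shifter is needed only when the load domain runs at
    a higher supply voltage than the driver domain.
    """
    crossings = []
    for net in nets:
        v_driver = domain_voltage[net.driver_domain]
        # One shifter per distinct destination domain, in a stable order.
        for load in sorted(set(net.load_domains)):
            if domain_voltage[load] > v_driver:
                crossings.append((net.name, net.driver_domain, load))
    return crossings


if __name__ == "__main__":
    voltages = {"slow": 0.8, "fast": 1.0}  # assumed domain names
    netlist = [
        Net("n1", "slow", ("fast",)),         # crosses low to high
        Net("n2", "fast", ("slow",)),         # crosses high to low
        Net("n3", "slow", ("slow", "fast")),  # fans out to both domains
    ]
    for name, src, dst in crossings_needing_shifters(netlist, voltages):
        print(f"{name}: insert level shifter between {src} and {dst}")
```

A production flow must also place each shifter, connect it to both power rails and account for its delay, which is the part the automated flow described above handles.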
The level shifter design and the automation of its implementation in the ARM1136JF-S core design were key enablers for achieving significant dynamic power reduction while still meeting aggressive schedule requirements.

To further reduce dynamic power requirements, this low-power design approach also utilizes clock-gating techniques. In typical designs, individual registers are loaded with data relatively infrequently, yet the clock signal continues to switch on every clock cycle, driving a capacitive load each time. With clock gating, a gating circuit shuts off the clock for those registers that do not need to be loaded, an approach that can typically achieve a 10% to 20% savings in dynamic power. For this test chip, the Silicon Design Chain team used Encounter RTL Compiler to perform automated clock gating, using integrated clock-gating cells from the Artisan library. In this case, the automated clock-gating capability enabled the design team to gate 85 percent of the registers in the low-power chip. Also critical to the overall performance were low-power clock tree synthesis and high-performance clock tree implementation. In addition, the ability to optimize the design across both voltage domains mitigated difficult timing closure challenges that were amplified by voltage scaling.

Using this combination of specialized cells, automated voltage scaling and clock gating methods, the Silicon Design Chain team taped out a chip with a 37.9 percent decrease in dynamic power consumption.

Static power reduction

As designers move to more advanced technology nodes, they must contend with dramatically increased leakage currents. For a 130nm process with a 0.7V threshold voltage (Vt), leakage current is approximately 10-20 pA per transistor. With a 0.3V Vt, leakage current shoots to 10-20 nA per transistor, increasing exponentially in smaller geometries.
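The size of the jump between the two quoted leakage figures is easier to appreciate as arithmetic. The Python sketch below takes the lower bound of each range (10 pA at a 0.7V Vt and 10 nA at a 0.3V Vt) at face value and assumes a simple exponential dependence of leakage on threshold voltage. The function names are hypothetical, and the resulting figure in millivolts per decade is an inference from these rounded numbers, not a process parameter stated in the article.

```python
import math


def leakage_ratio(i_at_low_vt, i_at_high_vt):
    """How many times larger the leakage is at the lower threshold voltage."""
    return i_at_low_vt / i_at_high_vt


def implied_mv_per_decade(vt_high, vt_low, i_at_high_vt, i_at_low_vt):
    """Vt reduction (in mV) that corresponds to a 10x increase in leakage.

    Assumes leakage grows exponentially as Vt drops, so the Vt difference
    is divided by the number of decades between the two currents.
    """
    decades = math.log10(i_at_low_vt / i_at_high_vt)
    return (vt_high - vt_low) * 1000.0 / decades


if __name__ == "__main__":
    i_high_vt = 10e-12  # 10 pA at Vt = 0.7 V (lower bound quoted above)
    i_low_vt = 10e-9    # 10 nA at Vt = 0.3 V (lower bound quoted above)
    print(f"Leakage ratio: {leakage_ratio(i_low_vt, i_high_vt):.0f}x")
    slope = implied_mv_per_decade(0.7, 0.3, i_high_vt, i_low_vt)
    print(f"Implied slope: about {slope:.0f} mV of Vt per decade of leakage")
```

Under these assumptions, a 0.4V reduction in threshold voltage corresponds to roughly three orders of magnitude more leakage per transistor.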
Overall, leakage power grows from less than 5 percent of the power budget at 0.25 microns to 20-25 percent at 130nm and as much as 40-50 percent at 90nm.

In this low-power methodology, designers manage leakage power dissipation by using libraries that contain a matched set of logic cells that have different threshold voltages (Vt) and the same physical footprint. The cells with the higher Vt exhibit less leakage current than their counterpart cells with lower Vt. However, higher-threshold cells also exhibit higher cell delay, degrading overall performance (Figure 2). Consequently, design tools need the ability to provide a netlist implementation that meets performance at the lowest possible leakage current, and to optimize for power, performance and area concurrently.

Figure 2: Cell delay increases with increasing cell threshold voltage

For this test chip, the design team used the ARM Artisan library, which provides cells with different voltage thresholds. The team first optimized the design during synthesis using Encounter RTL Compiler to meet the 350MHz performance goal while minimizing the overall leakage current. After place-and-route, when more accurate parasitic information is available, the team then used SOC Encounter's post-route leakage optimization to fine-tune the leakage power and performance. The combination of cells with different voltage thresholds and an automated, consistent design capability enabled the Silicon Design Chain design team to achieve 46.7 percent savings in leakage power. The combined savings from reduced dynamic power and reduced static power yielded an overall power savings of more than 40 percent (Table 1).

Table 1: Test chip power savings using the low-voltage design methodology

Power analysis

Along with a qualified set of IP components, this low-power design methodology relies on an implementation platform that can accurately predict and optimize performance across a wide range of voltage levels and operating conditions.
The use of multiple power supplies within a single design introduces complications for timing analysis, because accurate delay calculation requires accurate delay models for each operating voltage. Furthermore, level-shifter and clamp cells must also be properly modeled to compute the aggregate delays correctly. Through their combined efforts, Cadence and ARM overcame this challenge by characterizing these components using the effective current source model (ECSM) to ensure accuracy across the multiple voltage domains. ECSM models the current drawn by transistors, rather than the voltage used in traditional modeling methods.

With traditional methods, accurate modeling of cell delays for a particular voltage level to within a few percent of Spice requires the creation of characterized timing views for that voltage level, an extensive, costly process. For example, utilizing six different voltage levels at three different process/temperature corners requires 18 separate timing library characterizations. In addition, most delay calculators and industry-standard timing analyzers support accurate delay calculation for only a single nominal voltage level. Using voltages other than nominal relies on linear derating, which introduces error that often exceeds 20 percent relative to Spice, particularly for the slower low-power cells.

In contrast, the different operating voltages in an MSV design can be covered with ECSM models that are characterized at just three points across the voltage range. The ECSM-based standard cell models used in the Silicon Design Chain chip achieved delay prediction that correlated, on average, to within 2% of Spice simulations (Figure 3). For this test chip, Artisan characterized its 90nm libraries to support ECSM delay prediction by providing lib_ecsm library views.

Figure 3: ECSM achieves accuracy within 2 percent of Spice simulations

Design sign-off

As with any design, accuracy is a primary concern during sign-off.
With the added complexity of multiple voltage domains, this type of low-power design could present additional difficulties for sign-off analysis. For this test chip, the Silicon Design Chain team used VoltageStorm and CeltIC NDC to provide the required accuracy. VoltageStorm analyzed the IR drop across the 1.0V and 0.8V power grids, confirming that each transistor in the design was operating with the requisite supply voltage. In turn, these voltages were used as input to the ECSM-based delay calculator in CeltIC NDC (SignalStorm), providing near-Spice-accurate timing across the two supply voltage domains. This level of accuracy is particularly critical in this type of design, because the IR drop effect on timing is accentuated at the lower supply voltages used in MSV designs.

Conclusion

As semiconductor companies look to exploit emerging market opportunities, the need to address emerging nanometer design challenges becomes paramount. Through their collaborative efforts, the Silicon Design Chain member companies are tackling these tough cross-industry issues. By creating a comprehensive low-power design flow (Figure 4), the member companies have provided mainstream SoC designers with capabilities once available to only the largest semiconductor companies.

Figure 4: The low-power design flow

The Silicon Design Chain engineering team has validated this low-power design approach in silicon in a high-performance device based on the ARM1136JF-S processor. Fabricated in TSMC's 90G process, the test device used a low-power design system comprising ARM's finely tuned physical IP products and modeling methodology, coupled with Cadence's Encounter low-power design flow. Taken together, these elements form a low-power solution that will reduce the risk associated with moving to advanced process nodes for mainstream electronics product developers.
Robert Aitken is the Senior Architect of Product Technology at ARM Physical IP, where he is responsible for memory architecture, design for manufacturability and design for testability solutions.

George Kuo is technical director of Design Chain Initiatives at Cadence, responsible for leading technical projects with strategic partners. Prior to Cadence, he held various senior positions in high performance signal processor ASIC design and flow development at Hughes Aircraft Company and at Synopsys.

Edward Wan is the senior director of design services marketing of TSMC North America. Before joining TSMC, Mr. Wan was CEO of Spike Technologies, a leading chip design services company in Milpitas, California.