Implementing Different Power Features in an IP
By Sayandeep Nag, Chandrashekar BU, Synopsys, Bangalore, India
Prathima KD, Manipal Center for Information Science, Manipal, India
Abstract:
One of the challenges for present-day SoC designers is to ensure that their SoCs consume the least possible power. Since almost all SoCs use a set of IPs, it’s important for IP providers to offer different power reduction options in their IPs, enabling SoC designers to design a power-optimized chip. This paper is aimed primarily at IP design and verification engineers and lists some useful power reduction features that can be implemented in an IP.
The USB IP core is used as an example to demonstrate how the USB functionality can be segregated into different functional states and how a combination of power features can lead to a reduction of up to 97% in total power and 96% in leakage power.
I. INTRODUCTION
This paper focuses on methods of defining power features based on the functionality of the design. It explains how power states can be defined and how different power features can be implemented for different power states in order to achieve higher power savings.
The USB IP is used as an example throughout this document. However, all the power features explained are generic and they can be used as a good reference for implementation in both data path and connectivity IPs.
This paper covers the following topics:
- Section II: Need for Power Optimization in an IP
- Section III: Defining Power States for an IP
- Section IV: Introduction to Power Features
- Section V: Verification of Power Features
- Section VI: Defining Power States in a USB IP
- Section VII: Implementing Features for Power States
- Section VIII: Conclusion and References
II. NEED FOR POWER OPTIMIZATION IN AN IP
IP blocks form a crucial part of a SoC design. The IP design and verification must be robust so that SoC designers can focus on other crucial areas of the system. With the present need to conserve energy, IPs must support power savings to help design a power optimized system. These features should not be restricted only to insertion of clock gating cells by the synthesis tool. The scope of these options needs to be extended to have a design which can move into a low power state on its own by sensing the different operational modes of the IP.
The total power consumed by a chip equals dynamic power plus static power. Dynamic Power is the power consumed in switching logic states, both internal to the cells (internal power) and for driving the chip’s nets and external loads (switching power):
Dynamic power = $C V^{2} F$
Where:
- C is the load capacitance,
- V is the voltage swing, and
- F is the switching frequency (the number of logic-state transitions per unit time).
As semiconductor structures become smaller, device and interconnect capacitances decrease allowing for higher performance and lower power. Countering these factors are power increases because of larger designs and higher switching rates. Thus the dynamic power should still be optimized by the IP designer. Static Power (leakage power) is consumed while transistors are not switching:
Static power = $V \cdot I_{STAT}$
In lower geometries, Static Power is significant. In order to reduce the overall power consumption of a design, both static and dynamic power must be controlled by different features based on functionality. The IP designer needs to identify functional modes where the design can be transitioned to a low power state. The definition of the power state should also identify whether static, dynamic or both can be saved in that state.
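As a rough worked example of the two equations above, the sketch below evaluates them for made-up numbers. The capacitance, supply voltage, clock frequencies and leakage current are illustrative assumptions, not figures from the design discussed in this paper.

```python
def dynamic_power(c_farads, v_volts, f_hz):
    """Dynamic power = C * V^2 * F."""
    return c_farads * v_volts ** 2 * f_hz


def static_power(v_volts, i_stat_amps):
    """Static power = V * I_STAT."""
    return v_volts * i_stat_amps


# Hypothetical values chosen only for illustration.
C = 20e-12     # 20 pF effective switched capacitance
V = 1.2        # 1.2 V supply
I_STAT = 5e-6  # 5 uA static (leakage) current

for f in (100e6, 30e6):
    p_dyn = dynamic_power(C, V, f)
    p_stat = static_power(V, I_STAT)
    print(f"F = {f / 1e6:.0f} MHz: dynamic = {p_dyn * 1e3:.3f} mW, "
          f"static = {p_stat * 1e6:.1f} uW")
```

Lowering F scales only the dynamic term; the static term is unchanged, which is why leakage has to be addressed by separate features.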
A low power state is defined as a state when the design consumes less power as compared to normal operational modes. The design should automatically be able to exit this low power state when required. Such a design, which is capable of identifying when to save power, can be defined as a power aware design. A power aware design can also have a power aware application flow, which can put the design into one of the low power modes. To ensure that the power aware design and application flows work as desired, a power aware verification flow needs to be developed. This is addressed in detail in the references [1], [12]. A power aware design enables a SoC designer to choose from a set of features based on their design and power saving requirements without making drastic modifications to their existing SoC implementation flows.
Until recently, the scope of power savings in an IP was restricted to having tool driven power saving methodologies such as clock gating insertion by the synthesis tools. Clock gating is an efficient feature and has its advantages (described later in this paper); however the present scenario requires power savings to be built into the design by considering various options based on functionality.
The designer needs to identify all opportunities where power can be saved and then implement relevant power features. This method additionally may require modifications in the application flows enabling it to decide when the core is not functional and can be put into a low power state.
III. DEFINING POWER STATES FOR AN IP
We need to understand the functional role of an IP in a system before defining operational modes where power can be saved. These operational modes can also be termed as power states. Defining power states helps to identify the power features that can be implemented based on the functional requirements of a particular state. Figure 1 is a simple illustration of how power states can be defined. In this example, the states are segregated based on the type of power that can be reduced in a particular functional mode. In some cases, each of the power states can have sub-states.
Figure 1 : Defining Power States
Power State 1 defines a functional mode where dynamic power can be reduced. Similarly, Power State 2 defines a functional mode where leakage power can be reduced. Power State 3 defines a functional mode where both dynamic and leakage power can be reduced. To bring in an additional level of granularity for power savings, each power state can also have additional sub-states.
Power State | USB IP | Ethernet Switch [11] | PCI Express |
State 1 | Full Speed Transactions on the Bus | Low Link Utilization | Low Link Utilization |
State 2 | No device / Host connected | Connected Ports are Idle | Connected link is Idle |
State 3 | USB SUSPEND State | Connected Ports are Idle | Connected link is Idle |
Table 1 : Implementing power states in different IPs
Table 1 above gives an example of how power states can be defined in different IPs.
IV. INTRODUCTION TO POWER FEATURES
This section lists and describes some power saving options, some of which aim to reduce dynamic power and some to reduce leakage power. Though we have used a USB IP to demonstrate the power saving features, these features are generic and can be implemented for most connectivity and data path IPs, or similar designs in general.
Dynamic Switching of Application Clock:
High-speed USB can transmit up to 480 Mbps of data. This high data transfer rate is primarily required only for real time applications; it may not be required for normal file transfer operations between host and device. Based on the type of transaction on the bus, the application can decide to reduce the application interface clock (such as AHB or AXI) to a lower frequency. This reduction in application clock frequency does not require any additional logic change internal to the IP. The SoC designer must ensure that different application clock frequencies are available to the application interface of the IP. Additionally, the IP designer must define an application flow required to implement this feature in the SoC.
Figure 2 illustrates the sequence of steps for clock switching during low bandwidth requirements for a USB IP.
Figure 2 : Sequence of Operation for Dynamic Switching of Application Clock
In order to switch clocks, the application must ensure that it is not going to initiate any Transmit transactions on the bus and that it is not going to respond to any Receive transactions until the core has switched to the low frequency clock and is completely stable. Once the clock has stabilized, Transmit and Receive transactions can resume. Reducing the clock frequency based on functionality in this way helps to save dynamic power.
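The application-side ordering described above can be sketched as a small model. This is only an illustration: the class, method and log names are hypothetical, and a real flow would use the IP's own registers and the SoC's clock controller. The 100 MHz and 30 MHz values are borrowed from the footnote to Table 2.

```python
class AppClockController:
    """Toy model of the application steps for switching the application clock."""

    def __init__(self, freq_mhz):
        self.freq_mhz = freq_mhz
        self.tx_enabled = True
        self.rx_enabled = True
        self.log = []

    def switch_clock(self, new_freq_mhz, wait_until_stable):
        # 1. Stop initiating Transmit and stop responding to Receive.
        self.tx_enabled = False
        self.rx_enabled = False
        self.log.append("halt_tx_rx")
        # 2. Select the new application clock frequency.
        self.freq_mhz = new_freq_mhz
        self.log.append("select_clock")
        # 3. Wait until the new clock is stable (caller supplies the wait).
        wait_until_stable()
        self.log.append("clock_stable")
        # 4. Resume Transmit and Receive transactions.
        self.tx_enabled = True
        self.rx_enabled = True
        self.log.append("resume_tx_rx")


ctrl = AppClockController(freq_mhz=100)
ctrl.switch_clock(30, wait_until_stable=lambda: None)  # no-op wait in this sketch
print(ctrl.freq_mhz, ctrl.log)
```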
Frequency Scaling of Application and Interface Clock:
Frequency scaling reduces dynamic power consumption by reducing switching rate of signals. This feature can be implemented when there is reduced activity from the application and also on the interface. In USB there is no activity on the bus during SUSPEND [6] ; both the host and device core can take advantage of this inactivity to save power. During this SUSPEND cycle, the host must be able to detect the remote-wakeup initiated by the connected device and the device must be able to detect resume or reset initiated by the host.
To keep this detection logic functional, the normal clock used for high-frequency operation is not required. We can reduce the application clock from the MHz range to the kHz range, thereby reducing power consumption without violating the USB standard requirements. The kHz Clock Suspend mode enables the PHY clock and the application clock to be multiplexed externally with a slower clock during the low power state. This configuration enables power savings while still meeting the functional requirements for a device remote wakeup or a host resume. Figure 3 illustrates the sequence of operation for implementing frequency scaling for both the application and the interface clock.
Figure 3 : Sequence of Operation for Frequency Scaling of Clock
The sequence of operation is similar to dynamic clock switching where the transactions on both transmit and receive need to be halted before switching the clocks. This feature also requires external clock multiplexing along with modifications to the internal logic of the IP to change the counter and timer values used for interface level signaling. This feature also helps in reducing dynamic power.
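The need to change counter and timer values follows from simple arithmetic: a timer that counts clock cycles must count fewer cycles to measure the same interval on a slower clock. The sketch below illustrates this; the 1 ms interval and both clock frequencies are placeholders, not values taken from the USB specification or from the IP.

```python
def timer_count(duration_s, clock_hz):
    """Number of clock cycles needed to time `duration_s` at `clock_hz`."""
    return round(duration_s * clock_hz)


INTERVAL_S = 1e-3    # hypothetical signaling interval to be timed
FAST_CLK_HZ = 30e6   # hypothetical MHz-range clock used in normal operation
SLOW_CLK_HZ = 32e3   # hypothetical kHz-range clock used in the low power state

print("fast clock count:", timer_count(INTERVAL_S, FAST_CLK_HZ))
print("slow clock count:", timer_count(INTERVAL_S, SLOW_CLK_HZ))
```

If the counter kept its fast-clock terminal value after the switch, the timed interval would stretch by the ratio of the two frequencies, which is why the internal logic has to reload these values.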
Partial Power Gating:
Power gating is an effective technique for reducing leakage power. Here, circuit blocks that are not in use are temporarily turned off to reduce the overall leakage power of the chip. This temporary shutdown time leads to a low power state. When circuit blocks are required for operation once again, they transition to an active state. The switching between low power state and active state is done at an appropriate time and in a suitable manner to maximize power savings while minimizing impact to performance.
When the USB is in SUSPEND or the session is not valid, the power supply to most modules can be turned off. Some logic must be live to detect the resume, remote wakeup, session request protocol, or new session start events and then wake up the core. Two power rails are required to implement partial power gating logic:
- VDD_CTL for the logic which can be turned off during low power state and
- VDD_WKUP for the logic which is on during low power state for detection of remote wakeup, resume, or reset.
Figure 4 illustrates the partial power gating sequence. Implementation of this feature requires the following changes:
- Modification to internal IP to support signal clamping and restoration
- Application flows to support this feature
- Two power rails to power gate modules during low power state.
Figure 4 : Sequence of Operation for Partial Power Gating
Before the core enters SUSPEND state, the following operations must be completed:
- Signals crossing power domains must be clamped,
- Reset to the power down blocks must be asserted so that when the blocks come out of SUSPEND they are in a known state.
- The applications and the interface clock must be switched off
- The VDD to the power down blocks must be removed.
The sequence to wake up the core is simply a reversal of the above procedure.
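The entry steps and their reversal on wakeup can be written down as paired actions. This is a minimal sketch; the action strings are descriptive labels rather than real register or signal names, and the inverse actions are inferred from the statement that wakeup reverses the entry procedure.

```python
# (entry action, matching wakeup action), in the order used for entry.
STEPS = [
    ("clamp signals crossing power domains", "release clamps"),
    ("assert reset to power-down blocks", "deassert reset"),
    ("switch off application and interface clocks", "switch clocks back on"),
    ("remove VDD from power-down blocks", "restore VDD"),
]


def entry_sequence():
    """Actions performed before the core enters SUSPEND."""
    return [enter for enter, _ in STEPS]


def wakeup_sequence():
    """Inverse actions, applied in the opposite order."""
    return [leave for _, leave in reversed(STEPS)]


for action in entry_sequence() + wakeup_sequence():
    print(action)
```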
Complete Power Gating:
Partial power gating has one limitation: it has two power domains, one always on and the other powered down. Having two power domains can result in many signals (including clock crossing signals) crossing between the two domains, which complicates placement and timing closure. Complete power gating, or hibernation (where the core is shut down completely), addresses this limitation. During hibernation, the core application is informed about remote wakeup, resume, reset, connect, disconnect, and session request protocol events by very low gate count logic external to the core. Because the gate count of this external logic is only about 3% to 4% of the gate count of the complete core, the clock balancing problem is simplified. The application can completely control this logic.
The hibernation (or complete power gating) sequence is similar to the one shown in Figure 4. Additional details are provided in Figure 5.
Figure 5 : Sequence of Operation for Complete Power Gating
An additional restore step ensures that the states of the core before entering SUSPEND and after coming out of it are identical.
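The save and restore idea can be illustrated with a toy model in which the core state is a dictionary of register values. Where the saved copy actually lives during hibernation is not specified here; keeping it in a separate Python object is an assumption made only for the sketch, and the register names are hypothetical.

```python
def hibernate(core_regs):
    """Save the register state, then model power-off by clearing the live copy."""
    saved = dict(core_regs)  # assumed to be held outside the gated core
    core_regs.clear()        # powered-down logic loses its state
    return saved


def restore(core_regs, saved):
    """Write the saved values back after power is reapplied."""
    core_regs.update(saved)


regs = {"ctrl": 0x1, "addr": 0x2A, "intr_mask": 0xFF}  # hypothetical registers
before = dict(regs)
saved = hibernate(regs)
restore(regs, saved)
print("state identical after restore:", regs == before)
```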
Tool Based Power Saving Features
Functional power saving features are important, but tool-driven features also play an important role. Tool-based features achieve a level of granularity which is difficult to achieve in design or implementation. Some of the tool-based power features are defined below. These features are supported by most simulation and synthesis tools.
RTL Clock Gating:
This feature is a tool-driven synthesis methodology. RTL clock gating reduces dynamic power consumption by automatically inserting clock gating logic in front of clocked elements that are inactive. This removes the clock switching power of those elements when the values they store are not changing. The RTL clock gating feature provides easily configurable, automatically implemented clock gating that gives a large reduction in power with minimum designer involvement and no software involvement [10]. Clock gating can also be implemented as an RTL feature by adding clock gating logic and application-controllable clock gating control signals. This gives the application additional flexibility to gate the clock.
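The effect of gating can be seen in a very small behavioral model of one register with a load enable. This is a simplified sketch in Python rather than RTL, and the stimulus values are arbitrary; it only counts how many clock edges reach the register with and without gating.

```python
def simulate(cycles, gated):
    """Return (final register value, clock edges seen by the register).

    `cycles` is a list of (enable, data) pairs, one per clock cycle.
    """
    q = 0
    clock_edges = 0
    for enable, d in cycles:
        if gated and not enable:
            continue  # gated clock: the register sees no edge this cycle
        clock_edges += 1
        if enable:
            q = d     # the register loads new data only when enabled
    return q, clock_edges


stimulus = [(1, 5), (0, 7), (0, 9), (1, 2), (0, 4), (0, 4), (0, 1), (1, 8)]
print("ungated:", simulate(stimulus, gated=False))
print("gated:  ", simulate(stimulus, gated=True))
```

Both runs end with the same register value, but the gated version delivers a clock edge only in the cycles where the enable is set.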
Unified Power Format (UPF):
Verification and backend implementation of a power gated circuit at all levels is a challenge because HDL does not provide a mechanism for describing power connections at RTL level. To simulate power gating, we need to extend Verilog either by modifying the code or by using a separate set of commands to describe power connections and power switching.
The Unified Power Format (UPF) [3] defines both language format and semantics to convey the power intent information of a design to the simulation, synthesis and backend tools.
A UPF specification defines the following:
- How to create a supply network to supply power to each design element,
- How the individual supply nets behave with respect to one another, and
- How the logic functionality is extended to support dynamic power switching to these logic design elements.
With the above features, UPF ensures that the implementation tools are aware of the power intent of the design, enabling higher power savings as highlighted in Table 4 and Table 5.
V. VERIFICATION OF POWER FEATURES
Verification of the power features is as important as the design of the features [12]. The verification environment also needs to be aware of the design’s capability to transition to low power states and must drive the design accordingly. In addition to the normal verification requirements, the verification challenges include confirming that:
- The design cleanly enters and exits a low power state.
- There is no corruption of signals crossing power domains during power gating.
- The sequences of steps for the design to enter and exit a low power state are well defined.
VI. DEFINING POWER STATES IN A USB IP
The USB port of any system is used in bursts, i.e. it is not functional for the entire duration of operation of a system. For example, in a cellular phone the USB is primarily used for transfer of data, which can be real time or file transfer. Because transfer of data is not necessarily a continuous process, there are many cycles when the bus is idle. These idle cycles give ample opportunities for a power aware design and its application to save power. The USB standard also supports power saving features like Link Power Management (LPM) [8].
The USB functionality can be broadly segregated into three power states:
- STATE 1: The bus is fully functional and there is continuous data transfer.
- STATE 2: The bus has transitioned to an LPM defined state.
- STATE 3: The bus has transitioned to SUSPEND [6] due to 3ms of inactivity.
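The three states above can be sketched as a simple classification. The function and its inputs are hypothetical; a real core would derive the state from bus and LPM signaling rather than from two arguments. The 3 ms threshold is the inactivity period named in the STATE 3 definition.

```python
from enum import Enum


class UsbPowerState(Enum):
    STATE_1 = "bus fully functional"
    STATE_2 = "LPM defined state"
    STATE_3 = "SUSPEND"


SUSPEND_IDLE_MS = 3.0  # inactivity period after which the bus is in SUSPEND


def classify(idle_ms, lpm_entered):
    """Map a simplified view of bus status to one of the three power states."""
    if idle_ms >= SUSPEND_IDLE_MS:
        return UsbPowerState.STATE_3
    if lpm_entered:
        return UsbPowerState.STATE_2
    return UsbPowerState.STATE_1


print(classify(0.0, False), classify(0.5, True), classify(3.0, False))
```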
VII. IMPLEMENTING FEATURES FOR POWER STATES
After defining the power states, we can now define how the power features introduced in the previous sections can be used to achieve maximum power reduction. For each power state, we have provided a comparative analysis of the power saved for a sample USB IP design with a gate count of under 90K gates targeted to tsmc65lp library.
STATE 1: The Bus is fully functional
In this state, transactions are active both on the application and the interface side. It is not possible to reduce the leakage power, but we can definitely reduce the dynamic power. We can reduce dynamic power by using the following features:
- RTL clock gating
- Dynamic switching of application clock.
Table 2 shows the comparative power savings achieved when a combination of the above features is implemented for STATE 1. It can be observed that reducing the clock frequencies, or gating the clock when the design is non-operational, can save a substantial amount of power. The power reduction is primarily due to dynamic power reduction; there is no change in the leakage power of the design.
Feature Name | Power After feature Enabled | % Power Reduction |
Power Before Feature Enabled = 3.062 mW | ||
RTL Clock gating | 1.115 mW | 63 % |
Dynamic switching of application clock | 1.05 mW * | 65 % |
RTL Clock gating & Dynamic Clock Switching | 0.56 mW * | 81.7 % |
Table 2 : Comparison of Power Savings in STATE 1
* The application clock has been switched from 100 MHz to 30 MHz
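The percentage column of Table 2 can be recomputed from the power figures as a quick check. The helper below is an illustrative sketch; only the milliwatt values come from the table.

```python
def percent_reduction(before_mw, after_mw):
    """Power saved relative to the baseline, in percent."""
    return 100.0 * (before_mw - after_mw) / before_mw


BASELINE_MW = 3.062  # power before any feature is enabled (Table 2)
table_2 = {
    "RTL clock gating": 1.115,
    "Dynamic switching of application clock": 1.05,
    "RTL clock gating & dynamic clock switching": 0.56,
}

for feature, after_mw in table_2.items():
    print(f"{feature}: {percent_reduction(BASELINE_MW, after_mw):.1f} %")
```

Computed this way, the first two rows come out slightly above the tabulated 63 % and 65 %, which suggests that those entries were truncated rather than rounded.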
STATE 2: The bus has transitioned to LPM defined state
In this state, the USB transitions to a state defined by the LPM standard [8]. There is no activity on the bus but the USB core is able to enter and exit the LPM state within a very short period of time. Because of this requirement, it is not possible to implement leakage power saving features such as partial power gating or complete power gating. A good option is to use a combination of the following:
- RTL clock gating
- Dynamic switching of application clock.
- Frequency scaling of application and interface clock.
Table 3 shows the comparative power savings achieved when a combination of the above features is implemented for STATE 2.
Feature Name | Power After feature Enabled | % Power Reduction |
Power Before Feature Enabled = 3.062 mW | ||
Frequency scaling of application and interface clock. | 2.31 mW | 24 % |
RTL clock gating & Frequency scaling of application and interface clock | 1.00 mW | 66 % |
Table 3 : Comparison of Power Savings in STATE 2
STATE 3: The bus has transitioned to USB SUSPEND state.
The USB SUSPEND state is an ideal choice for implementing power gating features for reducing leakage power. The standard allows ample time for the core to enter and exit the USB SUSPEND state.
Power gating requires at least two power rails so that the VDD to the unused logic can be switched off. The remaining logic detects activity on the bus and wakes up the switched-off logic when required. It may not be feasible for all SoC designs to implement two power rails. In this state, we therefore consider both dynamic and leakage power reduction techniques. The following features can be implemented to reduce dynamic (switching) power:
- RTL clock gating
- Dynamic switching of application clock.
- Frequency scaling of application and interface clock.
The power figures for the above features remain the same as given in Table 2.
The following features can be implemented to reduce leakage power:
- Partial Power gating
- Complete power gating
- Power gating with UPF
Table 4 shows the comparative power savings of the total power reduction techniques only during the SUSPEND cycle to highlight the amount of power saved.
Feature Name | Power After feature Enabled | % Power Reduction |
Power Before Feature Enabled = 0.6 mW * | ||
Partial power gating | 0.1 mW | 83 % |
Complete power gating | 0.03 mW | 95 % |
Partial power gating with UPF | 0.085 mW | 85 % |
Complete power gating with UPF | 0.017 mW | 97 % |
Table 4 : Comparison of Total Power Savings in STATE 3
* The Power Numbers are for SUSPEND cycle only
Since power gating reduces leakage power, the leakage power savings are highlighted in Table 5. It is important to note that, without UPF, the leakage power saving for complete power gating is slightly lower than that for partial power gating; this is due to the additional external logic implemented for bus activity detection. This is an overhead required to achieve the total power savings shown in Table 4.
Feature Name | Power After feature Enabled (uW) | % Power Reduction |
Power Before Feature Enabled = 6.2 E-06 uW * | ||
Partial power gating | 1.39 E-06 | 77 % |
Complete power gating | 1.42 E-06 | 77 % |
Partial power gating with UPF | 7.63 E-07 | 87 % |
Complete power gating with UPF | 2.57 E-07 | 96 % |
Table 5 : Comparison of Leakage Power Savings in STATE 3
* The Power Numbers are for SUSPEND cycle only
VIII. CONCLUSION
This paper presents an approach to defining power features based on the functionality of the design. It describes how power states can be identified and how different power features can be implemented for those power states to achieve maximum power savings. Though we have considered a USB IP as an example design to implement the features and highlight the power gains, all the power reduction options are generic and can be implemented for both data path and connectivity IPs. This paper also describes how power features can be combined, based on power state, to increase the power savings. It could be extended further by sharing the details and challenges of implementing complete power gating (hibernation) for an IP at both the design and application levels.
References
[1] S. Jadcherla, J. Bergeron et al., “Verification Methodology Manual For Low power,” Synopsys Inc, Feb. 2009, ISBN 978-1-60743-413-9
[2] Keating M., et al, Low power Methodology Manual for System-on-Chip design, Springer 2007, ISBN 978-0-387-71818-7
[3] Unified Power Format Standard, Version 1.0, Accellera
[4] Flynn D. et al, “ Design for Retention: Strategies and Case Studies,” SNUG San Jose 2008
[5] Jadcherla S., “Off by Design Architectures Curb Energy waste” SCD source, march 25, 2008
[6] USB Standard, Universal Serial Bus Specification, Rev 2.0, April 27-2000
[7] HS-OTG standard, On-The-Go and Embedded Host Supplement to the USB Revision 2.0 Specification, Rev 2.0, May 8- 2009.
[8] USB 2.0 Link Power Management Addendum.
[9] Energy Star Program, http://www.energystar.
[10] Power reduction through RTL clock gating, Frank Emnett and Mark Biegel, SNUG 2000.
[11] Inefficient Ethernet wastes over $1bn a year, Bryan Betts.
[12] Low Power Verification of Connectivity IP cores - a USB HS-OTG Case Study, IP-ESC 2009.