IP and system design lower data centre power consumption
By Arif Khan and Osman Javed, Cadence
Studies have shown that servers, storage, and networking equipment typically run at low utilisation. Techniques are available to improve energy efficiency at both the system and chip level, for example virtualisation and the use of key interface IP (intellectual property).
Minimising energy waste
For cloud computing services, virtualisation targets under-used servers. It reduces energy waste by allowing a single physical machine to run multiple guest operating systems, abstracting the hardware resources (CPU, memory, I/O) through a hypervisor, also called a virtual machine manager.
In a virtualised system, a guest virtual machine (VM) can be migrated from one hardware system to another. If the VM runs out of memory or other resources during a peak usage period, it can be migrated to an under-used server. The approach does, however, carry a risk of under-utilisation at non-peak loads.
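As a rough illustration of the consolidation side of this idea, the sketch below implements a simple greedy pass that migrates VMs off lightly loaded hosts so that emptied hosts can be powered down. The host names, core counts, and watermark are hypothetical, and the logic is far simpler than any real hypervisor scheduler.

```python
# Illustrative greedy VM consolidation: move VMs off lightly loaded hosts
# so that emptied hosts can enter a low-power state. All names and numbers
# are hypothetical placeholders.

hosts = {
    "host-a": {"capacity": 32, "vms": {"vm1": 4, "vm2": 2}},   # cores used per VM
    "host-b": {"capacity": 32, "vms": {"vm3": 20, "vm4": 6}},
    "host-c": {"capacity": 32, "vms": {"vm5": 3}},
}

def utilisation(host):
    return sum(host["vms"].values()) / host["capacity"]

def consolidate(hosts, low_watermark=0.3):
    """Migrate VMs away from under-used hosts onto busier ones (first fit)."""
    donors = [h for h in hosts if utilisation(hosts[h]) < low_watermark]
    for donor in donors:
        for vm, cores in list(hosts[donor]["vms"].items()):
            # Prefer the most loaded host that still has room for this VM.
            targets = sorted((h for h in hosts if h != donor),
                             key=lambda h: utilisation(hosts[h]), reverse=True)
            for t in targets:
                used = sum(hosts[t]["vms"].values())
                if used + cores <= hosts[t]["capacity"]:
                    hosts[t]["vms"][vm] = hosts[donor]["vms"].pop(vm)
                    break
    return [h for h in hosts if not hosts[h]["vms"]]  # emptied hosts can sleep

print("Hosts that can be powered down:", consolidate(hosts))
```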
System solutions
Processor cores, memories, disk drives, and the I/O network are the most power-hungry components inside a server. With multi-core machines, data centre operators can increase efficiency in threads per watt and also reduce the cost per unit of performance. Multi-core designs demand very high-bandwidth coherency links between sockets to keep data consistent between processors. These designs also need sufficient bandwidth in the I/O subsystem to feed the inputs and outputs of the system (Ethernet, storage).
There are various techniques available to lower power consumption in data centre hardware. For example, dynamic voltage and frequency scaling (DVFS) lowers power while the CPU is active. As process technology evolves and the gap between nominal and threshold voltages narrows, however, the benefit DVFS can deliver shrinks along with it.
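The leverage DVFS offers comes from the standard dynamic-power relation P ≈ C·V²·f: reducing voltage and frequency together cuts power faster than it cuts performance. The sketch below illustrates this with purely hypothetical operating points; as the usable voltage range narrows, the achievable reduction narrows with it.

```python
# Dynamic CPU power scales roughly as P ~ C * V^2 * f.
# Operating points and capacitance below are illustrative, not vendor data.

def dynamic_power(c_eff, volts, freq_hz):
    return c_eff * volts**2 * freq_hz

C_EFF = 1e-9  # effective switched capacitance (farads), illustrative

operating_points = [
    ("nominal", 1.00, 3.0e9),
    ("mid",     0.90, 2.2e9),
    ("low",     0.80, 1.5e9),
]

p_nom = dynamic_power(C_EFF, *operating_points[0][1:])
for name, v, f in operating_points:
    p = dynamic_power(C_EFF, v, f)
    print(f"{name:8s}: {f/1e9:.1f} GHz @ {v:.2f} V -> "
          f"{p/p_nom:5.1%} of nominal power, "
          f"{f/operating_points[0][2]:5.1%} of nominal frequency")
```

At the lowest point in this example, the core runs at half the nominal frequency but roughly a third of the nominal dynamic power, which is the proportionality DVFS exploits.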
Applying active low-power techniques to the memory and I/O subsystem is another option. CPUs have a dynamic range greater than 3.0X (i.e. the power varies 3X over the activity range, making it fairly proportional to usage). Memory, on the other hand, has a dynamic range of about 2.0X, and storage and networking have a dynamic range of only 1.2-1.3X.
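To see why low utilisation is so costly, component power can be modelled as a linear interpolation between an idle floor and a peak value. The sketch below uses the dynamic ranges quoted above (roughly 3x for CPU, 2x for memory, 1.2-1.3x for storage and networking) with placeholder peak wattages; it reproduces the behaviour in Figure 1, where performance per watt falls much faster than power as utilisation drops.

```python
# Simple energy-proportionality model: each component's power interpolates
# linearly between its idle floor and its peak as utilisation rises.
# Dynamic ranges follow the text; peak wattages are illustrative placeholders.

components = {            # (peak watts, dynamic range = peak/idle)
    "cpu":     (150, 3.0),
    "memory":  ( 60, 2.0),
    "storage": ( 40, 1.25),
}

def power_at(util):
    total = 0.0
    for peak, dyn_range in components.values():
        idle = peak / dyn_range
        total += idle + (peak - idle) * util
    return total

peak_power = power_at(1.0)
for util in (0.1, 0.3, 0.5, 1.0):
    p = power_at(util)
    # Useful work scales with utilisation; efficiency = work per watt,
    # normalised to the fully loaded case.
    efficiency = (util / p) / (1.0 / peak_power)
    print(f"util {util:4.0%}: power {p:6.1f} W, relative perf/W {efficiency:4.2f}")
```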
Figure 1: Average power declines almost linearly with utilisation, but performance-to-power efficiency degrades much faster [Barroso and Hölzle].
For memory, placing DRAM in self-refresh mode can lower power consumption by an order of magnitude. In this mode the DRAM refreshes itself while the memory bus clock, phase-locked loop, and DRAM interface circuitry are disabled. In both idle and peak-usage modes, interface links can also consume a lot of power.
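The order-of-magnitude claim is easiest to see from a power budget: in self-refresh, only the DRAM's internal refresh keeps running, while the PLL, bus clock, and interface I/O are switched off. The figures in the sketch below are illustrative placeholders, not measured values.

```python
# Illustrative memory-interface idle power budget (placeholder mW figures)
# showing why self-refresh cuts idle power sharply: the PLL, bus clock tree,
# and PHY I/O can all be disabled while the DRAM refreshes itself internally.

active_idle_mw = {
    "dram_core_refresh": 20,
    "interface_phy_io":  120,
    "bus_clock_tree":     60,
    "pll":                40,
}

self_refresh_mw = {
    "dram_core_refresh": 20,   # only the DRAM's internal refresh keeps running
}

active = sum(active_idle_mw.values())
sleep = sum(self_refresh_mw.values())
print(f"active idle: {active} mW, self-refresh: {sleep} mW "
      f"(~{active / sleep:.0f}x reduction)")
```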
Among interface protocols, PCI Express (PCIe) is ubiquitous in storage, graphics, networking, and other connectivity applications. The PCI-SIG has updated the PCIe protocol with engineering changes that increase the dynamic range of power consumption on PCIe devices, based on activity and utilisation, so systems gain better energy proportionality. Important changes include latency tolerance reporting (LTR), optimised buffer flush/fill (OBFF), and further power reduction in the L1 state.
LTR lets devices report their service-latency tolerance, so the host can schedule interrupt handling intelligently and extend the time it stays in low-power modes. With OBFF, the host shares system power-state information with devices, which can then schedule their activity to maximise the time the platform spends in low-power states. Without such hints, PCIe devices are unaware of the power state of central resources, so the CPU, root complex, and memory cannot be managed optimally: device interrupts arrive asynchronously and fragment the idle window. OBFF gives devices power-management hints so they can optimise their request patterns, knowing when they may interrupt the central system. The system can then consolidate activity, expand the idle window, and stay in a lower-power state for longer.
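The benefit of OBFF can be pictured as idle-window defragmentation: the same amount of device work, batched into hinted service windows, leaves much longer contiguous idle stretches for the central resources. The sketch below models this with made-up interrupt timestamps and a simplified deep-sleep entry threshold; it is a conceptual illustration, not the actual PCIe signalling.

```python
# Conceptual model of OBFF's benefit: unaligned device interrupts fragment
# the host's idle time, while interrupts deferred to hinted service windows
# leave long contiguous idle gaps. Times are in milliseconds, illustrative only.

DEEP_SLEEP_ENTRY = 5.0   # assumed minimum idle gap worth entering a deep state

def deep_sleep_time(interrupt_times, period=100.0):
    """Sum of idle time usable in a deep low-power state over one period."""
    events = sorted(interrupt_times) + [period]
    last, usable = 0.0, 0.0
    for t in events:
        gap = t - last
        if gap >= DEEP_SLEEP_ENTRY:
            usable += gap - DEEP_SLEEP_ENTRY   # entry/exit overhead eats part of the gap
        last = t
    return usable

# Without hints: four devices interrupt on their own schedules.
unaligned = [7, 19, 23, 41, 44, 58, 66, 71, 83, 97]
# With OBFF-style hints: the same work batched into two service windows.
aligned = [40, 41, 42, 90, 91, 92]

print(f"deep-sleep residency without hints: {deep_sleep_time(unaligned):.0f} ms")
print(f"deep-sleep residency with hints:    {deep_sleep_time(aligned):.0f} ms")
```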
The PCI-SIG has also redefined the L1 state as L1.0 and added two sub-states, L1.1 and L1.2, which allow the link to cut power further while in standby. In L1.1 and L1.2, electrical-idle detection is not required, and entry and exit are controlled by the CLKREQ# signal. L1.2 reduces power further by turning off the common-mode voltages.
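Conceptually, the sub-states behave like a small state machine in which CLKREQ# selects the deepest permitted standby level, with L1.2 additionally removing the common-mode voltages. The sketch below is a conceptual model only, not the state machine defined in the PCIe specification, and its entry conditions are deliberately simplified.

```python
# Conceptual model of PCIe L1 power sub-states. Entry here is driven only by
# a CLKREQ# flag and a capability bit; the real protocol has additional
# conditions, timers, and handshakes.

from enum import Enum

class LinkState(Enum):
    L0 = "active"
    L1_0 = "standby (legacy L1 behaviour)"
    L1_1 = "standby, no electrical-idle detection"
    L1_2 = "standby, common-mode voltages off"

def enter_low_power(clkreq_asserted: bool, allow_l1_2: bool) -> LinkState:
    """Pick the deepest permitted L1 sub-state when the link goes idle."""
    if clkreq_asserted:
        return LinkState.L1_0          # clock still requested: shallow standby only
    return LinkState.L1_2 if allow_l1_2 else LinkState.L1_1

def wake(state: LinkState) -> LinkState:
    # Re-requesting the clock returns the link to L0; deeper states simply
    # take longer to restore clocks and common-mode voltages.
    return LinkState.L0

print(enter_low_power(clkreq_asserted=False, allow_l1_2=True))
```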
Figure 2: Virtualised servers support increased utilisation on a common hardware platform through the use of hypervisors.
Lowering power with IP
Interface IP that compensates for process, voltage, and temperature (PVT) variations can reduce active-mode PCIe PHY power. Clock gating and power islands can substantially reduce leakage current, optimising static power consumption.
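The difference between the two techniques is that clock gating stops switching activity while an island that is powered off stops leaking as well. The toy model below, with hypothetical per-island leakage figures, shows how switching off unused PHY lanes (for example in a narrow-link or bifurcated configuration) removes their leakage contribution entirely.

```python
# Toy model of static power with power islands. Clock gating stops dynamic
# switching but leaves leakage; powering an island off removes its leakage
# too. Leakage figures are hypothetical placeholders.

islands_leakage_mw = {
    "pcie_phy_lanes_0_3":  8,
    "pcie_phy_lanes_4_15": 24,
    "controller_core":     10,
    "always_on_control":    2,
}

def static_power(powered_off):
    return sum(mw for name, mw in islands_leakage_mw.items()
               if name not in powered_off)

print("all islands on:           ", static_power(set()), "mW leakage")
print("x4 link, unused lanes off:",
      static_power({"pcie_phy_lanes_4_15"}), "mW leakage")
```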
Cadence has optimised its controller and PHY for PCIe to support the new L1 power-saving states. Implementation of these low-power states is available in x1 to x16 configurations. Key features include 95% link utilisation in PCIe Gen3; a decision-feedback-equalisation architecture providing noise tolerance and low jitter for robust designs; an L1 sub-state implementation giving a 30% power improvement over competitive solutions; and bifurcation options that allow the controller and PHY to be used in area-optimised, low-power configurations.
Virtualisation can support efficient system utilisation and operation, though most systems persist in operating in a sub-optimal low-utilisation space.