Optimizing clock tree distribution in SoCs with multiple clock sinks

Alberto Ferrara and Pierpaolo De Laurentiis, STMicroelectronics
Embedded.com (March 10, 2013)

In the design of high-performance, high-speed integrated circuits, clock tree organization is fundamental to distributing clock signals across the whole area of an integrated circuit or a predefined part of it.

In this article we describe a structure and a method for propagating clock signals to a multiplicity of clock sink nets in a system-on-chip (SoC) design. We include an improved buffering and wiring apparatus that reduces the number of clock stages, the overall latency, the clock skew, and the clock uncertainty.

We address the problem of clock distribution from the root (PLL) to the sinks (flip-flops) in two phases: (1) optimal top-level distribution and (2) local, or block-based, clock distribution. We also describe a method for integrating the two phases within an automation system.

The role of clock trees in complex SoCs
The growing complexity of integrated circuit design is leading to several requirements for bringing a layout to completion. Modern technology nodes (32nm and beyond) are challenging because their reduced physical geometries introduce uncertainties due to local variation (random effects) and the impact of parasitics in terms of wire capacitances and resistances.

These variations are typically called on-chip variation (OCV). Two classes of variation must be considered in design: global and local. Global chip-to-chip variations cause performance differences among dies and are modeled as operating corners. Local on-chip variations cause performance differences among transistors within the same die and are modeled as an added derating factor applied in skew calculations.

OCV derating is calculated as a certain percentage of the total insertion delay. Consequently, to optimize the performance of the clock tree, designers also need to consider structures that are inherently not prone to OCV, while minimizing the overall latency. In this context, one thing that must be considered is the impact of automated methods for the computer-aided creation of the layout of an optimized clock tree circuit. This is of crucial importance because existing software tools tend to arrange the clock tree unfavorably for OCV and latency.
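To make the link between insertion delay and OCV concrete, here is a minimal sketch of the percentage-based derating described above. The function name `ocv_margin_ps`, the 8% derate, and the delay values are hypothetical, chosen only for illustration. They do not come from any particular tool or technology library.

```python
def ocv_margin_ps(insertion_delay_ps: float, derate_pct: float) -> float:
    """Return the OCV margin as a percentage of the clock insertion delay.

    Both arguments are illustrative assumptions: real flows take the derate
    from the technology library and the delay from timing analysis.
    """
    if insertion_delay_ps < 0 or derate_pct < 0:
        raise ValueError("delay and derate must be non-negative")
    return insertion_delay_ps * derate_pct / 100.0


# Hypothetical comparison: same derate, two different clock tree latencies.
deep_tree = ocv_margin_ps(1200.0, 8.0)     # long insertion delay
shallow_tree = ocv_margin_ps(600.0, 8.0)   # half the insertion delay

print(f"deep tree margin:    {deep_tree:.1f} ps")
print(f"shallow tree margin: {shallow_tree:.1f} ps")
```

Under this simple model, halving the insertion delay halves the OCV margin, which is why reducing latency and reducing OCV sensitivity go hand in hand.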

Standard Clock Tree Synthesis engines are driven by timing closure and, hence, are not PVT (process/voltage/temperature) variation aware. They are used to fix setup/hold violations by adjusting the clock skew, adding, removing, and swapping buffers, or exploiting different clock wire lengths and levels and so on. As a result, the skew sensitivity with respect to PVT variations cannot be kept low, since it has several contributors originating from different physical phenomena.
