Under the Hood of Library IP (by Brani Buric and Mike Colwell, Virage Logic)

Little-known aspects of library design that can directly impact the time to market, design flow complexity, and NRE cost of your next project.

Design reuse via IP libraries is essential today to reduce design cycles in the face of time-to-market concerns, design complexity, high NRE costs, and new process technology challenges. Meeting time-to-market demands leaves no room to spend excessive time squeezing every picosecond and square micron out of your library. However, selecting the proper library in the first place, one crafted for a specific process and performance target, can result in chip area savings that significantly reduce the final cost of silicon. In many cases, these savings will mean the difference between a profit and a loss for the particular application.

Currently, cell architecture is optimized for a routing methodology developed over 20 years ago: Metal 1 is routed horizontally, Metal 2 vertically, and Metal 3 horizontally (HVH). With more metal layers available in today’s processes, an alternative routing approach, in which Metal 1 is routed vertically, Metal 2 horizontally, and Metal 3 vertically (VHV), provides much better results with respect to area, performance, and power consumption trade-offs (see Figure 1).

VHV routing is preferable to HVH routing because of its significantly better pin accessibility, which results in higher utilization. VHV leads to shorter wire lengths and fewer signal vias, which in turn improves performance and reliability while reducing power. In addition, VHV provides better distribution of power to cells, with better control of IR drop.

To take advantage of the VHV routing approach, the cell architecture must be changed significantly compared with the conventional architecture used with HVH routing. In addition to defining the cell design to better utilize routing for contemporary processes, many technical decisions must be made while defining the library architecture.

Metal routing
To reduce the cost of library development, IP vendors frequently select Metal 1 and Metal 2 routing pitches that are not as tight as the design rules allow. However, this approach results in a loss of routing resources at the chip level. For example, if the routing pitches of Metal 1 or Metal 2 are 10 percent larger than the minimum pitch allowed by design rules, an approximately 5-percent loss of routing resource for those metal layers will result.

While it is important to keep the routing pitches tight on Metal 1 and Metal 2, it’s also essential to allow users to configure routing pitches on intermediate layers to tune for specific performance, power, and density requirements. For example, increasing wire spacing from single to double spacing on a typical 130-nanometer (nm) process provides more than a 30-percent decrease in routing capacitance, reducing power consumption and increasing design performance.

Large designs require a significant amount of routing resources for power, and IR drop problems are common in today’s designs. Different parts of a design typically have different performance and power requirements. To handle a variety of local power requirements, libraries are developed for the worst-case application (for example, high speed/high power consumption) or for a “sweet spot” application. In either case, the power rails for the library will be either too wide or too narrow, depending on local power requirements.

In the VHV architectural approach, the cell power rails are in Metal 2. Among the advantages is reduced IR drop due to better resistive characteristics in Metal 2 compared to Metal 1. Power rail widths can be defined to meet local IR drop requirements, and designers can expand power rails using the power router within the place and route tool.
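As a rough illustration of how rail width trades against local IR drop, the sketch below treats a rail segment as a uniform resistor, R = Rsheet * L / W, and solves V = I * R for the narrowest width that stays within a drop budget. The sheet resistance, segment length, currents, and budget are hypothetical numbers chosen for illustration, not values from any process. A real rail carries distributed current and would be checked with the power analysis in the place and route flow.

```python
def rail_ir_drop(current_a, sheet_res_ohm_sq, length_um, width_um):
    """Voltage drop (V) along a uniform rail segment: V = I * Rsheet * (L / W)."""
    return current_a * sheet_res_ohm_sq * length_um / width_um


def min_rail_width(current_a, sheet_res_ohm_sq, length_um, max_drop_v):
    """Narrowest rail width (um) that keeps the drop at or below max_drop_v."""
    return current_a * sheet_res_ohm_sq * length_um / max_drop_v


# Hypothetical numbers, for illustration only: 0.08 ohm/sq metal,
# a 100 um rail segment, and a 20 mV drop budget.
quiet_block = min_rail_width(0.001, 0.08, 100.0, 0.02)  # 1 mA local current
busy_block = min_rail_width(0.004, 0.08, 100.0, 0.02)   # 4 mA local current
print(f"quiet block rail: {quiet_block:.2f} um, busy block rail: {busy_block:.2f} um")
```

Under this simple model the required width scales linearly with local current, which is why a single fixed rail width tends to be too wide in one region and too narrow in another.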

Architectural considerations
Well and substrate ties can take up substantial space, or cause significant loss in available transistor drive. In most cases, well and substrate contacts are implemented as a contact bar under the metal rails in the cell. This approach simplifies cell implementation, but incurs a large loss of transistor drive for a small density improvement. An alternative approach is to share contacts between adjacent cells, but unfortunately this approach demands much more physical implementation time. For this reason alone, most library developers select the simpler approach at the cost of 8-to-10 percent of total transistor width.

Selection of the P/N ratio is important because it represents the trade-off between performance and routability. Even a minor adjustment of the P/N ratio can achieve a significant improvement in routability, but it’s important to note that any change in the P/N ratio requires complete circuit re-optimization. For best performance and density, both transistor-level topologies and transistor-sizing methodology must be optimized on an individual cell basis. Electrical parameters such as transistor characteristics and poly, diffusion, and metal capacitance and resistance will vary from foundry to foundry and between different technology nodes at the same foundry.

For high-drive cells, care must be taken to ensure that all internal wiring is wide enough and that the contact/via count is high enough to handle the maximum current. It’s critical to determine which metal segments and contacts/vias will carry uni-directional current, and then to size them appropriately. In the latest technologies, bi-directional or RMS current for very high drive cells also needs to be considered, and multiple via drop locations must be made available at the cell output pins so the design can be routed without electromigration problems.
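The sizing check described above can be sketched as follows. A segment is sized against two limits at once: one on average (uni-directional) current and one on RMS current, with the larger requirement winning. The per-via and per-micron limits used here are hypothetical placeholders; real limits come from the foundry’s electromigration rules and depend on layer, temperature, and lifetime target.

```python
import math


def vias_needed(avg_ma, rms_ma, avg_limit_ma, rms_limit_ma):
    """Via count that satisfies both the average and the RMS current limit."""
    by_avg = math.ceil(avg_ma / avg_limit_ma)
    by_rms = math.ceil(rms_ma / rms_limit_ma)
    return max(1, by_avg, by_rms)  # a connection always needs at least one via


def min_wire_width_um(avg_ma, rms_ma, avg_limit_ma_per_um, rms_limit_ma_per_um):
    """Narrowest wire width (um) that satisfies both current-density limits."""
    return max(avg_ma / avg_limit_ma_per_um, rms_ma / rms_limit_ma_per_um)


# Hypothetical limits: 0.4 mA average and 1.0 mA RMS per via.
supply_vias = vias_needed(avg_ma=1.5, rms_ma=3.0, avg_limit_ma=0.4, rms_limit_ma=1.0)
# An output pin carries bi-directional current: the average is near zero,
# so the RMS limit alone sets the via count.
output_vias = vias_needed(avg_ma=0.0, rms_ma=2.5, avg_limit_ma=0.4, rms_limit_ma=1.0)
```

The second call shows why RMS current matters for output pins: a check on average current alone would allow a single via there.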

Individual cell optimization must be performed in an environment similar to the real design. When optimizing each cell, the cell should be placed in a path so that the effects of input capacitance and output drive, as well as internal sizing, will be accounted for while determining transistor sizes. This path should include appropriate loading for the target technology on each stage in the path. The average propagation delay across the path should be used as the target design criterion. However, checks should be made to ensure that the rise and fall time ratios are not extreme, and that noise margins for all cells are within acceptable ranges. Within this environment, different design criteria can be used for different cell types, allowing maximum performance for each cell type.
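A minimal sketch of the path-level figure of merit described above, assuming each stage has already been simulated and its four timing numbers extracted. The `StageTiming` record, the function names, and the 2.0 ratio threshold are hypothetical choices for illustration. The sketch also ignores the dependence of delay on input slew, which a real characterization flow would include.

```python
from dataclasses import dataclass


@dataclass
class StageTiming:
    t_plh_ps: float  # propagation delay, output rising
    t_phl_ps: float  # propagation delay, output falling
    rise_ps: float   # output rise time
    fall_ps: float   # output fall time


def average_path_delay_ps(stages):
    """Mean of the rising-launch and falling-launch path delays.

    Each stage contributes one of its two delays to each launch edge,
    so the mean is the sum of the per-stage averages.
    """
    return sum((s.t_plh_ps + s.t_phl_ps) / 2 for s in stages)


def edge_ratio(stage):
    """Ratio of the slower output edge to the faster one (always >= 1)."""
    return max(stage.rise_ps, stage.fall_ps) / min(stage.rise_ps, stage.fall_ps)


def stages_with_extreme_edges(stages, max_ratio=2.0):
    """Indices of stages whose rise/fall ratio exceeds the assumed limit."""
    return [i for i, s in enumerate(stages) if edge_ratio(s) > max_ratio]


# Hypothetical two-stage path.
path = [StageTiming(40, 50, 60, 80), StageTiming(30, 34, 50, 120)]
target = average_path_delay_ps(path)
flagged = stages_with_extreme_edges(path)
```

A sizing loop would minimize `target` while rejecting any candidate for which `flagged` is non-empty.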

Optimization of single-stage cells consists mostly of a P/N ratio variation. In many cases, the P/N ratio that gives maximum performance for an individual cell will be the same as that set for the library. Often, a small change in P/N ratio can lead to a large reduction of dynamic power with only a small change in performance. These trade-offs should be explored, especially for low power libraries where power reduction is essential (see Figure 2).

Multi-stage cells
More variables come into play with multi-stage cell design. Generally, the output transistor sizes will be fixed by the basic architecture, but both the drive and P/N ratio of the input transistors can be modified to provide maximum path-level performance and/or power. One important consideration in multi-stage cell design is how to determine the appropriate transistor sizes for the input stage while the output stage drive is increased.

For example, consider an 8x drive implementation of a 2-input AND gate, built from a 2-input NAND input stage followed by an output inverter. If the NAND stage is implemented with a 1x drive, the cell will be very small, but the rise/fall time on the intermediate node will be very slow and the intrinsic delay of the cell will be long. At the other extreme, if the input stage is implemented as an 8x drive, the intrinsic delay of the cell will be small. However, not only will the cell be very large, but the input capacitance will significantly slow down the driving stage and the path performance will be reduced. In either of these cases, the synthesis tool will tend to use a 2-input NAND gate and an inverter in place of the 2-input AND gate, which will increase both cell and net count and increase routed chip size, a phenomenon commonly known as “pull-apart.”
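The two extremes in the 2-input AND example above can be illustrated with a textbook logical-effort delay model. This is a stand-in chosen for illustration, not the authors’ sizing methodology. It uses the standard logical-effort values for an inverter (g = 1, p = 1) and a 2-input NAND (g = 4/3, p = 2), and it assumes a 1x inverter as the driving stage and an output load of 64 unit capacitances; both assumptions are hypothetical. Delays are in normalized logical-effort units.

```python
G_INV, P_INV = 1.0, 1.0            # inverter logical effort, parasitic delay
G_NAND2, P_NAND2 = 4.0 / 3.0, 2.0  # 2-input NAND logical effort, parasitic delay


def stage_delay(g, p, c_out, c_in):
    """Logical-effort stage delay: d = g * (Cout / Cin) + p."""
    return g * c_out / c_in + p


def and2_path_delay(input_drive, output_drive=8.0, driver_drive=1.0, load_cap=64.0):
    """Delay of driver -> NAND2 input stage -> output inverter -> load.

    Capacitances are in units of a 1x inverter's input capacitance; a gate
    with drive x presents an input capacitance of g * x.
    """
    c_driver = G_INV * driver_drive
    c_nand = G_NAND2 * input_drive
    c_out_inv = G_INV * output_drive
    d_driver = stage_delay(G_INV, P_INV, c_nand, c_driver)        # grows with input_drive
    d_nand = stage_delay(G_NAND2, P_NAND2, c_out_inv, c_nand)     # shrinks with input_drive
    d_out = stage_delay(G_INV, P_INV, load_cap, c_out_inv)        # fixed by the 8x output
    return d_driver + d_nand + d_out


# Sweep the input-stage drive from 1x to 8x in 0.5x steps.
candidates = [1.0 + 0.5 * i for i in range(15)]
best_drive = min(candidates, key=and2_path_delay)
```

In this model a 1x input stage is slow because the NAND struggles to drive the 8x inverter, and an 8x input stage is slow because it loads the driving stage. The sweep lands between the two, though the exact optimum depends entirely on the assumed driver and load.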

The optimum input size lies somewhere in between the two extremes. Consider the concept of a “build-up ratio”: the ratio at which the input transistor sizes grow as the output drive is increased. The optimum ratio depends on whether the library is targeted for high performance or low power. The general trade-off pits a larger build-up ratio, which provides the best performance, against both increased cell size and increased power consumption. Therefore, a high performance library will use a larger build-up ratio, and a low power library will use a smaller ratio.

For complex cells such as multiplexers and exclusive-OR gates, both transistor-level topologies and transistor-sizing methodologies must be explored to find the best solution for the library’s target. For complex cells, the range of transistor topologies available is much greater. For example, a high performance library would use a topology that provides fast timing but isn’t the most efficient in terms of area or power. Alternatively, a low power library would use a different topology where cell size and dynamic power are low (see Figure 3).

To enable high-performance adder and multiplier blocks, special function cells such as full and half adders, compressors, and Booth encoder cells should also be included in each library. In many cases, multiple implementations of these cells with different design targets are necessary in order to achieve the highest performance on a functional block level.

Key elements for flip-flop design
Flip-flops play an important role in many aspects of a design, and flip-flop design is critical to the area, performance, and density of all designs. The following are key elements that need to be considered:

Area - In most designs, flip-flops will take up 30-to-50 percent of the total logic area. The impact on cell size must be considered in all design and transistor size choices.

Timing - Flip-flop timing affects all paths, each of which starts with a clock-to-q delay and ends with a setup time constraint. Hold-time violations also complicate the design flow and increase area because of delay cell insertion. Care must be taken with clock-to-q, setup, and hold-time design to achieve maximum performance as well as an efficient design flow.
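The way clock-to-q, setup, and hold enter every register-to-register path can be written out with the standard slack equations. The sketch below is a simplified model with a single skew term (capture clock arrival minus launch clock arrival); all timing values, and the 30 ps delay cell, are hypothetical numbers in picoseconds.

```python
import math


def setup_slack_ps(period, clk_to_q, logic_max, setup, skew=0):
    """Positive when data arrives early enough before the capturing edge."""
    return period + skew - (clk_to_q + logic_max + setup)


def hold_slack_ps(clk_to_q, logic_min, hold, skew=0):
    """Positive when data stays stable long enough after the capturing edge."""
    return clk_to_q + logic_min - hold - skew


def delay_cells_to_fix_hold(slack, delay_per_cell):
    """Number of delay cells needed to bring a negative hold slack to zero or above."""
    return 0 if slack >= 0 else math.ceil(-slack / delay_per_cell)


# Hypothetical long path: 2 ns clock, 1.6 ns of logic between flip-flops.
long_path = setup_slack_ps(period=2000, clk_to_q=150, logic_max=1600, setup=100)
# Hypothetical back-to-back flip-flops (no logic) with 120 ps of clock skew.
short_path = hold_slack_ps(clk_to_q=150, logic_min=0, hold=50, skew=120)
fix_cells = delay_cells_to_fix_hold(short_path, delay_per_cell=30)
```

Every picosecond removed from clock-to-q or setup adds directly to setup slack on all paths, while a flip-flop with a large hold requirement shows up as extra delay cells, and therefore extra area, on short paths.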

Signal integrity - Clock-pin glitches can cause the incorrect state to be latched and cause logical failures. As crosstalk-induced noise becomes more of a problem in newer technologies, clock pins must have sufficient noise margins.

Power - For robust design, clock pins on flip-flops must be buffered in modern technologies. However, the power consumed by, or affected by, the transistors driven by the clock pin can account for a majority of the power consumed in the chip. Internal to the flip-flop, clock buffers and transmission gates will switch with every clock cycle, while logic switching factors will range from 10-to-20 percent. Also, the clock tree driving the flip-flop clock pins will be affected by the input capacitance of the clock pin. Larger input capacitance requires higher drive clock buffers and larger clock buffer tree depth, further increasing the clock-related power consumption.
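To see how clock-driven capacitance can dominate despite being a minority of the total, the sketch below applies the usual dynamic power relation, P = activity * C * Vdd^2 * f, with an activity of 1.0 for clock nodes and a switching factor from the 10-to-20 percent range for logic. The capacitance split, supply voltage, and frequency are hypothetical. Some texts include a factor of 1/2 in this relation; it cancels in the ratio computed here.

```python
def dynamic_power_w(activity, cap_f, vdd_v, freq_hz):
    """Switching power: P = activity * C * Vdd^2 * f."""
    return activity * cap_f * vdd_v ** 2 * freq_hz


def clock_power_share(clock_cap_f, logic_cap_f, logic_activity,
                      vdd_v=1.2, freq_hz=200e6):
    """Fraction of dynamic power spent on nodes that switch every cycle."""
    p_clock = dynamic_power_w(1.0, clock_cap_f, vdd_v, freq_hz)
    p_logic = dynamic_power_w(logic_activity, logic_cap_f, vdd_v, freq_hz)
    return p_clock / (p_clock + p_logic)


# Hypothetical split: clock-driven nodes are 20 percent of switched capacitance
# (200 pF of 1000 pF), and logic nodes switch 15 percent of the time.
share = clock_power_share(clock_cap_f=200e-12, logic_cap_f=800e-12, logic_activity=0.15)
print(f"clock share of dynamic power: {share:.1%}")
```

Under these assumptions the clock-driven nodes consume more than half of the dynamic power, so reducing flip-flop clock-pin capacitance pays off disproportionately.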

Clock tree - Significant wiring resources and clock buffer cell area are used in creating low skew clock nets. Flip-flop clock buffer input capacitance should be kept constant across all flip-flop types to provide equivalent capacitance targets for the clock tree synthesis tool and to reduce clock skew as well as clock tree depth.

Conclusion
Clearly, the challenges of selecting a library are just as complex as the job of designing the cells. Knowledge of the physical and electrical characteristics of the cells can affect the area and performance of the final design implementation. Ultimately, the savings resulting from the best implementation choice can mean the difference between a profit and a loss for the particular application.

Bio
Brani Buric is senior director of product marketing at Virage Logic. Previously, Buric held senior management positions in marketing, business development and applications engineering with In-Chip, Sycon Design, Avant!, Meta Software, Mentor Graphics and Silicon Compiler Systems. Before moving into the EDA industry with Silicon Compiler Systems in 1987, Buric held engineering and product management positions with Burroughs.

Mike Colwell is director of engineering at Virage Logic and responsible for leading the development of the company’s logic product line. Previously, Colwell was vice president of engineering at In-Chip and held various engineering and management positions at LSI Logic. Colwell holds 11 patents in the area of standard cell, gate array, and I/O libraries.
