SOC: Submicron Issues -> Integrated approach lets new tech in
By Mike Newman, Technical Marketing Manager, Magma Design Automation Inc., Cupertino, Calif., EE Times
October 16, 2000 (3:55 p.m. EST)
URL: http://www.eetimes.com/story/OEG20001016S0056
Tools that integrate clock tree synthesis with logic synthesis, placement, route and interconnect extraction will have the power to maximize the potential of new process technologies in cell-based flows.
In the increasingly complex world of deep-submicron (DSM) process technologies, millions of objects are at the designer's disposal in a single design. In 0.25-micron process technologies, only large design teams using semicustom and custom tools attempted these designs. Today, these large designs are implemented in design flows with standard cell synthesis and automatic place-and-route tools.
ASIC design teams would like to take advantage of the latest process technologies to implement very large designs with significantly smaller design teams. Leading-edge designs need the performance gains of new technology, and also need to get to the market quickly. New design automation tools are changing the DSM design landscape by enabling designs to meet their market window expectations. These emerging tools are opening doors to new horizons by merging logical and physical design processes into a single tool, creating the possibility of new design paradigms. One of the most significant ramifications of this merger can be found in the implementation of clock trees.
Ever since logic synthesis started to dominate logic design entry, there has been a partitioning of design responsibilities between the functional logic timed by clock trees and the physical implementation of clock trees. This partitioning was due partly to the tools that were used to implement these design functions, and the expertise needed to run the tools. These "front-end" and "back-end" groups used clock-timing constraints in a manner that partitioned the clock design into logical and physical elements. However, with DSM designs, the physical and logical worlds are merging, and this partitioning is no longer a reality.
The clock specification for a DSM-class design has become more complex, due to DSM effects. A simple synchronous design with a single clock would need a clock specification that goes far beyond timing. Some issues to consider are described as follows:
- Clock frequencies are increasing, which produces shorter cycle times, which, in turn, result in less time for the clock skew budget. At the same time, the number of clock loads is increasing, which results in increased skew. The overall effect is that meeting skew requirements can become impossible if skew is a percentage of cycle time.
- Clock skew is the difference in insertion delay, measured from the root of the tree, between end points. Clock skew is difficult to control within historical budgets, partly because of higher frequencies, more loads and larger die sizes. In addition, the increased use of IP and RAMs produces multiple core obstructions, making it more difficult to implement symmetrical clock trees.
- Clock insertion delay is a specification that affects input and output ports. The clock on I/O port registers usually has a unique skew specification related to external timing specifications and requires guard bands. It also tends to be tighter than the skews needed by the functional logic contained within the core, which typically doesn't require guard bands.
- Clock jitter usually comes from a phase-lock loop (PLL) implementation and is the specification of cycle-to-cycle edge uncertainty.
- Gating logic is used to power down logic in order to address power-consumption design constraints. Delays through these gates contain guard bands that affect the accuracy of insertion delay and skew calculation. Further, these gates have complicated the clock structure with multiple tree branches that must have balanced skews.
- Test clock multiplexing logic introduces delay issues similar to those introduced by gating logic. However, a MUX in the clock path implies multiple modes of operation. MUXes in the clock path are usually implemented with one clock selected in functional mode and the other in test mode. This implies that different skew relationships exist between functional and test modes, further complicating the clock skew picture.
- Crosstalk can affect clock performance as well as logic function and timing. Crosstalk adds to these skew complexities by affecting net and cell delays of the clock trees. The cross-coupling capacitance between clock nets and adjacent signal nets is not symmetrical across the entire tree, which makes it difficult to implement a fully symmetrical clock tree. The full impact of the cross-coupling capacitance cannot be taken into account until the functional logic and clock logic are placed and routed.
- In addition to power dissipation by the structure of the tree, issues such as AC electromigration and self-heat (joule heating) have a measurable impact on the physical reliability of clock trees. There is a balance between the number of levels in the clock tree and the performance and reliability of the clock tree. This balance is a function of the current characteristics of the clock driver cells, the physical interconnect technology, the number of loads in the clock tree and the clock frequency.
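The skew definition given above, and the squeeze on a skew budget expressed as a percentage of cycle time, can be illustrated with a short Python sketch. It is only an illustration: the register names, the insertion delays and the 5 percent budget are hypothetical values chosen for the example, not figures from any real design or tool.

```python
def clock_skew(insertion_delays_ns):
    """Return the largest minus the smallest root-to-endpoint insertion delay (ns)."""
    delays = list(insertion_delays_ns.values())
    return max(delays) - min(delays)


def skew_budget_ns(clock_mhz, budget_fraction):
    """Return the skew budget (ns) when it is a fraction of the cycle time."""
    period_ns = 1000.0 / clock_mhz
    return period_ns * budget_fraction


# Hypothetical insertion delays (ns) from the clock root to four end points.
INSERTION_DELAYS_NS = {"reg_a": 1.20, "reg_b": 1.35, "reg_c": 1.28, "ram0": 1.50}

if __name__ == "__main__":
    skew = clock_skew(INSERTION_DELAYS_NS)
    # Assumed budget: 5 percent of the cycle time at each frequency.
    for mhz in (100, 200, 400):
        budget = skew_budget_ns(mhz, 0.05)
        verdict = "meets" if skew <= budget else "exceeds"
        print(f"{mhz} MHz: skew {skew:.2f} ns {verdict} budget {budget:.3f} ns")
```

With these made-up numbers the same tree meets the budget at 100 MHz but exceeds it at 200 MHz and 400 MHz, because the budget shrinks with the cycle time while the skew of the tree does not.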
These effects make it difficult to implement clock trees without taking into account the nets and cells that surround the clock tree. With standalone point tools such as synthesis, clock tree generators, extractors, delay calculators, static timing and signal integrity analyzers, it can be difficult to take into account all the subtle and critical DSM interconnect effects.
The amount of time it takes standalone point tools to analyze the impact of interconnect effects and then implement corrective action on the design is becoming prohibitive with respect to time-to-market constraints. Therefore, automation is required to consider how all of these issues affect the class of designs that are being implemented in DSM today.
Current chip design flows consist of standalone point tools: synthesis, place and route, clock tree generators, extractors, delay calculators, static timing and signal integrity analyzers. With current approaches, it can be difficult to fully appreciate the subtle interconnect effects that occur at process technologies of 0.18-micron and below. These interconnect effects modify all of the clock constraints defined above and are becoming more than just commonplace; they are becoming critical.
Emerging EDA approaches directly address DSM effects during clock tree implementation by integrating all of the design functions of synthesis, place and route, timing and signal integrity into a single tool using a single, unified data model. This unified system enables a clock methodology of incremental refinement, which is the correct-by-construction approach. Starting with front-end constraints such as timing or skew, a clock tree can be constructed using collaboration between the design functions.
The final solution is achieved by gradually reducing delay uncertainty until the constraints are met while maintaining electrical and physical rules from the foundry (design rule checks, or DRCs).
For example, this comprehensive design system learns about timing and things that contribute to timing, such as coupling and noise analysis, during routing and uses this knowledge to reduce delay uncertainty.
Another benefit of this unified system approach can be found when considering the ramifications of giving the clock access to up-to-date timing information. With this awareness of timing the tool no longer needs to adhere to a skew specification. It can effect changes to the timing of the design by modifying the clock skew, using nonzero skew to converge on timing.
Clock skew choices
If logic designers know that the clock tools can help to achieve timing closure by using clock skew to expand the timing resolution, then they no longer need to include clock skew in their timing budgets. There could be many opportunities for the front-end designer to take advantage of "useful clock skew." For example, they might be able to use RAM with access times that are greater than the clock cycle time, knowing that clocks can be skewed to meet timing.
In another example, a pipelined data path that has one stage with a known "worst case timing path" no longer needs to be the timing bottleneck of the design. The designer can simply ensure that the stages driving data into and receiving data from the long stage are shorter than the cycle time. The clock tree implementation will then skew the rest of the clock so that all three stages meet timing. This is essentially a cycle-stealing implementation on registers, with a timing-aware clock tool.
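The three-stage pipeline example can be worked through numerically. The Python sketch below is an illustration under simplifying assumptions, not a description of how any particular tool computes skew: it considers setup checks only (no hold checks, register setup time or clock-to-output delay), it pins the first and last registers to zero offset, and it uses a simple equal-slack scheme to pick the intermediate clock offsets. The stage delays and the 10 ns cycle time are hypothetical.

```python
def setup_slacks(stage_delays, clock_offsets, period):
    """Setup slack per stage; stage i launches at register i, captures at i + 1."""
    return [
        (clock_offsets[i + 1] + period) - (clock_offsets[i] + delay)
        for i, delay in enumerate(stage_delays)
    ]


def balance_offsets(stage_delays, period):
    """Pick clock offsets that give every stage the same setup slack.

    Assumed scheme for illustration: the first register has offset 0, and the
    total slack (n * period - sum of delays) is shared equally, which also
    brings the last register back to offset 0.
    """
    n = len(stage_delays)
    target_slack = (n * period - sum(stage_delays)) / n
    offsets = [0.0]
    for delay in stage_delays:
        offsets.append(offsets[-1] + delay - period + target_slack)
    return offsets


# Hypothetical pipeline: the middle stage (12 ns) is longer than the 10 ns cycle.
STAGE_DELAYS_NS = [8.0, 12.0, 8.0]
PERIOD_NS = 10.0

if __name__ == "__main__":
    zero_skew = [0.0] * (len(STAGE_DELAYS_NS) + 1)
    print("zero skew slacks:  ", setup_slacks(STAGE_DELAYS_NS, zero_skew, PERIOD_NS))
    skewed = balance_offsets(STAGE_DELAYS_NS, PERIOD_NS)
    print("useful skew offsets:", [round(s, 3) for s in skewed])
    print("useful skew slacks: ",
          [round(s, 3) for s in setup_slacks(STAGE_DELAYS_NS, skewed, PERIOD_NS)])
```

With zero skew the middle stage misses timing by 2 ns. Clocking the register ahead of the long stage early and the register after it late lets that stage borrow the spare time of its neighbors, which works only because the neighboring stages are shorter than the cycle time.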
Today, DSM cell capacity and interconnect effects have combined to increase the complexity of clock trees beyond the capabilities of existing point tools.
The ramifications of large designs, where delay cannot be fully understood until the final route is completed, are forcing design teams to devise new applications of the clock skew specification.
The actual clock tree delays are affected by the understanding and inclusion of interconnect effects. Only a timing tool that accounts for DSM interconnect effects can completely realize the actual clock behavior. Further, only a tool that can place, route and analyze the clock tree timing in the context of DSM interconnect effects can make the necessary automatic adjustments to the clock structure to ensure that timing is met.