Timing Aware Redundant Buffer Removal
Kushagra Khorwal, Harkaran Singh, Mayank Tutwani, Nishant Madan
Freescale Semiconductors India Pvt. Ltd.
1. Introduction
In the VLSI world, area and power are commonly assumed to go hand in hand. From an implementation and tool perspective, however, a few factors seem to contradict this assumption. It is well known that removing redundant logic reduces area and power proportionally. This paper focuses on one part of such redundant logic: extra buffer and inverter chains that are not required to meet timing but nevertheless end up in the design.
2. Sources of Redundant Buffer or Inverter Chains
Physical design implementation is driven by optimizations at different stages. During these optimization stages, the tools insert buffer/inverter chains to fix DRVs (design rule violations), to fix hold timing, or to meet setup timing. EDA tools often add chains that turn out to be redundant by the final stages of the design cycle. To illustrate the redundant chains we aim to reduce, consider a design like the one shown in Figure 1: the yellow cell is the start point, the red cell is the end point, and the long chain of buffers between them appears redundant. The major reasons for this buffering are listed below:
Figure 1 : General Representation of SoC
2.1. Design Margins
In the earlier stages, with tight timing derates/margins and immature STA constraints, the tool inserts buffers and inverters to meet stringent setup and hold requirements. At the final stage, when these margins are relaxed, these buffers might no longer be required from a timing perspective.
2.2. IO Channel DRV Fixing
A well-known issue with EDA tools is that they insert buffer loops in channels to meet DRV requirements, even though a single buffer chain without a loop could have fixed the violation.
Figure 2: IO Channel Redundant Buffering
2.3. Post-Clock Tree Network Hold Buffering
During hold optimization at the post-Clock Tree Network stage, the tool does not yet take accurate delays into account, which leads to unnecessary hold buffering. The optimizations that run after routing are unable to recover these extra hold buffers.
2.4. Router Miscorrelation
The router invoked during placement and clock tree building is an abstract router (the Dummy Router), while the router invoked during actual design routing is the Actual Router. The Dummy Router, which is used for multiple optimizations at the pre-route stage, is somewhat pessimistic, so it often sees fake congestion compared with the Actual Router. When the Dummy Router sees congestion, it inserts buffers in non-congested regions. During actual routing the fake congestion might be resolvable, but because of where those buffers were placed, the Actual Router also routes along the path laid down by the Dummy Router, resulting in a lot of redundant buffering.
Figure 3
2.5. Buffering When Tools Are Unable to Fix DRVs
Often, because of design constraints or the floorplan, EDA tools are not able to fix the DRVs, but they still insert buffers while trying to fix them, and those buffers remain in the design.
3. Proposed Methodology
The methodology improves on the conventional flow, in which a large number of inverter/buffer chains are left in the design because of EDA tool limitations. The idea is to identify all buffer/inverter chains in the design and reduce their length in order to save power, area and routing resources. The novelty lies in using the hold slack available through such chains to remove the entire chain, which saves both area and power at the same time.
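As a minimal sketch of the chain-identification step, assuming a simplified, hypothetical netlist model (the `Cell` class and `find_chains` function are illustrative names, not a P&R tool API; a real flow would query the tool's own design database), single-fanout chains can be collected by walking forward from each chain head until a cell that is not a single-fanout buffer/inverter is reached:

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    name: str
    is_buf_or_inv: bool                              # True for buffer/inverter cells
    fanout: list[str] = field(default_factory=list)  # names of the cells this cell drives


def find_chains(cells: dict[str, Cell], min_len: int) -> list[list[str]]:
    """Return maximal single-fanout buffer/inverter chains of at least min_len cells."""
    # Reverse map: which cell(s) drive each cell (sinks outside `cells` are allowed)
    drivers: dict[str, list[str]] = {name: [] for name in cells}
    for cell in cells.values():
        for sink in cell.fanout:
            drivers.setdefault(sink, []).append(cell.name)

    def is_link(name: str) -> bool:
        # A chain link is a buffer/inverter that drives exactly one sink
        cell = cells.get(name)
        return cell is not None and cell.is_buf_or_inv and len(cell.fanout) == 1

    chains = []
    for name in cells:
        if not is_link(name):
            continue
        preds = drivers[name]
        # Not a chain head if it is fed by another link (it belongs to that chain)
        if len(preds) == 1 and is_link(preds[0]):
            continue
        chain, seen = [name], {name}
        nxt = cells[name].fanout[0]
        # Extend while the next cell is also a link; `seen` guards against loops
        while is_link(nxt) and nxt not in seen:
            chain.append(nxt)
            seen.add(nxt)
            nxt = cells[nxt].fanout[0]
        if len(chain) >= min_len:
            chains.append(chain)
    return chains
```

Here `min_len` plays the role of the user-specified chain length; the cell feeding the first link and the cell driven by the last link are the chain's start point and end point.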
3.1. FLOW Description:
- Find all single-fanout buffer/inverter chains, using a user-specified chain length as the selection criterion.
- Calculate the ratio r = chain_distance / chain_count and compare it against a standard number that depends on the technology.
- The technology standard (TS) is the distance that a particular buffer in a given technology node can drive easily without causing a DRV in the design.
- If r < TS and Th > Thu, mark the chain for area comparison. If the total chain area exceeds the area of the buffers that would be inserted between the start point and the end point, mark the chain for final processing.
- The start point and the end point of the chain define a bounding box, where (x1, y1) are the coordinates of the start point and (x2, y2) are the coordinates of the end point. The algorithm then tries to insert new buffers within this bounding box. If the new buffers fall inside the bounding box, delete the chain and insert the new buffers.
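The qualification steps above can be sketched as a single per-chain check. This is a minimal sketch under assumptions the paper does not state: chain_distance is taken as the Manhattan distance between start and end points; Th and Thu are assumed to be the hold slack available through the chain and a user-set hold-slack threshold; the replacement uses just enough buffers that no segment exceeds TS; and `legalize` is a hypothetical stand-in for the P&R tool's placement legalization. `Chain` and `plan_replacement` are illustrative names only.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional

Point = tuple[float, float]


@dataclass
class Chain:
    start: Point   # (x1, y1): chain start point
    end: Point     # (x2, y2): chain end point
    count: int     # number of buffers/inverters in the chain (chain_count)
    area: float    # total cell area of the chain
    th: float      # Th, assumed here to be the hold slack available through the chain


def plan_replacement(chain: Chain, ts: float, thu: float, buf_area: float,
                     legalize: Callable[[Point], Point] = lambda p: p,
                     ) -> Optional[list[Point]]:
    """Return new buffer locations if the chain should be replaced, else None."""
    (x1, y1), (x2, y2) = chain.start, chain.end
    # Assumption: chain_distance is the Manhattan distance from start to end
    dist = abs(x2 - x1) + abs(y2 - y1)
    r = dist / max(chain.count, 1)
    # Ratio and hold checks: r < TS and Th > Thu
    if not (r < ts and chain.th > thu):
        return None
    # Assumption: insert just enough buffers that no segment is longer than TS
    n_new = max(math.ceil(dist / ts) - 1, 0)
    # Area check: the chain must be larger than its replacement
    if chain.area <= n_new * buf_area:
        return None
    # Spread the new buffers evenly along the start->end line, then legalize them
    steps = n_new + 1
    new = [legalize((x1 + (x2 - x1) * k / steps, y1 + (y2 - y1) * k / steps))
           for k in range(1, steps)]
    # Bounding-box check: every legalized buffer must stay inside the box
    xlo, xhi = min(x1, x2), max(x1, x2)
    ylo, yhi = min(y1, y2), max(y1, y2)
    if all(xlo <= x <= xhi and ylo <= y <= yhi for x, y in new):
        return new
    return None
```

A `None` result leaves the chain untouched; a list of points gives the locations at which the replacement buffers would be inserted after the chain is deleted.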
3.2. FLOW CHART
Figure 3: Flow chart of proposed methodology
4. Comparison of Conventional and Proposed Flow
The proposed methodology is an enhancement over the conventional flow: it accounts for the extra redundant buffering and helps reduce power and die area.
4.1 Advantages of Proposed Methodology
4.1.1 Die Size Improvement
With the proposed flow, area is saved by identifying and removing entire buffer/inverter chains. Removing such chains also frees routing resources and reduces the via count, which improves congestion.
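The area criterion applied in the flow description (Section 3.1) can be written as a net cell-area saving. The notation is introduced here for illustration only: $n$ is the number of cells in the removed chain, with areas $A_i^{\text{chain}}$, and $m$ is the number of replacement buffers, with areas $A_j^{\text{new}}$. A chain is replaced only when this saving is positive:

```latex
\Delta A = \sum_{i=1}^{n} A_i^{\text{chain}} - \sum_{j=1}^{m} A_j^{\text{new}} > 0
```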
4.1.2 Power Improvement
It also reduces the overall buffer area, which in turn reduces the total power of the design.
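As a first-order illustration using the standard CMOS power expressions (not figures from this paper), the saving comes from two terms:

```latex
P_{\text{total}} = P_{\text{leak}} + P_{\text{dyn}}, \qquad P_{\text{dyn}} = \alpha \, C_{\text{sw}} \, V_{DD}^{2} \, f
```

Removing a chain lowers leakage because fewer cells remain, and lowers dynamic power because fewer cell pins contribute to the switched capacitance $C_{\text{sw}}$, net of any replacement buffers; the activity factor $\alpha$, supply voltage $V_{DD}$ and frequency $f$ of the net are unchanged.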
4.2 SoC level Comparison Chart