Interface Timing Challenges and Solutions at Block Level
By Manish Kumar Sagarvanshi (Technical Lead), Madhav Shah (Technical Lead)
Abstract
Timing closure of a semiconductor chip is the primary concern of any physical design engineer, and the interface timing of a block is as critical as its internal timing. In this article, we address the challenges faced while fixing interface timing and the solutions to overcome them. We used Cadence Innovus as our PnR tool and Synopsys PrimeTime as our sign-off timing tool.

Keywords: IO timing (Input & Output timing), WNS (Worst Negative Slack), TNS (Total Negative Slack), FEP (Failing End Point), ns (nanosecond), ps (picosecond), PT (PrimeTime).

Introduction
Block owners always prioritize internal (Clk2Clk) timing over interface timing. During timing budgeting, a chip-level STA engineer always builds extra margin for interface timing into the block-level budget, in the form of higher external delays or larger uncertainties than needed. Hence, block owners do not have to put much effort into converging interface timing. However, IO timing sometimes has to be fixed at the block level, and the top level may want to avoid repeated cycles of feedback and fixes. In that case, the top-level STA engineer provides specific requirements for meeting IO timing at an early stage of PnR. The following are the general requirements the STA engineer provides:
IO Timing Challenges at Block Level
There are many challenges in meeting IO timing requirements at the block level. Let's look at four major ones:
1- IO Timing Miscorrelation between PnR Tool and Sign-Off Timing Tool
When a routed design is taken from PnR to sign-off for timing analysis, IO timing miscorrelation may occur because the two tools come from different EDA vendors. The following are a few reasons for this tool miscorrelation:
At PnR, Innovus calculates the source insertion delay by taking the mean of the worst corner's latencies and then applies that same value to all corners. PT does not take the source insertion delay from the incoming PnR SDC; instead, it calculates it separately for each corner. Because PT calculates the source insertion delay independently for each corner, based on StarRC-extracted parasitics, it is more accurate than the PnR tool. For example, as shown in Table 1, Innovus calculates the mean source insertion delay for virtual_clock_1 in the worst corner (FUNC_m40_SETUP_SS_C_WC in our case). That value, 0.379ns, is then applied to all setup corners. In PT, by contrast, the source insertion delay differs from corner to corner.
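To make the difference concrete, here is a minimal Python sketch of the two behaviours described above. It assumes, for illustration only, that PT's per-corner value is also the mean of that corner's latencies; the article only states that PT computes it per corner. The second corner name, the function names, and all latency numbers are hypothetical.

```python
from statistics import mean


def innovus_style(latencies_by_corner, worst_corner):
    """Mean latency of the worst corner, reused for every corner."""
    worst_mean = mean(latencies_by_corner[worst_corner])
    return {corner: worst_mean for corner in latencies_by_corner}


def pt_style(latencies_by_corner):
    """Assumption: an independent mean computed for each corner."""
    return {corner: mean(lats) for corner, lats in latencies_by_corner.items()}


def miscorrelation(latencies_by_corner, worst_corner):
    """Innovus-style value minus PT-style value, per corner (ns)."""
    pnr = innovus_style(latencies_by_corner, worst_corner)
    signoff = pt_style(latencies_by_corner)
    return {corner: pnr[corner] - signoff[corner] for corner in latencies_by_corner}


# Hypothetical clock latencies (ns) from which the source insertion delay is derived.
latencies = {
    "FUNC_m40_SETUP_SS_C_WC": [0.36, 0.40],  # worst corner named in the article
    "FUNC_hyp_SETUP_corner2": [0.30, 0.34],  # hypothetical second setup corner
}

for corner, delta in miscorrelation(latencies, "FUNC_m40_SETUP_SS_C_WC").items():
    print(f"{corner}: {delta:+.3f} ns")
```

In this model the worst corner always correlates exactly. Every other corner inherits an offset equal to the gap between its own latency and the worst corner's.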
Table 1: Source-Insertion delay for Innovus and PT

2- IO Timing Miscorrelation between Block Level and Top Level
There are a couple of reasons that lead to miscorrelation between block-level and top-level IO timing:
Block constraints such as false paths, multicycle paths, and external delays can cause a large miscorrelation between block-level and top-level timing. This happens if they are not carried appropriately into the top-level run, or vice versa. A few other reasons for IO timing miscorrelation are:
3- Flop Placement near IO Ports
During timing optimization, the tool places flops based on their internal and external timing requirements. Because internal (Reg2Reg) timing usually gets higher priority, flops may end up relatively far from the IO ports. To meet the IO timing requirements, we can pull these flops closer to their respective IO ports. However, doing so may degrade internal timing, which makes the process complex.

4- Latency Requirements
We can have specific latency requirements for IO ports to achieve IO timing at the block level. There are a few challenges in achieving these latency targets, such as:
Meeting the skew requirement between these two flops affects the target latency.

fig 1
fig 2

It is challenging to achieve lower latency for output ports in a larger block because of the greater distance between the flop and the port. As fig 1 and fig 2 show, the distance from the clk port to the flop is around 1000 microns. With this much distance, a lower latency is unlikely to be achieved.

IO Timing Solutions
There can be multiple approaches to address IO timing challenges. Let's discuss a few of them:

1- IO Flop Bound at Placement Stage
This is a fundamental and common approach to fixing IO timing. In this approach, we identify the violating IO ports and create a bound for their flops near those ports. The distance between the port and the flop bound is determined by the size of the IO violation. This approach helps reduce the number of optimization buffers/inverters in the IO path. Make sure that the bound does not affect internal timing.

Innovus commands for creating a bound for cells (<grp_type> is the bound type, for example guide, region, or fence):
createInstGroup <grp_name> -<grp_type> <bbox>
addInstToInstGroup <grp_name> <inst>

2- Insertion Delay Settings at Clock Stage
This is one of the more sophisticated approaches to solving IO timing. In this approach, we work on the ideal clock insertion delay for the registers of the violating input and output ports. Before digging into how the insertion delay can help timing, let's discuss some essential characteristics of a clock network. The latency consists of the following components, as shown in fig 3:
fig 3

By default, the tool puts all clock sinks driven by the same clock into a common skew group and balances them to a global latency target. Thus, the clock insertion delay is effectively determined by the sink with the longest insertion delay. To fix IO timing, we need to either pull or push the violating sinks in a way that does not affect the subsequent timing paths. We tried this approach in our design with the Innovus CCOpt commands below. It helped us solve IO timing, and we were able to achieve the ideal clock network delay shown in Table 2.

For pulling the clock by 100ps:
set_ccopt_property insertion_delay 0.10 -delay_corner $worst_corner -pin ${reg_name}/phi

For pushing the clock by 150ps:
set_ccopt_property insertion_delay -0.15 -delay_corner $worst_corner -pin ${reg_name}/phi

When pushing, the tool tries to add clock buffers/inverters to the clock path; when pulling, it tries to remove them.
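The effect of pulling or pushing an IO flop's clock can be reasoned about with a simplified single-cycle setup model. The Python sketch below is our own illustration, not a tool algorithm. It ignores derates, CRPR, and clock-to-Q (fold the latter into data_delay), and the class, function, and register names are hypothetical. It also prints the matching commands using the sign convention used in this article: a pull is a positive insertion_delay with a negative set_clock_latency, and a push is the opposite.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IoPath:
    period: float        # clock period (ns)
    ext_delay: float     # input/output external delay from the block SDC (ns)
    data_delay: float    # in-block data delay, incl. clock-to-Q where relevant (ns)
    flop_latency: float  # clock latency at the block's IO flop (ns)
    virt_latency: float  # source latency of the virtual clock (ns)
    setup: float = 0.0   # setup time of the capturing block flop (input paths)
    uncertainty: float = 0.0


def input_slack(p: IoPath) -> float:
    """Virtual clock launches outside the block; the block flop captures."""
    arrival = p.virt_latency + p.ext_delay + p.data_delay
    required = p.period + p.flop_latency - p.setup - p.uncertainty
    return required - arrival


def output_slack(p: IoPath) -> float:
    """Block flop launches; ext_delay covers the external logic and setup."""
    arrival = p.flop_latency + p.data_delay
    required = p.period + p.virt_latency - p.ext_delay - p.uncertainty
    return required - arrival


def shift_commands(reg_name: str, shift_ns: float) -> tuple:
    """shift_ns > 0 pushes the clock (later); shift_ns < 0 pulls it (earlier)."""
    cts = (f"set_ccopt_property insertion_delay {-shift_ns:.2f} "
           f"-delay_corner $worst_corner -pin {reg_name}/phi")
    place = f"set_clock_latency {shift_ns:.2f} {reg_name}/phi"
    return cts, place


base = IoPath(period=1.0, ext_delay=0.5, data_delay=0.3, flop_latency=0.3,
              virt_latency=0.4, setup=0.05, uncertainty=0.05)
pushed = replace(base, flop_latency=base.flop_latency + 0.15)  # input flop pushed 150ps
pulled = replace(base, flop_latency=base.flop_latency - 0.10)  # output flop pulled 100ps

print(input_slack(pushed) - input_slack(base))    # input slack gained by pushing
print(output_slack(pulled) - output_slack(base))  # output slack gained by pulling
print(shift_commands("u_io_reg", -0.10))          # hypothetical register name
```

The same shift also moves the flop's launch or capture edge for its internal paths. A pushed input flop launches later into the core, and a pulled output flop captures earlier from the core. This is why new internal violations can appear.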
Table 2: Ideal Clock Insertion Delay Experiment
Table 3: Timing-Summary

As shown in Table 3, we were able to fix IO timing with this approach while making sure that internal timing was not affected. These settings will not always produce the desired network delay, because of the challenges discussed earlier. In such cases, we have to try different insertion delay values. Changing the latency of IO flops can also create new internal timing violations. For this scenario, we can enable useful skew with the following commands at the placement stage.

Note: The latency values in these commands are inverted relative to the set_ccopt_property specification. Therefore, to pull a pin, the CTS insertion_delay value should be positive (0.10ns in our case) and the set_clock_latency value at the placement stage should be negative (-0.10ns in our case).

For input ports:
set_clock_latency 0.150 ${reg_name}/phi

For output ports:
set_clock_latency -0.10 ${reg_name}/phi

There is one more way to achieve the IO latency target if a direct pull/push does not work:
create_ccopt_skew_group -name skewgroup1 -constrains ccopt -source clk -shared_sinks [get_ccopt_clock_tree_sinks $reg_name/clk]
create_ccopt_skew_group -name skewgroup2 -constrains ccopt -source clk -exclusive_sinks [get_ccopt_clock_tree_sinks $reg_name2/clk]
set_ccopt_property target_insertion_delay -skew_group skewgroup2 550

fig 4: Exclusive Skew Group
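Here is a rough mental model of why an exclusive skew group helps. The Python sketch below balances the default group to its longest sink, as described for the tool's default behaviour. Each exclusive sink gets its own target, which cannot go below that sink's physically achievable minimum latency. This is a deliberate simplification, not how CCOpt actually builds a tree. The function name, sink names, and numbers are hypothetical. Sinks added with -shared_sinks, which can also remain members of other skew groups, are not modeled.

```python
def balance(min_latency, exclusive_targets):
    """Return the modeled insertion delay (ns) for every sink.

    min_latency: sink -> smallest insertion delay physically achievable (ns)
    exclusive_targets: sink -> target for sinks moved into their own exclusive
                       skew group (the role played by target_insertion_delay)
    """
    default_sinks = [s for s in min_latency if s not in exclusive_targets]
    # Default group: every member is balanced to its longest sink.
    common = max(min_latency[s] for s in default_sinks) if default_sinks else 0.0
    result = {s: common for s in default_sinks}
    for sink, target in exclusive_targets.items():
        # A target below what the physical distance allows cannot be met.
        result[sink] = max(target, min_latency[sink])
    return result


sinks = {"core_reg_a": 0.40, "core_reg_b": 0.55, "io_reg": 0.30}  # hypothetical
print(balance(sinks, {}))                # io_reg dragged up to 0.55
print(balance(sinks, {"io_reg": 0.35}))  # io_reg gets its own 0.35 target
print(balance(sinks, {"io_reg": 0.20}))  # floor: io_reg stays at 0.30
```

The last case mirrors the earlier point about long clock-port-to-flop distances: a target alone cannot force a latency below what the physical distance allows.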
There are a few more approaches that can be used for the remaining minor IO timing violations, such as:
Conclusion
There can be multiple reasons for IO timing violations, so the appropriate approach should be selected based on the scenario. The solutions provided in this article can help designers understand the challenges associated with interface timing and how to address them. Not every solution applies to a particular design, but a mix of these solutions can help the design team simplify IO timing closure and achieve the desired results.
Authors:
Madhav Shah has almost 7 years of experience in the ASIC physical design domain. He holds an engineering degree in ECE and has worked on various technology nodes, such as 28nm, 12nm, and 7nm, in networking SoCs. He has successfully taped out ASIC chips from RTL netlist to GDS, including the sign-off process.

Manish Sagarvanshi has more than 5 years of experience in the ASIC physical design domain and more than 1 year of experience as a Cadence Innovus Application Engineer.
All material on this site Copyright © 2017 Design And Reuse S.A. All rights reserved.