Clock Path Pessimism: Statistical vs. Logical
By Syed Shakir Iqbal, LG Soft India Pvt. Ltd.

1. Introduction

The clock path has always been one of the most critical and complex components of timing analysis in synchronous design. With increasing complexity in both functionality and test architecture, designers now contend with a large number of clocks and their controlling logic, making clock path analysis much more difficult. Furthermore, even with the remarkable pace of EDA tool development and evolution, the defects and issues associated with clock path design only continue to increase as technology and architecture evolve. The importance of the clock path is thus not limited to static timing analysis; it also plays a key role in system architecture and silicon behavior. This paper briefly discusses the methods and scenarios involved in clock path analysis that drive a robust, high-yield SoC implementation.

2. Clock Path Pessimism: Designer vs. Tool

To ensure proper design functionality as well as good yield in post-silicon samples, designers and tools often introduce additional margins in the form of pessimism during clock path analysis. This pessimism may be grouped into two main categories: statistical and logical.
Figure 1: Different perspectives of adding clock path pessimism to design.

Statistical clock path pessimism, as the name suggests, is driven by the statistics of clock path variation. This type of pessimism models inter-chip and intra-chip variations and may be introduced either intentionally by the designer or by the behavior of the tool itself. These variations are mostly driven by the implementation technology and the target margins (e.g. speed binning). On-chip variation derates for late-early analysis, clock path timing uncertainties, CPPR thresholds and library cell models (in terms of voltage, VT, 3-sigma, 6-sigma, etc.) are common examples of statistical clock path pessimism used in synchronous design. Statistical clock path pessimism generally translates into yield improvement.

Logical clock path pessimism, on the other hand, models clock path behavior and functional robustness checks (clock switching, functional glitches, etc.) driven by the design architecture itself. Its causes generally lie in timing constraint modelling and functional clock path design. While the causes of logical clock path pessimism are mostly technology independent, technology-driven (statistical) pessimism can stack on top of it, making its overall effect dependent on both the design and the technology. Logical clock path pessimism generally translates into greater clock path coverage and hence fewer functional bugs.

In most designs both statistical and logical clock path pessimism exist simultaneously, as yield and coverage are two different cost functions for a successful SoC design. This concurrent method of adding pessimism through margins and worst-case scenarios statistically ensures better clock path robustness, covering almost all practical scenarios even under the worst case of silicon variability.
However, the toll paid by the design in terms of power, area and timing may often result in unreasonable design QoR. Thus it becomes an analytical and engineering-bandwidth challenge to apply these two forms of clock path pessimism adequately, modelling the design for realistic robustness and QoR.

3. Designer Intended Statistical Clock Path Pessimism

Let us consider the example in figure 2 to understand designer-introduced clock path pessimism. In the given figure the setup and hold timing paths are analyzed from Q1 to D2 with the clock CLK operating at 2GHz (period T = 500ps).
Figure 2: Sample timing path.

For timing analysis, circuit elements C1->A1->REG1->X1 form the launch path (assuming nets to be ideal with zero delay) while elements C1->B1->B2->B3 form the capture path. Thus the ideal setup and hold equations, without considering any variations, can be written as:

Setup slack = (T + D_C1 + D_B1 + D_B2 + D_B3) - (D_C1 + D_A1 + T_CQ + D_X1) - T_setup
Hold slack = (D_C1 + D_A1 + T_CQ + D_X1) - (D_C1 + D_B1 + D_B2 + D_B3) - T_hold

Due to manufacturing limitations in the technology, identical cells show different delays and output transition times at different locations and different instants of time, which is termed on-chip variation (OCV) in STA. Usually this variation is modelled through path derating factors which reflect the percentage by which the delay of a circuit element can change. Hence, using the two extremes of variation, a late (slowest) and an early (fastest) delay model is evaluated for each circuit element and applied according to the analysis. One of the most common methods of using late-early variation is OCV timing, where the tool minimizes all the timing parameters that improve the timing slack by using early/fast models, while the delay of all elements that degrade the slack is maximized by using the late/slow models. This modelling ensures robustness across manufacturing, as both extremes are analyzed simultaneously under the worst-case scenario.

Now let us return to the analysis of the sample path in figure 2 under OCV late-early timing. For setup timing the STA engineer analyzes the launch path with the slowest delays while the capture path is checked with the fastest delays. Similarly, for hold timing the launch path is analyzed with the fastest delays and the capture path with the slowest. Assuming that the technology benchmark for OCV variation is +/-10%, the user can add a +/-10% derate on the launch and capture paths to model the OCV variation of silicon.
So the launch path gets 10% faster for hold and 10% slower for setup, while the converse happens to the capture path. Recomputing the setup and hold slacks with these OCV derates, the setup now violates by 75ps (a 15% hit on clock frequency at a 500ps period) while the hold margin has decreased by an additional 110ps.

Now let us further add an additional +/-15% derate on the clock path cells to model designer-intended clock path variation pessimism, since clock path robustness has a much higher priority than the data path. The addition of the incremental +/-15% clock path derates degrades the design frequency by ~33% while leaving almost no margin on hold. Hence, although the designer has ensured that the analysis covers the silicon variations with decent margins, in doing so the performance has been compromised by 33%, which will come at the cost of either power or area. Also, if the derates are increased further, then the hold check, which was earlier met with ample margin, will start violating, implying that either the clock skew must be adjusted or an additional buffer must be introduced in the data path to meet hold timing. Hence, with clock-path-specific margin/derate based pessimism a designer struggles to balance robustness against QoR degradation, and proper clock path analysis for margining is therefore required. It should be noted that in this case we calculated the slacks without applying any CPPR (common path pessimism removal) in the clock path, for the sake of simplicity. The concept of CPPR is discussed in the next section, concerning tool-based analysis.

4. Tool Intended Statistical Clock Path Pessimism

In the previous section we discussed how the margins added by the designer as clock path pessimism may result in a compromise between robustness and design QoR.
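To make the designer-side compromise concrete, the slack arithmetic from the example above can be sketched in a few lines. All delay values below are assumptions (the actual figure-2 numbers are not reproduced in the text), chosen so the results line up with the quoted 75ps setup violation, 110ps hold-margin reduction and ~33% frequency degradation:

```python
# Sketch of the section-3 setup/hold slack arithmetic. All delay values
# (ps) are assumed for illustration; the real figure-2 values are not
# given in the text.
T = 500.0                      # clock period at 2 GHz, ps
C1, A1 = 100.0, 100.0          # launch clock cells (assumed)
B1 = B2 = B3 = 100.0           # capture clock cells (assumed)
TCQ, X1 = 150.0, 350.0         # REG1 clk->Q and data buffer (assumed)
TSU, THLD = 165.0, 50.0        # REG2 setup/hold requirements (assumed)

def slacks(clk_late, clk_early, data_late, data_early):
    """Setup/hold slack with separate derates for clock and data cells."""
    launch_clk = C1 + A1
    capture_clk = C1 + B1 + B2 + B3
    data = TCQ + X1
    # Setup: slow launch vs. fast capture. Hold: fast launch vs. slow capture.
    setup = (T + clk_early * capture_clk) \
            - (clk_late * launch_clk + data_late * data) - TSU
    hold = (clk_early * launch_clk + data_early * data) \
           - clk_late * capture_clk - THLD
    return setup, hold

print(slacks(1.0, 1.0, 1.0, 1.0))     # ideal: no variation
print(slacks(1.1, 0.9, 1.1, 0.9))     # +/-10% OCV everywhere
print(slacks(1.25, 0.75, 1.1, 0.9))   # extra +/-15% on clock cells only
```

Under these assumed delays the three cases give slacks of roughly (35, 250), (-75, 140) and (-165, 50) ps; the last case needs the period stretched from 500ps to about 665ps, which is the ~33% frequency degradation mentioned above.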
Now consider another perspective, where statistical clock path pessimism is introduced by the tool's own intent. In most cases tool-intended pessimism, which is independent of the designer, is an artifact of the clock path implementation (and design) itself.
Figure 3: Timing path categorization from a tool's perspective.

To understand a tool's perspective in STA, it is important to observe how a timing tool envisions and identifies a timing path. For any synchronous timing path there are four main segments that a designer as well as an EDA tool must consider (modelled in figure 3):
A typical EDA timing tool generally introduces clock path pessimism, independent of the designer, by two main methods:
4-1. Maximization of Uncommon Path: Uncommon path maximization allows the tool to consider maximum variation in the launch and capture paths. The reasoning behind this behavior is that variation has the least impact on path elements that are common to launch and capture. If a buffer lies in both the launch and the capture path, then statistically as well as practically it is nearly impossible that during launch it shows variation at one extreme (say 10% faster for setup launch) while during capture the variation goes to the exact opposite extreme (say 10% slower for setup capture). Mathematically this statement is only exact for paths launched and captured at the same edge, commonly known as zero-cycle checks (like the default hold check or a zero-cycle setup check), as there is also a minor temporal component involved in cell variation, i.e. a cell will not always have the same delay arcs at different instants of time.

Figure 4: Tool intended clock path pessimism for maximizing the uncommon path in case of clock re-convergence.

In practice this temporal component is ignored in most cases, except for very high frequency clock paths. However, this does not mean that the tool will cancel the variations on all common elements in the clock path; instead the tool limits this optimism by allowing common path consideration only up to the first divergence node from the root/source of the clock path. After this node, even if the clock re-converges, the tool considers the rest of the path uncommon and therefore eligible for variation-related derates. Figure 4 shows how the tool maximizes the uncommon clock path component in spite of multiple clock re-convergences.

4-2. Minimize CPPR: Uncommon path maximization leads to a reduction of the valid common path.
However, the clock path pessimism credited back through this common path is controlled through the minimization of CPPR, which does not always remove the complete pessimism. Suppose we had applied a clock path derate of +/-10% on the buffer C2 (delay 100ps) in figure 2. During the setup check the fastest launch delay of this common element is 0.9*100ps = 90ps while the slowest capture delay is 1.1*100ps = 110ps. The skew across launch and capture comes out to be 110 - 90 = 20ps. Ideally the tool should credit this back as a CPPR of 20ps in the timing calculation, but this does not always happen. In most cases the tool applies a default threshold of minimum uncommon and/or common path delay to each path. This is done to model a minimum variation-uncertainty impact on paths which are completely common for both launch and capture, such as scan chains. This means that if either the total skew due to the remaining uncommon path or the total CPPR adjusted through the path falls below a defined threshold value, the tool transfers some elements from the common to the uncommon group and adjusts the CPPR accordingly.

Consider the example in figure 5. From initial inspection we can see that for ideal timing the buffer chain C1-C5 forms the common path, while the launch- and capture-specific buffers A1 and B1 are the uncommon path elements respectively. Now suppose the OCV derate applied on the delay variation of each of these cells is +/-10% for both late and early paths. This implies that during setup the launch path delay through C1->A1 will be increased (made late) by 10% while the capture path through C1->B1 will be made early by 10%. So the skew due to the uncommon elements A1 and B1 (100ps each) is (1.1*100ps - 0.9*100ps) = 20ps under ideal analysis, while the common path pessimism, or CPPR, over the 500ps common chain is (1.1*500ps - 0.9*500ps) = 100ps. But what if the designer/tool has introduced a minimum CPPR threshold of 20ps, i.e. 20ps of pessimism must be retained during the CPPR calculation?
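The arithmetic in this example, together with the element-transfer behavior just described, can be sketched as follows. The per-cell split of the common chain is an assumption (the real values sit in the figure-5 table); the delays are chosen to sum to the quoted 500ps and to be consistent with the adjustments the text later derives from that table:

```python
# Skew/CPPR arithmetic for the figure-5 example, plus the element-transfer
# loop used when a minimum CPPR threshold is imposed. Per-cell delays (ps)
# are assumptions that sum to the 500 ps common path quoted in the text.
DERATE_LATE, DERATE_EARLY = 1.1, 0.9
SPAN = DERATE_LATE - DERATE_EARLY            # late minus early derate
A1 = B1 = 100.0                              # uncommon launch/capture cells
common = [("C1", 50.0), ("C2", 100.0), ("C3", 150.0),
          ("C4", 100.0), ("C5", 100.0)]      # assumed split of 500 ps

uncommon_skew = DERATE_LATE * A1 - DERATE_EARLY * B1   # ~20 ps
base_cppr = SPAN * sum(d for _, d in common)           # ~100 ps

def apply_threshold(threshold):
    """Move cells from common to uncommon (last cell first) until at
    least `threshold` ps of would-be CPPR is retained as pessimism."""
    moved, surrendered = [], 0.0
    for name, delay in reversed(common):
        if surrendered >= threshold:
            break
        moved.append(name)
        surrendered += SPAN * delay
    return moved, base_cppr - surrendered    # moved cells, usable CPPR

print(uncommon_skew, base_cppr)   # ~20 ps skew, ~100 ps base CPPR
print(apply_threshold(20.0))      # C5 moved out, CPPR drops to ~80 ps
print(apply_threshold(50.0))      # C5, C4, C3 moved out, CPPR ~30 ps
```

With these assumed delays, a 20ps threshold moves only C5 out of the common group (giving up 20ps of CPPR), while a 50ps threshold moves C3, C4 and C5, leaving 30ps of usable CPPR.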
Figure 5: Trade-off between valid CPPR calculation and the minimum uncommon skew threshold requirement.

The tool can model the CPPR threshold through multiple methods, some of which are as follows:
Suppose the tool uses method 1. In order to meet the minimum CPPR threshold, the tool adjusts both the common and the uncommon path components until the threshold criterion is met. From the table shown in figure 5 it can be seen that if the tool simply moves C5 from the common path to the uncommon path, the uncommon path skew increases to 40ps but the CPPR reduces by 20ps, and the minimum threshold is now covered. Similarly, if the designer further increases the CPPR threshold to, say, 50ps, then the tool simply moves C3, C4 and C5 out of the common path bucket, reducing the new CPPR to 30ps. As the difference between the base CPPR (with all common elements) and the recalculated CPPR is greater than or equal to the CPPR threshold (100 - 30 = 70 >= 50), the tool can adjust the CPPR branch point accordingly. In method 2 the tool relaxes the CPPR criterion and allows the adjustment of the CPPR branch point to be equal to or less than the CPPR threshold. So, comparing methods 1 and 2, the branch point relocation for a threshold of 20ps comes out to be C4 for both methods, while for 50ps it comes out to be C2 in method 1 and still C4 in method 2 (0 < added uncommon C5 <= 50ps). The CPPR calculation for the latter methods 3 and 4 is simpler, as these do not require any adjustment of common and uncommon elements but rather a simple threshold check and subtraction respectively. Table I shows how the CPPR calculation is adjusted through these four methods for the circuit in figure 5.

TABLE I: CPPR Adjustment Through Multiple Methods Based on CPPR Threshold.

It should be noted that if the same common path feeds multiple paths with different variations of uncommon path, then in most cases the tool opts for the worst-case CPPR point, i.e. the minimum CPPR adjustment. The same happens with the delay noise component in the common path.
For both launch and capture the tool considers only the worst-case magnitude of the delay noise and applies its impact (making the delay later or earlier) accordingly. This additional pessimism saves the tool significant run time during CPPR calculation. To remove this worst-case CPPR adjustment, the tool may allow designers to use path-based CPPR calculation and to eliminate noise from the common path elements for zero-cycle checks like hold (since two different noise events are not possible at the same nodes at the same time), at the cost of run time and clock path margins.

5. Logical Clock Path Pessimism

In the previous sections we discussed clock path pessimism being controlled and modelled by means of technology-driven factors like OCV margins and minimum CPPR thresholds. Both of these, however, are based on mathematical analysis and are not completely design dependent. There are other clock path pessimisms as well, added by the designer, which are based on the analysis of worst-case logical scenarios. Some of the most commonly observed examples of such pessimism are:
5-1. Clock Re-convergence Pessimism (CRP) & CRP Removal (CRPR): CRP refers to a clock path phenomenon where the clock first diverges and then converges again at some other point. The example shown in figure 6 depicts a typical CRP scenario involving divergent exclusive paths through CG1 and CG2 respectively. This type of architecture and constraint definition is considered bad practice in clock path design, and designers generally avoid operations that involve clock re-convergence in a valid timing path. CRP arises primarily due to missing constraints for specific clock path selection, or due to mode merging of two or more divergent clock paths. Additionally, there are certain paths (DFT/test modes in particular) which do use re-convergent clock paths simultaneously, and in these cases CRP turns out to be a major issue. In most cases designers club the paths that have a valid CRP into a separate timing mode, while they use CRP removal (CRPR) constraining techniques to avoid it in modes that do not involve clock re-convergence.

Let us now analyze the example in figure 6 to understand how CRP impacts the timing. The data path comprises all the green colored cells, which include the two registers REG1 & REG2 and the buffer cell X1 connecting them. The common clock path point, as seen from the figure, is at the output of buffer C1, i.e. the clocks of registers REG1 and REG2 have no common element beyond this point; all clock cells after C1 are uncommon, and we therefore define the buffer chains A1 and B1-B3 as the uncommon clock paths for launch and capture respectively. Since only one input of the multiplexer MX can be active at a given time, clock gate path CG1->MX0 and clock gate path CG2->MX1 form two exclusive clock paths.
Figure 6: Sample timing path segmented into different path categories.

The path shown in figure 6 has two possible launch paths for the clock CLK to reach register REG1 and two possible capture paths for the clock CLK to reach register REG2, along with a single data path through X1.
Based on how the designer has defined the timing constraints in STA, multiple timing scenarios exist, and the intent of the tool will always be to select the most pessimistic one among them. If the designer has not constrained this circuit with any clock re-convergence control, the tool is forced to consider all possible launch-data-capture scenarios and select the worst among them. So there are 4 possible clock path combinations (figure 7):

Figure 7: Possible launch and capture paths for tool analysis. (Length of arcs represents the delay involved.)

Among these 4 possibilities, B and C are the ones that involve CRP, since they use logically different clock paths during launch and capture. It can be observed from table II that both these paths have exactly zero CPPR (with the CPPR threshold considered zero), as their paths are uncommon from the root itself, which is the point of divergence; hence they are the most timing critical, one for hold and the other for setup.

TABLE II: Slack Calculation for Various CRP Based Timing Paths in Figure 6.

From the above example it can be inferred that, as clock re-convergence adds multiple independent paths for clock propagation, it reduces the common path and hence introduces clock path pessimism. This makes the launch and capture paths uncommon from the very beginning of the divergence, and hence the calculated value of CPPR is significantly reduced.

CRPR vs. CPPR: CRP and CRP removal (CRPR) are often used synonymously with CPPR; however, CRPR and CPPR should not be confused as one and the same, but rather seen as two completely different perspectives on clock path pessimism. CPPR is primarily due to OCV variations, while CRPR is an architectural artifact.
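The CPPR outcome for the four combinations can be sketched by crediting common path only over the common prefix of the two clock paths, per the first-divergence rule of section 4-1. The cell delays and path ordering below are assumptions for illustration; what the sketch shows is that the crossed combinations get zero CPPR because their launch and capture paths diverge right at the clock root:

```python
# CPPR for the four launch/capture combinations of figure 7. Common path
# is credited only over the common prefix of the two clock paths (the
# first-divergence rule). Cell delays (ps) are assumed for illustration.
delay = {"CG1": 50.0, "CG2": 80.0, "MX": 20.0, "C1": 100.0,
         "A1": 100.0, "B1": 100.0, "B2": 100.0, "B3": 100.0}
launch = {"via CG1": ["CG1", "MX", "C1", "A1"],
          "via CG2": ["CG2", "MX", "C1", "A1"]}
capture = {"via CG1": ["CG1", "MX", "C1", "B1", "B2", "B3"],
           "via CG2": ["CG2", "MX", "C1", "B1", "B2", "B3"]}

def cppr(lp, cp, span=0.2):        # span = late minus early derate
    common = 0.0
    for a, b in zip(lp, cp):
        if a != b:                 # first divergence: stop crediting,
            break                  # even if the paths re-converge later
        common += delay[a]
    return span * common

for ln in launch:
    for cn in capture:
        print(ln, cn, cppr(launch[ln], capture[cn]))
```

The same-gate combinations recover CPPR over CG->MX->C1, while the crossed (CRP) combinations get none for any derate value; and even with span = 0 (no OCV at all), the crossed pairs still see the skew between the unbalanced CG1 and CG2 branches (30ps with these assumed delays), which is the CRP component that CPPR accounting cannot remove.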
If a designer sets the OCV derates to 0%, then since neither the launch nor the capture path is evaluated at worst and best variations, ideally no CPPR is involved at all: the common elements have the same delay for launch and capture and thus zero skew. However, even under this scenario CRP still applies, as it eliminates the common path itself; there will always be a skew involved in clock re-convergent paths unless they are balanced with precision, which is very unlikely across multi-mode multi-corner analysis. Figure 8 depicts the differentiation between the clock path elements contributing to CRPR and CPPR respectively.
Figure 8: Distinction between clock path elements contributing towards CPPR and CRPR.

5-2. Clock Exclusivity: Due to the evolving complexity of STA coverage requirements as well as robust clocking architectures, the number of clock sources and clock modes of operation has increased significantly. This has led to a drastic increase in multiple clock definitions to model mode merging. A common example would be a mode that covers two different frequencies of operation for the same system clock, say one at 500MHz and the other at 200MHz. In addition to CRP and clock tree balancing issues, this also leads to a highly unpredictable pessimism in the design, which ambushes designers at the very last stage of design closure: noise analysis. A common argument would be: why not time the design at the highest frequency only, i.e. 500MHz, since hold timing is frequency independent and a setup check closed at 500MHz should always work at 200MHz? However, this orthodox statement is no longer true when noise and high frequencies are involved. Noise analysis is primarily driven by timing window calculations based on arrival times and the attacker impact on the nets. Arrival time is affected by the absolute value of the clock period, and hence it is possible that for the 200MHz period there is noise in the hold path while at 500MHz it comes out clean. Hence, to ensure proper hold timing with noise, multiple clock frequencies for the same clock have become much more practical in STA constraint definitions; such clocks are termed exclusive clocks, i.e. only one of them can exist at a time. Although the multiple clock definitions add coverage for the blind spots in hold timing, they also add unintentional pessimism. It may be possible that the 500MHz clock's timing window adds noise at point A, while the 200MHz clock adds noise at the next cell after A, say B.
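This double-counting can be sketched as follows. The bump magnitudes are assumed illustrative values: if the tool does not know that only one clock definition can be physically present at a time, it accumulates both as simultaneous attackers; if it does know, it can take only the single worst contribution at each node:

```python
# Noise accounting at two victim nodes under multiple clock definitions.
# Bump magnitudes (mV) are assumed illustrative values, not from the text.
bumps = {"clk_500MHz": {"A": 40.0, "B": 10.0},
         "clk_200MHz": {"A": 25.0, "B": 35.0}}

def noise_at(node, physically_exclusive):
    contributions = [per_node[node] for per_node in bumps.values()]
    if physically_exclusive:      # only one clock can be present at a time
        return max(contributions)
    return sum(contributions)     # both treated as simultaneous attackers

for node in ("A", "B"):
    print(node, noise_at(node, False), noise_at(node, True))
```

With these assumed values, the ungrouped analysis sees 65mV at A and 45mV at B, while the exclusivity-aware analysis sees only 40mV and 35mV. In SDC this distinction is what the set_clock_groups command expresses through its -physically_exclusive and -logically_exclusive options.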
Additionally, since clock nets have the sharpest average slew and are much more concentrated, their impact as attackers is far more significant; as we increase the number of clock definitions in the design, the number of attackers is almost multiplied, and the noise degradation leads to pessimism in both the clock and the data path. Thus by clubbing two clocks together we have added double pessimism, both at A and at B, when only one of them is possible at a time. This type of clock path pessimism thus arises not from the physical or logical nature of the clock path but simply from the pessimistically interpreted clock presence by the tool. In order to control this pessimism, EDA tools provide a constraint feature to model such exclusive clocks, in both their physical and logical nature, through a clock grouping attribute which specifies which clocks are physically exclusive (only one present at a time) and which are logically exclusive (acting as attackers simultaneously).

6. Conclusion

In the previous sections we came across multiple timing path examples and scenarios where clock path pessimism was analyzed and evaluated through multiple methods. We observed how statistical and architectural (logical) parameters affect clock path pessimism and how their concurrent application can result in QoR degradation. We also discussed important clock-path-related factors like CRPR and CPPR and learnt about their differences and behavior under various scenarios. While ensuring the robustness of a design with the latest functionality and technology, both variation- and clocking-related bugs continue to increase, and the boundaries of the trade-off between desired robustness and QoR continue to move further apart.
The cumulative weight of all these discussions thus leads to a final inference: clock path pessimism and its analysis was, and still is, the Achilles' heel for designers in terms of both yield improvement and robust clock path architecture. Hence every designer should be well versed in the clock path labyrinth that keeps surprising us as we continue to evolve towards more complex SoC solutions and technologies.
All material on this site Copyright © 2017 Design And Reuse S.A. All rights reserved.