Systems on chip (SoC) and processor design teams are challenged to meet aggressive power, performance and area requirements. As chip complexity grows, teams must verify thousands of lines of code to achieve design confidence. To be successful, teams must reach system goals and verification criteria within tight market windows. Design confidence increases as a function of the number of unique test sequences and completeness of system-level testing. As RTL modifications subside, functional verification investments compound, yielding a "tested" module. At this point, simulation test suites take days or weeks to complete and are most sensitive to design change. Assurance in functional verification takes billions of simulation vectors to achieve. It is no wonder that designers become resistant to change as deadlines near. But what happens when initial place and route results indicate the design does not meet system timing or power goals? Upon evaluation, the designer may conclude that an overhaul to the datapath, and perhaps control logic is required! This may be a simple change or a major rewrite. Fear sets in with the realization of how much work is required to re-verify the modified design and that the changes will depreciate all previous functional verification investment. Under these circumstances most engineers go for the least invasive change possible. However, this can create problems: fixes that are simple in a global context become complex in a local implementation or fixes that minimize risk may not completely meet the system criteria and require multiple iterations of change. Any change to the RTL, no matter how small, has the potential to introduce subtle side effects. Making sequential changes In most cases designers meet their timing, power and area goals by making micro-architectural transformations to RTL code. Micro-architectural transformations are sequential modifications. As RTL models undergo optimization, designers perform what-if analysis on key design features such as the size of caches, number of stages in pipelines and power management. In SoC and processor design, power, timing and area characteristics are difficult to estimate and trade off. Alternative micro-architectures have vastly different datapath and control logic implementation that require significant effort develop and considerable additional effort to verify. Simulation can be used to verify micro-architectural optimizations. However, simulation test-suites run for weeks, and sequential changes can invalidate existing testbenches. The testbenches are compromised by differences in design latency, throughput or I/O signals. When this happens, testbenches require manual inspection and adjustment. In many cases the testbench is more complicated than the design, making testbench changes susceptible to their own bugs. Depending on the situation, testbench failures can ripple from block level regressions into sub-system or even system-level tests. Design teams can localize the impact of sequential changes by writing testbenches with transaction or handshake interfaces. Sequential equivalence checking gives designers another functional verification option. By proving equivalence between designs with sequential differences, all previous functional verification investment is leveraged across future RTL implementations. Sequential equivalence checking ensures functional correctness of micro-architectural transformations, giving designers the confidence to make RTL changes late in the design process. This technology also offers quick detection of side effects and ensures RTL optimizations remain consistent with the original functional intent. An example of a sequential change A logic design, module or full chip can be thought of as a machine consisting of sequential state (or memory) and combinational logic. Combinational logic is responsible for computing the next machine state as a function of inputs and current machine state at any point in time. The machine outputs are a function of current state and/or inputs at that point in time depending on whether the logic is constructed as a Moore or Mealy machine. The design described in figure 1 takes four input values. It outputs the average of the four values and absolute difference between input value 3 and the average. It is fully described by the RTL code, schematic and state timing diagram. The Average4 design representation in figure 1 has one level of state, namely the registers on the output. Figure 1 — Averaged design Changing the design to read the input values serially modifies the state and timing. This serial transformation is an example of a simple sequential modification. Figure 2 shows a serial input implementation of the Average4 design. Notice the added registers to store the inputs, finite state machine (FSM) and corresponding latency in the output. Figure 2 — Serial implementation of Average4 circuit In our example, the Average4 design in figure 2 is functionality equivalent to the Average4 design figure 1. A pair of designs is judged to be functionally equivalent if starting from a consistent state, over time both designs produce the same output given the same input sequence. In this example, simple inspection shows that sample output in cycle N of figure 1 will match output in cycle (N*4)+1 of figure 2. This example demonstrates that functionally equivalent designs may be substantially different in terms of combinational logic and the manner in which they store and access state. Sequential transformations to improve power As chips become more complex, the traditional trade-off between performance and area is compounded by the addition of power. Influence on dynamic power decreases as designs move from system-level to gate-level. Consequently, design changes that improve dynamic power are typically nontrivial sequential changes, for example clock gating and operator-sharing optimizations. As the scope of change increases, so does the likelihood of introducing subtle side effects. Consequently, significant modifications like sequential changes are avoided late in the design process even if they give the greatest savings. This avoidance creates a serious limitation in converging towards power-optimal designs because accurate power estimation requires capacitance information from place and route. The next example demonstrates a sequential change in the Average4 control logic to improve power. Power is reduced by employing resource sharing and modifying the state machine encoding to iterate around a single adder and register. Figure 3 shows a power optimized version of the Average4 design. Figure 3 — Power optimized Average4 design Power management begins at the system-level. System-level models run sufficiently fast to enable booting an operating system and running application traces. This gives accurate estimation of switching activity. System-level models are then refined into RTL implementation. RTL models have the same functionality and can take into account physical implementation details like capacitance to further analyze dynamic power. The existence of a functionally accurate system-level model allows for sequential equivalence checking between system-level descriptions and RTL implementations. Sequential transformations to improve timing Timing problems and long paths are common RTL design issues. While changes that improve timing may be simple, the cost of verification late in the design process is sizeable. Consider one of the least invasive RTL changes to improve timing, datapath logic re-timing around pipeline flip-flops. This sequential change solves marginal timing issues and is unlikely to affect simulation testbenches. However, traditional combinational equivalence checkers cannot handle re-timed logic because the flip-flops are no longer considered equivalent design points. In figure 4, we have re-timed combinational logic around the state elements in the Average4 design. For simplification purposes, the cycle length in the timing waveform is shortened and the detailed register placement between adders, subtractors, comparators and mux logic is omitted from the diagram. The net result from these changes is increased overall system performance with the same functionality as the original Average4 design. Figure 4 — Re-timed Average4 design When simple re-timing will not achieve timing closure, more ambitious sequential changes are required. In such cases, it may be necessary to increase pipeline stages. The nature of re-timing RTL control logic poses some of the toughest verification problems. Corner case bugs are often introduced. For example, it is anecdotally known that sequential re-timing of control logic that manages instruction level parallelism and multithreading in CPU design has resulted in live-lock and other errors which are not only difficult to debug, but also hard to simulate. A simulation based verification methodology would have to account for latency and throughput differences as well as be augmented with directed and random tests to account for sequential changes to control logic. To avoid re-validating test suites to account for extra delay in output, testbenches should be written with latency programmable or insensitive interfaces. Designers must plan ahead for verification of sequential changes, or the cost of this increasingly common design change will have negative effects on project schedule and cost. Sequential equivalence checking to the rescue One way for designers to quickly manage verification of sequential changes is by using a sequential equivalency checker. Sequential equivalency checkers verify that new RTL implementations are functionally equivalent to a previously verified reference design — regardless of changes in state, interfaces, or sequential microarchitectural differences. A sequential equivalency checker can detect design differences, if they exist, in minutes, giving designers immediate feedback on RTL changes. Instead of relying on test benches or properties, sequential equivalence checking uses a golden RTL model or system-level reference design written in Verilog, VHDL, SystemC or C/C++. Unlike combinational equivalence checkers, sequential checkers prove functional equivalence across levels of sequential and data abstraction. Sequential changes like re-timing and resource sharing require only minor changes to setup parameters to reflect new timing information. Conclusion Micro-architectural optimizations for power, timing and area often result in sequential changes to RTL datapath and control logic. Sequential changes have a sizable impact on functional verification. Not only do they require careful re-verification of the entire design, but they can also invalidate simulation testbench results. Design teams find themselves pulled between achieving system-level requirements, minimizing design change and functional verification completeness. Sequential changes late in the design process are becoming more common. Designers should improve their functional verification by encapsulating testbench interfaces and utilizing sequential equivalence checking. Sequential equivalence checking verifies designs without using testbenches, quickly identifies design side effects and natively handles designs with sequential differences. In the end, being able to confidently make and verify these sequential changes dramatically improves designer productivity when making system optimizations, and enables design teams to meet their stringent power, timing and area goals. Venkat Krishnaswamy is Director of Engineering and cofounder of Calypto Design Systems, a provider of sequential equivalence checking tools. He currently leads a group responsible for ESL design and verification methodology development. Previously, he held design for test and architectural modeling positions at Sun Microsystems and Afara Websystems working on the Niagara CPU. Duncan Mackay is a Methodology Consultant for Calypto Design Systems. Mackay has more than 18 years experience in IC design and EDA tool development, with a specific focus on behavioral, power and logic synthesis. Prior to joining Calypto, he worked in development engineering at Synopsys and on multiple ASIC design projects with Sharp Microelectronics. Mitch Dale is Director of Product Marketing for Calypto. He has more than 17 years of experience leading IC verification marketing and engineering efforts. Prior to joining Calypto, Mitch was the Director of Verification Solutions responsible for application development of Emulation and Acceleration products within Mentor Graphics. Prior to Mentor, Mitch was at Ikos Systems for 14 years. |