Strategies for verifying microprocessors
By Khizar Khan, EEdesign
May 10, 2002 (12:50 p.m. EST)
URL: http://www.eetimes.com/story/OEG20020502S0070
Today's microprocessors have grown significantly in complexity and functionality. Most of today's processors provide at least three levels of memory hierarchy, are heavily pipelined, and support some sort of cache coherency protocol. Other common features are out-of-order execution and built-in privileged data isolation. These features are extremely complex and sophisticated, and present their own set of unique verification challenges. For instance, heavy pipelining and out-of-order execution, implemented with reorder buffers, reservation stations and instruction IDs, make it very difficult to develop a reference model that is not more elaborate than the design itself and still satisfies verification requirements. Memory hierarchy and cache coherency are also very difficult to verify because they are implemented using an elaborate set of FIFOs, snooped queues and CAMs. These provide a verification engineer with a never-ending list of corner cases. Added to all of this, the sheer size of the design itself continually tests the limits of the simulators, making verification of these devices even more challenging. So what techniques are deployed to verify microprocessors successfully? To understand this, we must first understand the basic processor verification flow.

Figure 1 -- High-level view of a microprocessor

The microprocessor verification process

Earlier generations of microprocessors used memory images of program traces as stimuli. These program traces were written manually in the assembly language defined by the architecture.

The testbench consisted of the processor RTL and a behavioral model of the system memory. The system memory was first loaded with the program trace. Then the reset sequence (sometimes referred to as the boot sequence) of the processor commenced. This involved clearing the registers and appropriately initializing the memory subsystem of the processor. After completing the boot sequence, the program jumped to the diagnostic test and executed it. If there were no violations detected by any of the monitors, the test was considered successful.
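The following is a minimal sketch of that trace-driven flow, written here in C purely for illustration. The hook names (load_trace, run_boot_sequence, dut_step, monitors_ok) are hypothetical placeholders for the simulator and DUT interfaces, and are trivially stubbed so the sketch is self-contained.

/* Trace-driven testbench flow: load the program trace into a behavioral
 * model of system memory, run the boot sequence, execute the diagnostic,
 * and report pass/fail from the monitors.  All hooks are hypothetical. */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define MEM_WORDS 4096
static uint32_t sys_mem[MEM_WORDS];      /* behavioral model of system memory */

static bool load_trace(uint32_t *mem)    /* load the memory image of the trace */
{
    mem[0] = 0xdeadbeefu;                /* stand-in for a real program image */
    return true;
}

static void run_boot_sequence(void)      /* clear registers, init caches/MMU */
{
}

static bool dut_step(uint32_t *mem)      /* advance the DUT; false when the test halts */
{
    static int steps_left = 100;
    (void)mem;
    return --steps_left > 0;
}

static bool monitors_ok(void)            /* bus/assertion monitors */
{
    return true;
}

int main(void)
{
    if (!load_trace(sys_mem))
        return 2;

    run_boot_sequence();                 /* reset/boot: registers cleared, memory subsystem initialized */

    while (dut_step(sys_mem)) {          /* the processor fetches and executes the diagnostic */
        if (!monitors_ok()) {            /* any monitor violation fails the test */
            puts("FAIL: monitor violation");
            return 1;
        }
    }

    puts("PASS");                        /* ran to completion with no violations */
    return 0;
}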
For more complex microprocessors, which implement elaborate instruction sets, the same approach works. However, since the design is more complex, there are more test cases to cover. Most of these cases are easy to develop, but their sheer number is daunting. This calls for automatic test generation using a random diagnostic generator. There are two challenges in random diagnostic generation for microprocessors.

First, the generated instruction sequence must not violate the programming rules defined by the architecture. In essence, the instruction stream must be intelligently constrained so that it does not put the processor into an illegal state (a minimal generator sketch follows this paragraph). The second challenge is how to ascertain whether the test run on the design actually passed. Did the processor execute the instructions correctly? Did it produce the correct result? These problems are addressed with a golden model approach.
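Before moving on to result checking, here is a minimal sketch of the first challenge: a constrained random generator for a hypothetical three-operand ISA. The mnemonics, register count and the register-validity constraint are illustrative assumptions, not any particular architecture's rules.

/* Constrained random diagnostic generator for a toy three-operand ISA.
 * The key constraint: never use a register as a source before it holds a
 * defined value, so the stream cannot drive the processor into an
 * unpredictable state.  Everything here is invented for illustration. */
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

#define NREGS 8

int main(void)
{
    bool reg_valid[NREGS] = { false };   /* which registers hold defined values */
    srand(1);                            /* fixed seed: the diagnostic is reproducible */

    /* Seed a couple of registers so the first instructions have legal sources. */
    for (int r = 0; r < 2; r++) {
        printf("    li   r%d, %d\n", r, rand() % 256);
        reg_valid[r] = true;
    }

    for (int i = 0; i < 16; i++) {
        int rd = rand() % NREGS;
        int rs1, rs2;

        /* The "intelligent constraint": only pick already-defined sources. */
        do { rs1 = rand() % NREGS; } while (!reg_valid[rs1]);
        do { rs2 = rand() % NREGS; } while (!reg_valid[rs2]);

        switch (rand() % 3) {
        case 0:  printf("    add  r%d, r%d, r%d\n", rd, rs1, rs2); break;
        case 1:  printf("    sub  r%d, r%d, r%d\n", rd, rs1, rs2); break;
        default: printf("    and  r%d, r%d, r%d\n", rd, rs1, rs2); break;
        }
        reg_valid[rd] = true;            /* destination is now defined */
    }

    printf("    halt\n");                /* every diagnostic terminates cleanly */
    return 0;
}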
Checking results
The randomly generated stream of instructions is run on a golden model, perhaps an older version or a C model of the microprocessor, and a snapshot of the residual memory image is taken. Next, the same sequence of instructions is executed by the design under test, and a memory image is captured from the simulation run. Finally, the residual memory image of the simulation is compared to the image taken from the golden model. Any mismatch is flagged as a failure.
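A minimal sketch of that comparison step is shown below, assuming (purely for illustration) that both models dump their residual memory as a flat file of 32-bit words.

/* Compare the residual memory image from the RTL simulation against the
 * image produced by the golden model.  The flat 32-bit-word file format
 * is an assumption made for this sketch. */
#include <stdio.h>
#include <stdint.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: imgdiff <golden.img> <rtl.img>\n");
        return 2;
    }

    FILE *gold = fopen(argv[1], "rb");
    FILE *rtl  = fopen(argv[2], "rb");
    if (!gold || !rtl) {
        perror("fopen");
        return 2;
    }

    uint32_t gword, rword;
    long addr = 0;
    int  mismatches = 0;

    /* Walk both images word by word; stop at the end of the shorter one. */
    while (fread(&gword, sizeof gword, 1, gold) == 1 &&
           fread(&rword, sizeof rword, 1, rtl)  == 1) {
        if (gword != rword) {            /* any mismatch is flagged as a failure */
            printf("MISMATCH at word %ld: golden=0x%08x rtl=0x%08x\n",
                   addr, (unsigned)gword, (unsigned)rword);
            mismatches++;
        }
        addr++;
    }

    fclose(gold);
    fclose(rtl);

    if (mismatches)
        printf("FAIL: %d mismatching words\n", mismatches);
    else
        puts("PASS");
    return mismatches ? 1 : 0;
}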
There are many manifestations of this simple technique. One of these manifestations, called shadow modeling, is of particular interest. Here the diagnostic is generated but the results are not necessarily predicted. Instead, the test is run on both the RTL and the golden model in parallel. Every time the RTL completes an instruction, the golden model is signaled to execute the same instruction. Once the golden model finishes executing this instruction, the two models' architectural states are compared. If there is a mismatch, then a failure is flagged. The golden model is run in parallel (actually in lock-step) with the RTL, hence the technique is referred to as "shadow modeling."
Figure 2 -- Shadow modeling
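The lock-step loop can be sketched as follows. The rtl_retire_instr and golden_step hooks are hypothetical stand-ins for the RTL simulation and the golden model; both are trivially stubbed here so that the comparison logic is the only real content.

/* Shadow-modeling loop: every time the RTL retires an instruction, the
 * golden model executes the same instruction and the two architectural
 * states are compared.  Both models are reduced to trivial stubs. */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define NREGS 32

typedef struct {                 /* architectural state compared after each instruction */
    uint64_t pc;
    uint64_t gpr[NREGS];
} arch_state_t;

static bool rtl_retire_instr(arch_state_t *s)   /* true while the RTL keeps retiring */
{
    static int n = 0;
    s->gpr[1] += 1;
    s->pc     += 4;
    return ++n < 10;
}

static void golden_step(arch_state_t *s)        /* golden model executes the same instruction */
{
    s->gpr[1] += 1;
    s->pc     += 4;
}

int main(void)
{
    arch_state_t rtl  = {0};
    arch_state_t gold = {0};

    while (rtl_retire_instr(&rtl)) {            /* RTL completes an instruction...     */
        golden_step(&gold);                     /* ...then the shadow model follows it */
        if (memcmp(&rtl, &gold, sizeof rtl) != 0) {
            printf("FAIL: architectural state mismatch at pc=0x%llx\n",
                   (unsigned long long)rtl.pc);
            return 1;
        }
    }

    puts("PASS: models stayed in lock-step");
    return 0;
}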
The most important component of shadow modeling is a proven golden model. This model, which may be an older version of the processor or a soft implementation of the architecture such as a C program, must be able to execute the diagnostics not only correctly, but also in the fashion specified by the architecture. This is particularly true for illegal programming cases, where backward compatibility is important.
The model should support features like trap detection and interrupt handling (one possible reporting interface is sketched after this paragraph), because the generators will be constrained to producing diagnostics that can correctly run on the golden model. This means that for testing features not handled by the golden model, diagnostics will have to be written manually, which can be a daunting task. There is a trade-off between the level of detail implemented by the golden model and the number of manually written, or directed, diagnostics. Some teams prefer to invest a lot of time developing a very detailed golden model; some of these models are even cycle accurate. Others opt for a simple golden model and spend more time developing directed diagnostics.
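As an illustration of what supporting traps and interrupts means at the interface level, the sketch below shows one possible per-instruction reporting structure for a golden model. The structure, enum values and cause codes are assumptions made for this example, not any real model's API.

/* One possible per-instruction interface for a golden model that reports
 * traps and interrupts, so the checker stays synchronized with the RTL on
 * exceptional paths.  The structure, enum and cause codes are invented. */
#include <stdio.h>
#include <stdint.h>

typedef enum { STEP_OK, STEP_TRAP, STEP_INTERRUPT } step_kind_t;

typedef struct {
    step_kind_t kind;        /* what the golden model says happened        */
    uint32_t    cause;       /* trap/interrupt cause code, when applicable */
    uint64_t    next_pc;     /* where the architecture says control goes   */
} step_info_t;

/* Trivial stub: pretend the fourth instruction raises an illegal-opcode trap. */
static step_info_t golden_step(int n)
{
    step_info_t info = { STEP_OK, 0, (uint64_t)(n + 1) * 4 };
    if (n == 3) {
        info.kind    = STEP_TRAP;
        info.cause   = 2;
        info.next_pc = 0x100;
    }
    return info;
}

int main(void)
{
    for (int n = 0; n < 6; n++) {
        step_info_t info = golden_step(n);
        if (info.kind == STEP_TRAP)
            printf("instr %d: trap, cause=%u, vector to 0x%llx\n",
                   n, info.cause, (unsigned long long)info.next_pc);
        else
            printf("instr %d: retired, next pc 0x%llx\n",
                   n, (unsigned long long)info.next_pc);
    }
    return 0;
}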
Though random generators are very powerful for finding functional bugs, they cannot be used right from the start. The design needs to have reached a certain level of stability before the random generators can be turned loose on it. To achieve this level of functional stability, a set of basic directed tests is first developed. These diagnostics, often referred to as the basic test suite, are directed diagnostics aimed at verifying the basic functionality of the design.
Verification teams typically take a staggered approach to verifying the design. The design is first qualified with the basic test suite. Next, the random generators are turned on, while efforts continue to develop diagnostics for special cases identified in the verification plan. These special cases are corner cases that cannot be covered by random diagnostics. When the diagnostics targeting these special cases have been completed and the random generators have been running for a sufficient amount of time, the design is considered ready for tape-out. To summarize, the basic and special-case diagnostics are developed manually; the rest of the test space is covered using random diagnostics. A strong random test generation scheme, therefore, is fundamental to successfully verifying a microprocessor.
Coverage
One thing not yet discussed is coverage, dubbed "the unmentionable evil" by microprocessor verification engineers. Microprocessors are some of the most complex of today's designs. Simulating these devices, therefore, is a compute-intensive and slow process. This has been a pivotal factor in the emergence of two major trends. The first is eliminating coverage analysis at the full-chip level, because the simulation overhead created by integrating coverage tools into the design makes the model almost impossible to simulate. But doing away with coverage is not a foregone conclusion. This is where the second trend, unit-level testing, comes into play. With this technique, coverage is run at the unit level. With unit-level simulation, the coverage overhead is acceptable and the amount of information to sift through is manageable.
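A unit-level coverage model can be as simple as a set of counters over the interesting cross-products of interface activity. The sketch below uses invented bins (request type crossed with cache-line state) to show the idea of sampling during the run and reporting uncovered bins afterward.

/* Unit-level functional coverage sketch: counters over the cross of
 * request type and cache-line state, sampled during the run and reported
 * at the end.  The bins are invented for illustration. */
#include <stdio.h>
#include <stdlib.h>

enum req_type   { REQ_RD, REQ_WR, REQ_SNOOP, NUM_REQ };
enum line_state { ST_I, ST_S, ST_M, NUM_ST };

static unsigned cov[NUM_REQ][NUM_ST];               /* coverage bins */

static void sample(enum req_type r, enum line_state s)
{
    cov[r][s]++;                                    /* called by the testbench on each event */
}

int main(void)
{
    srand(1);

    /* Stand-in for the unit-level test: randomly exercise the interface. */
    for (int i = 0; i < 500; i++)
        sample((enum req_type)(rand() % NUM_REQ),
               (enum line_state)(rand() % NUM_ST));

    static const char *rn[] = { "RD", "WR", "SNOOP" };
    static const char *sn[] = { "I", "S", "M" };
    int holes = 0;

    puts("coverage report (hits per bin):");
    for (int r = 0; r < NUM_REQ; r++) {
        for (int s = 0; s < NUM_ST; s++) {
            printf("  %-5s x %-2s : %u\n", rn[r], sn[s], cov[r][s]);
            if (cov[r][s] == 0)
                holes++;
        }
    }
    printf("%d uncovered bins\n", holes);
    return 0;
}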
Unit-level testing of the microprocessor's subsystems plays many roles. It pre-qualifies a design for a full-chip release. Additionally, unit-level testbenches are ideal for covering cases that are difficult to target using a full-chip model. There is one challenge, however, in this approach: what kind of stimulus should be used to test at the unit level?
For full-chip testing, the answer is relatively simple: use assembly-level instructions. But the answer becomes a little hazy for individual units, because most of these units do not understand and cannot decode assembly instructions.
Deciding on the appropriate stimulus is a critical decision. The choice is between raw interface-specific transactions and assembly code. If you decide to generate interface-specific raw stimulus, you will have to develop new generators for each interface, which can be a real challenge, especially for complex designs. If the latter approach of using assembly code is taken, then keep in mind that translating assembly via a stub model is not easy and runs the risk, depending on the block being verified, of not being able to generate the most interesting test cases.
For blocks dealing with instruction issue logic (branch prediction, instruction grouping and reordering) and execution logic (ALU and execution pipe by-pass logic) it makes sense to provide stimulus in assembly format, because these blocks are capable of decoding instructions. However, for other blocks, like those responsible for maintaining cache coherency, it makes more sense to use raw interface transactions, for example instruction cache and data cache read requests. This decision is ultimately up to the person developing the environment.
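For the raw-transaction option, the stimulus generator simply emits interface-level request records instead of assembly. The sketch below invents a small transaction structure for a cache interface; the field names and the address-windowing trick are illustrative assumptions rather than any real protocol.

/* Raw interface-level stimulus for a cache/coherency unit: instead of
 * assembly, the generator emits read/write request records directly.
 * The transaction fields are invented for illustration. */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

typedef enum { ICACHE_READ, DCACHE_READ, DCACHE_WRITE } txn_type_t;

typedef struct {
    txn_type_t type;
    uint64_t   addr;         /* line-aligned request address            */
    uint8_t    id;           /* transaction ID for out-of-order replies */
} cache_txn_t;

static cache_txn_t random_txn(void)
{
    cache_txn_t t;
    t.type = (txn_type_t)(rand() % 3);
    /* Confine addresses to a small window so lines collide and the
     * interesting coherency and eviction cases actually occur. */
    t.addr = (uint64_t)(rand() % 64) * 64;
    t.id   = (uint8_t)(rand() % 16);
    return t;
}

int main(void)
{
    static const char *names[] = { "IC_RD", "DC_RD", "DC_WR" };

    srand(2);
    for (int i = 0; i < 20; i++) {
        cache_txn_t t = random_txn();
        printf("txn %2d: %-5s addr=0x%04llx id=%u\n",
               i, names[t.type], (unsigned long long)t.addr, t.id);
    }
    return 0;
}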
With so many features implemented in a microprocessor, how does one decide the level at which each feature is verified? There is no clear-cut answer. The ideal goal is to verify each feature at both the unit level and the system level; this coverage redundancy is not a bad thing. In practice, most of the features are verified at the full-chip level. The same ideal holds for catching corner cases, though in reality that is not always possible, and most of the corner cases are covered in the unit-level environments.
Summary
Having discussed the fundamental techniques, trade-offs and challenges in microprocessor verification, the question arises: what challenges does the future hold? The challenge in microprocessor verification is and will continue to be the sheer size and complexity of these devices. The simulation models are already huge, and, according to Moore's law, they will continue to grow. The simulation speed of these models will continue to degrade and there will be more test cases to cover. How will verification teams address this challenge?
Khizar Khan, co-author of the book, has seven years of experience in high-level verification and in developing verification infrastructures. At Sun Microsystems, he is contributing to the verification of next-generation microprocessors. He received his bachelor's degree in electrical engineering from the University of Rochester.