Cycle-accurate model speeds design
By Rob McCammon, Kenny Lee, Michael Rohleder and John Doyle, EE Times
June 15, 1999 (11:59 a.m. EST)
URL: http://www.eetimes.com/story/OEG19990615S0021
A new simulation technology based on fast, software-focused virtual prototypes can be used by both silicon design and embedded system development teams to meet the requirements of emerging system-on-chip technologies. The virtual prototype, developed by Software Development Systems Inc. and Motorola Semiconductor, is valuable for evaluating the impact of design changes on system performance while stimulating the design with real application software. It can also be used to evaluate the available processor choices with a simulator that executes software, represents the core and peripherals, and provides cycle-accurate timing information. In addition, the prototype can complement the traditional hardware design-simulation-verification tools.
A fast cycle-accurate model of a new processor core was used in that way by a partner of Software Development Systems (SDS). Most of the validation work for the core was completed using a register-transfer level model of the core and a very detailed C++ model of the hardware. A simulator provided by SDS offered an independent third-party representation of the core that was functionally accurate, including timing, from the programmer's and user's perspective. Differences in behavior between that model and the other two identified some required changes to the design. The environment permits real application software to be used as stimulus during testing, providing a valuable complement to traditional, hardware-oriented test vectors.
The fast software-focused virtual prototype is distinct from the type of virtual prototype available via the typical EDA-style cosimulation environment. It is important to note that the cycle-accurate core simulator and software debugger used to build the new prototype are also required components of the EDA-style virtual prototype. Both styles of virtual prototype will be useful on many design projects.
Essentially, the fast software-focused virtual prototype provides an optimized environment for evaluating changes to system design and developing software. On the other hand, an EDA-style virtual prototype provides much richer HW diagnostic capabilities and is a better tool for verification of an ASIC implementation using real software as stimulus.
One key difference between the two approaches is the way they simulate peripherals. With the fast software-focused one, peripherals are simulated using executable models running under the host operating system (OS). In the EDA-style environment, peripherals are simulated within the context of the hardware simulator and are often described in a hardware description language like Verilog or VHDL. The EDA approach can provide much more detail on hardware operation but at much slower execution. That level of detailed hardware visibility adds little value to most software-development and system-design trade-off tasks.
Here are questions that can be explored using a fast software-focused virtual prototype: What is the exact number of UART devices needed? Is it possible to share one or more of those devices? What is the effect of using a DMA controller for a data transfer instead of software?
Finding the answers to those questions usually requires the execution of all or part of the application code on the proposed configurations of the target system. An EDA-style virtual prototype is unable to execute fast enough to accomplish that type of design evaluation quickly.
Both styles of virtual prototypes can be used to help track down hardware/software integration concerns before the hardware design is committed to silicon. Choosing the best tool will depend on the nature of available simulation models for peripheral devices or ASICs as well as the need for detailed visibility into hardware implementation. Different uses justify the need for different types of solutions.
The accuracy of the virtual prototype determines its utility; the ideal is infinitely fast and 100 percent accurate. The reality is that there is a trade-off between performance and complexity on one hand and accuracy on the other. The goal of the cooperation between SDS and its semiconductor partners is to create an environment that is highly accurate from the perspective of an embedded software developer, without introducing unnecessary complexity or sacrificing performance.
One required component of a software-focused virtual prototype is a fast and cycle-accurate simulation of the processor core. Cycle accuracy is a more stringent requirement than timing accuracy. A basic simulator that is 100 percent timing-accurate provides completely accurate information on the time required to execute a sequence of instructions. But a simulator that is 100 percent cycle-accurate can model the exact clock cycle in which any memory access occurs.
However, the creation of a cycle-accurate simulation is significantly more difficult than the creation of a simple instruction-set simulator. It must model the target processor in much more detail, for example, accounting for the behavior of the multiple execution units and the pipeline. Each instruction in the processor can go through fetch, dispatch, execution, write back and retire pipeline stages. At any time there can be multiple instructions in the system because of the pipelined architecture. One instruction can be running in the floating-point unit while another instruction is being fetched and yet another instruction is being dispatched into the load/store units (LSUs).
To maintain cycle accuracy, the simulator must model the complex behavior of the pipeline in much more detail than a simple instruction-set simulator. This not only adds to the complexity of the system being modeled but also increases the complexity of the simulator structure.
A basic simulator does not model pipeline stages. While it processes an instruction, all source registers are guaranteed to be up to date and no other instruction is trying to access the target register. To achieve cycle accuracy, the model must account for the fact that instructions may complete out of order as a result of pipelining. That greatly affects the complexity of the simulator.
Before an instruction can be dispatched, certain restrictions must be cleared. For example, is there an instruction in the prefetch queue to be dispatched? Is the history buffer full? Is the execution unit currently being used? Are all registers ready? Is it a serialized instruction? Although such questions do not have to be addressed by a basic simulator, they are crucial in creating a cycle-accurate simulator.
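The overlap of instructions in the pipeline can be illustrated with a minimal sketch. It assumes an idealized in-order pipeline in which every instruction spends exactly one cycle in each of the five stages named above, with no stalls; the function name and the instruction mnemonics are hypothetical and do not come from the SDS simulator.

```python
# Stage names follow the article; one cycle per stage is an assumption.
STAGES = ["fetch", "dispatch", "execute", "writeback", "retire"]


def run_pipeline(program):
    """Advance an idealized in-order pipeline one clock cycle at a time.

    Returns one snapshot per cycle, mapping stage name -> instruction.
    """
    in_flight = {}        # instruction index -> index into STAGES
    next_to_fetch = 0
    trace = []
    while next_to_fetch < len(program) or in_flight:
        # Move every in-flight instruction to its next stage; instructions
        # that were in "retire" leave the pipeline.
        in_flight = {
            i: stage + 1
            for i, stage in in_flight.items()
            if stage + 1 < len(STAGES)
        }
        # Fetch at most one new instruction per cycle.
        if next_to_fetch < len(program):
            in_flight[next_to_fetch] = 0
            next_to_fetch += 1
        if in_flight:
            trace.append({STAGES[s]: program[i] for i, s in in_flight.items()})
    return trace


if __name__ == "__main__":
    for cycle, snapshot in enumerate(run_pipeline(["fadd", "lwz", "add"]), 1):
        print(cycle, snapshot)
```

In the third cycle of this run the first instruction is executing while the second is being dispatched and the third is being fetched, which is the situation the paragraph describes. A real core adds stalls, multiple execution units and out-of-order completion on top of this.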
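The dispatch questions listed above can be read as a chain of checks performed once per simulated cycle. The following is a rough sketch only: the class names, fields, history-buffer capacity and the rule that a serialized instruction waits for all earlier instructions to finish are assumptions made for illustration, not details of the SDS and Motorola simulator.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    unit: str                  # execution unit needed, e.g. "LSU" (hypothetical)
    sources: tuple = ()        # source register names
    target: str = ""           # target register name, empty if none
    serialized: bool = False


@dataclass
class CoreState:
    prefetch_queue: list = field(default_factory=list)
    history_buffer: list = field(default_factory=list)
    history_capacity: int = 4  # assumed size, for illustration only
    busy_units: set = field(default_factory=set)
    dirty_registers: set = field(default_factory=set)
    in_flight: int = 0         # instructions dispatched but not yet retired


def dispatch_blocker(core):
    """Return why dispatch is blocked this cycle, or None if it may proceed."""
    if not core.prefetch_queue:
        return "prefetch queue empty"
    instr = core.prefetch_queue[0]
    if len(core.history_buffer) >= core.history_capacity:
        return "history buffer full"
    if instr.unit in core.busy_units:
        return "execution unit busy"
    registers = (*instr.sources, instr.target)
    if any(r in core.dirty_registers for r in registers if r):
        return "register not ready"
    # Assumption: a serialized instruction waits until the pipeline drains.
    if instr.serialized and core.in_flight > 0:
        return "serialized instruction must wait"
    return None
```

Returning the blocking reason rather than a plain boolean is a design choice for the sketch; it makes it easy to log why a given cycle was a stall.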
The cycle-accurate simulator developed by SDS and Motorola deals with register dependencies using a technique called "scoreboarding." This involves identifying the register used as a target in the currently dispatched instruction and marking that register as "dirty." When this instruction completes its execution, the register is reset to "clean." Instructions using dirty registers will not be dispatched.
Scoreboarding is an example of mirroring the behavior of the silicon without having to simulate the details of the hardware implementation. This is critical to providing not only cycle accuracy but acceptable simulator performance.
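As an illustration of the scoreboarding idea, here is a minimal sketch that tracks dirty registers in a set. The class and method names are hypothetical, and the sketch treats an instruction as "using" a register if that register is one of its sources or its target.

```python
class Scoreboard:
    """Tracks which registers are targets of instructions still in flight."""

    def __init__(self):
        self._dirty = set()

    def can_dispatch(self, sources, target):
        # An instruction touching any dirty register must wait.
        return not (self._dirty & {*sources, target})

    def dispatch(self, sources, target):
        """Mark the target dirty and return True, or return False to stall."""
        if not self.can_dispatch(sources, target):
            return False
        self._dirty.add(target)
        return True

    def complete(self, target):
        # The instruction has finished executing: the register is clean again.
        self._dirty.discard(target)


if __name__ == "__main__":
    sb = Scoreboard()
    print(sb.dispatch(["r1", "r2"], "r3"))  # r3 becomes dirty
    print(sb.dispatch(["r3"], "r4"))        # stalls: r3 is still dirty
    sb.complete("r3")
    print(sb.dispatch(["r3"], "r4"))        # r3 is clean, dispatch proceeds
```

The set holds only register names, not any detail of how the silicon implements the interlock, which is the point the text makes about mirroring behavior rather than implementation.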
Meanwhile, register dependency is not the only challenge that out-of-order execution brings to the cycle-accurate simulator design. Exception recovery is also a challenge. When an exception occurs, the processor must recover to a known state. In a pipelined microprocessor, instructions after the excepting instruction may already have completed. These instructions have to be unwound to restore the processor's previous state. That means the original value of the target register must be saved. To do that, one could save the old register value in the instruction info block, which would add a field to a frequently used simulator data structure.
In our case, a careful study of the processor architecture allowed us to identify a more efficient implementation. On a recent simulator implementation, the LSUs presented more challenges than any other execution unit; dispatching them was extremely complicated. For example, when an exception occurs a partial value may need to be restored. Imagine an operation to read eight words from internal memory where an exception occurs when the fifth word is being read.
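The straightforward approach mentioned here, saving the old register value in the instruction info block, might look like the sketch below. It is not the more efficient scheme the authors ended up using, and every name in it (InstrInfo, old_value, unwind_after) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InstrInfo:
    target: str
    new_value: int
    old_value: int = 0       # the extra field: target value before the write
    completed: bool = False


def complete(regs, info):
    """Write the result, remembering what the target register held before."""
    info.old_value = regs[info.target]
    regs[info.target] = info.new_value
    info.completed = True


def unwind_after(regs, program_order, faulting_index):
    """Undo completed instructions that follow the faulting one.

    Walking youngest-first restores the right value even when several
    unwound instructions wrote the same register.
    """
    for info in reversed(program_order[faulting_index + 1:]):
        if info.completed:
            regs[info.target] = info.old_value
            info.completed = False


if __name__ == "__main__":
    regs = {"r1": 1, "r2": 2}
    program = [InstrInfo("r1", 100), InstrInfo("r2", 200), InstrInfo("r2", 300)]
    complete(regs, program[1])      # younger instructions finish first
    complete(regs, program[2])
    unwind_after(regs, program, 0)  # instruction 0 takes an exception
    print(regs)
```

The cost the text points out is visible in the data structure: every instruction record carries an extra word that is only needed on the rare exception path.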
The correct result is that the first through fourth target registers contain new values from memory while the fifth through eighth contain old values from before the operation. This could not be accurately modeled in a basic simulator, which would simulate the entire operation in one function call.
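The eight-word example can be sketched by modeling the load one word at a time instead of in a single function call. This is a simplified illustration, assuming 4-byte words, a dictionary standing in for internal memory and a hypothetical set of faulting addresses; it does not model the per-cycle timing of the real LSU.

```python
WORD_SIZE = 4  # bytes per word, assumed for this sketch


def load_multiple(regs, memory, base, targets, faulting_addresses=frozenset()):
    """Load one word per step into successive target registers.

    Stops at the first faulting access, leaving later registers untouched.
    Returns the number of words loaded before the exception.
    """
    for step, reg in enumerate(targets):
        addr = base + WORD_SIZE * step
        if addr in faulting_addresses:
            return step          # exception: registers from here on keep old values
        regs[reg] = memory[addr]
    return len(targets)


if __name__ == "__main__":
    targets = [f"r{i}" for i in range(8)]
    regs = {name: -1 for name in targets}                  # -1 marks an old value
    memory = {WORD_SIZE * i: 1000 + i for i in range(8)}
    loaded = load_multiple(regs, memory, 0, targets, faulting_addresses={16})
    print(loaded, regs)
```

With the fault placed on the fifth word (address 16), the first four registers hold new values and the last four keep their old ones, matching the result described above.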
Another solution would be to use temporary cells to store intermediate values and only write back to the registers at the appropriate time. This works fine with a simple delay memory model, but once peripheral simulation has been added to the environment the designer must be able to accurately model each cycle in order to correctly simulate the interaction between the core and a peripheral device.
To provide accurate simulation we designed a more sophisticated system that allows partial execution. The implementation accurately supports all memory access variations and does not require memory accesses to maintain a constant delay.
Essentially, a useful virtual prototype must make it possible to simulate not only the processor core but the memory system, bus topology and peripheral devices as well.
The memory modeling kit (MMK) allows the core simulator to access libraries that provide simulation of the memory system, bus topology and peripheral devices. It uses proprietary technology to maintain cycle accuracy between the simulation of the processor core and the peripheral models. It is essential to the fast software-focused virtual prototype and its ability to easily simulate variations of the processor architecture.
The creation of such a prototype also requires accurate executable models of peripheral devices for use with the core simulator and MMK. The process for delivering accurate models will vary if some of the functional blocks in the virtual prototype represent functionality that has been specified but not implemented. For functional blocks that have been implemented it is common for the semiconductor developer to create executable C models of the implementation and validate them against the hardware design.
In this case the implementation, in the form of VHDL or Verilog code, is exercised in a VHDL/Verilog simulator and the results compared to applying the same test vectors to an executable C model that has been created based on the specification. This process can be used as part of the verification process for either processor cores or peripherals.
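The comparison step can be sketched as a small harness that applies the same test vectors to an executable reference model and checks them against results captured from the HDL simulation. Everything here is a stand-in: the 8-bit adder model, the vectors and the "HDL results" list are made up for illustration, and the sketch is in Python rather than the C the article mentions.

```python
def compare_models(reference_model, implementation_results, test_vectors):
    """Return (vector, expected, observed) for every disagreement."""
    if len(implementation_results) != len(test_vectors):
        raise ValueError("one implementation result is required per test vector")
    mismatches = []
    for vector, observed in zip(test_vectors, implementation_results):
        expected = reference_model(vector)
        if expected != observed:
            mismatches.append((vector, expected, observed))
    return mismatches


def adder_model(vector):
    """Hypothetical reference model: an 8-bit adder that wraps on overflow."""
    a, b = vector
    return (a + b) & 0xFF


if __name__ == "__main__":
    vectors = [(1, 2), (200, 100), (255, 1)]
    # Made-up values standing in for a VHDL/Verilog simulation log;
    # the last entry is deliberately wrong to show a reported mismatch.
    hdl_results = [3, 44, 1]
    for vector, expected, observed in compare_models(adder_model, hdl_results, vectors):
        print(f"vector {vector}: model says {expected}, HDL says {observed}")
```

A mismatch does not say which side is wrong; as the earlier discussion of the independent core model notes, differences between models are what prompt a closer look at the design or the specification.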
SDS and Motorola engineers worked closely to integrate their verified C models with a cycle-accurate core model developed by SDS using the MMK. Essentially, the MMK allows the components of the virtual prototype to behave like the real silicon, limited only by the level of detail provided by the model and the simulator.
Special debug interfaces in the peripheral models permit the inspection and modification of the internal status of the hardware devices belonging to the prototype system. Additional capabilities have been implemented to give the system or software engineer the ability to modify configuration parameters in order to identify the most appropriate hardware for his application. A transmission peripheral, such as a UART that can be configured with multiple queues, provides a good example of this capability.
The peripheral model allows the user to specify the number of queues and their sizes before running the simulation. Debugging commands allow the user to view the status of the queues including the number of bytes, the contents of the queue and the status of control signals like "queue full." By varying the peripheral configuration used in the virtual prototype and viewing the results, the designer can select an optimal queue configuration based on the price and performance needs of the application.
Embedded-system developers are constantly increasing the SW content of their products. Market research has shown that all phases of the development lifecycle are shrinking except the HW/SW integration phase. Market research has also shown that regardless of title (HW engineer, SW engineer or system engineer), development team members spend most of their debug time doing software debugging. Teams need powerful software debugging capabilities to meet their development goals.
The SingleStep development environment, cycle-accurate SDS simulator and Memory Modeling Kit provide a platform for a semiconductor supplier to use in delivering the IP needed by its customers to evaluate silicon and begin development prior to silicon availability. This environment has been developed to focus on the needs of the software developer while still providing the ability to integrate cycle-accurate hardware models, making it an ideal instrument for the evaluation of processor performance and the development of embedded software.
The ability to create a fast, accurate and stable development platform based on virtual prototyping allows software-related tasks to begin before hardware is available. Cycle-accurate simulation and peripheral modeling allow this environment to be used not only for algorithm and application development but for device driver development and HW/SW integration as well.
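A configurable peripheral model of this kind could be sketched as follows. The class, its methods and the status fields are hypothetical and only mirror the capabilities the text lists: queue count and sizes chosen before the run, plus a debug view of byte count, contents and the "queue full" signal.

```python
from collections import deque


class UartModel:
    """Toy transmit-side UART model with a configurable set of queues."""

    def __init__(self, queue_sizes):
        # One entry per queue, e.g. [2, 4] gives two queues of 2 and 4 bytes.
        self._capacity = list(queue_sizes)
        self._queues = [deque() for _ in self._capacity]

    def queue_full(self, queue):
        return len(self._queues[queue]) >= self._capacity[queue]

    def write(self, queue, byte):
        """Enqueue one byte; return False if the queue is full."""
        if self.queue_full(queue):
            return False
        self._queues[queue].append(byte & 0xFF)
        return True

    def status(self, queue):
        """Debug view: byte count, contents and the queue-full signal."""
        q = self._queues[queue]
        return {
            "bytes": len(q),
            "contents": list(q),
            "queue_full": self.queue_full(queue),
        }


if __name__ == "__main__":
    uart = UartModel([2, 4])
    for byte in (0x41, 0x42, 0x43):
        print(uart.write(0, byte), uart.status(0))
```

Re-running the same application code against `UartModel([2, 4])`, `UartModel([8])` and so on is the kind of what-if experiment the paragraph describes for choosing a queue configuration.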
The SingleStep development environment, SDS simulator and Memory Modeling Kit have been used by Motorola to create a virtual prototype of a distributed airbag control system. This system would allow a car manufacturer to more easily develop a variety of configurations based on common components including a control unit, sensors and airbags. To demonstrate the design concept, an airbag system consisting of a control unit, two sensors and eight airbag deployment units was implemented as a virtual prototype. The control unit is based on an integrated microprocessor, which includes a core and several peripheral devices. The peripheral devices are used heavily for communicating with the external sensors and deployment units. A simple bus is used to provide communication between the control unit, the deployment units and the sensors. One bus is used for the two sensors and four airbags in the front of the vehicle. A second bus is used for the four airbags in the rear.
Any airbag system must be extensively validated to behave according to specification. This simulation model was developed to enable verification and validation of the complete airbag system during development.
Demonstrable reliability
The availability of a simulation model for the airbag system ensures a clear and common understanding of the functionality of the control unit processor and associated software between the silicon provider and the customer before any silicon is produced. Compared to previous communication methods, the ability to demonstrate the system in operation was significantly easier and more reliable.
The incorporation of an easy-to-use configuration capability allows customers to investigate different system configurations and select the one most appropriate for their application.
The virtual prototype allows the embedded-development team to start work on their application code for the airbag control system in parallel to the development and manufacturing of the silicon components needed to build the system. They start early and work in a full-featured software-development environment without being exposed to, or concerned with, details that are typically of no interest to them.
The virtual prototype provides the capability to gain a deeper and more thorough analysis of the problem space. The additional debug capabilities and the common basis for the development of both the hardware and the simulation model are expected to lead to higher product quality and increased first-pass success for both the silicon components and the embedded system.
From the perspective of the silicon vendor, the availability of the proper simulation model is of value in demonstrating capabilities to prospective customers and offering an advantage to customers in the form of a pre-silicon development environment.
Rob McCammon, Simulation Product Manager, Software Development Systems Inc.; Kenny Lee, Development Engineer, Lombard, Ill.; Michael Rohleder, Software Technologist, Motorola Semiconductor Products, Munich, Germany; John Doyle, Safety Systems Engineering, East Kilbride, Scotland. rob_mccammon@sdsi.com