Connecting reality and simulation: Couple high speed FPGAs with your HDL simulation
Stefan Reichör, Gleichmann Electronics Research
Martina Zeinzinger, Gleichmann Electronics Research
Markus Pfaff, Fachhochschule Hagenberg, Hagenberg, Austria

Abstract

This paper shows a way to connect an FPGA-based prototyping environment with an HDL simulator. When the pure cosimulation feature is used, speedups in the range of 2 to 50 are achievable. We present a new technique that runs the design in the MHz range for selected time periods. This technique yields higher speedups (> 100). We applied our approach to a leon3 design and obtained a speedup of 130 compared to the RTL VHDL simulation. This factor allows a simulation that formerly took 2 hours to run in about one minute.

Introduction

Today's digital hardware designs are becoming more and more complex. While complexity keeps rising, development time is precious and tends to decrease for the sake of competitiveness. To guarantee the functionality of such complex systems, numerous test cases have to be checked in laborious simulation runs.

One way to speed up the simulations is to move parts of the digital design to an FPGA. A cosimulation then runs between a testbench written in a hardware description language (such as VHDL, Verilog or SystemC) and the design in the FPGA. A cosimulator working on this principle yields speedups in the range of 2 to 50. Two factors limit this kind of simulation. The first is the execution of the testbench in software. The second is the communication time needed to send data from the simulator to the FPGA and to send the response from the FPGA back to the simulator. The speedup therefore depends on the testbench complexity and on the amount of data transferred between the testbench and the FPGA. These factors differ greatly from design to design, so the test-pattern throughput (in patterns per second) is a more predictable figure. We have compared many different designs and measured pattern throughputs ranging from 100 kHz up to 400 kHz.
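Because throughput is quoted in patterns per second, a test's wall-clock time can be estimated directly from its length. The following Python sketch shows this back-of-the-envelope arithmetic. It assumes that one test pattern is applied per simulated clock cycle, and its function names and the test length of ten million patterns are hypothetical. It also applies the speedup of 130 quoted in the abstract to a 2-hour run.

```python
# Back-of-the-envelope runtime estimates for a cosimulation run.
# Assumption: one test pattern is applied per simulated clock cycle.


def runtime_seconds(num_patterns: int, throughput_hz: float) -> float:
    """Wall-clock seconds to apply num_patterns at throughput_hz patterns/s."""
    if throughput_hz <= 0:
        raise ValueError("throughput must be positive")
    return num_patterns / throughput_hz


def accelerated_runtime(baseline_seconds: float, speedup: float) -> float:
    """Wall-clock seconds after applying a measured speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_seconds / speedup


if __name__ == "__main__":
    patterns = 10_000_000  # hypothetical test length
    for rate_hz in (100e3, 400e3):  # measured throughput range quoted in the text
        print(f"{rate_hz / 1e3:.0f} kHz: {runtime_seconds(patterns, rate_hz):.1f} s")
    # A 2-hour RTL simulation with the speedup of 130 reported for leon3
    print(f"{accelerated_runtime(2 * 3600, 130):.1f} s")
```

For ten million patterns this gives 100 s at 100 kHz and 25 s at 400 kHz. The last line confirms the abstract's arithmetic: 7200 s divided by 130 is roughly 55 s, or about one minute.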
When the FPGA is used as a pure prototyping solution, operating frequencies of up to 500 MHz are possible. The prototyping solution thus achieves a test-pattern throughput about 1000 times higher than the cosimulation.

This paper presents a way to combine the ease of use of cosimulation (using an HDL testbench, using the simulator for debugging, ...) with the unmatched performance of the prototyping solution. The starting point is a cosimulation environment. We have extended it to allow the execution of several simulation phases with clock frequencies in the 100 MHz range. In our case, the extended cosimulation system is used to speed up two simulations: the calculation of a Mandelbrot set and the 32-bit leon3 processor. We describe the techniques needed to make the HDL testbench compliant with the prototyping extension in order to achieve maximum simulation throughput. Our experiments showed speedups of more than 100 compared to the RTL simulations. Such dramatic speedups mean that simulations that used to take hours now run in a few minutes. They help HDL designers run simulations quickly, enabling better test coverage and shorter development times.

System Overview

The cosimulation system consists of a simulator and a coupled hardware device. The hardware device is split into the so-called I/O Manager and the Device Under Test (DUT). Figure 2 shows these three components. The DUT can be either an FPGA that holds the design to be accelerated or an arbitrary system with a digital interface (e.g. a CPU, a card with a PCI interface, ...). The purpose of the system is to embed the DUT into a running simulation in the simulator. A part of the simulator, together with the I/O Manager, is responsible for incorporating the DUT into the simulation as a simulation model. Figure 1 shows the PCI extension card that implements the hardware part of the cosimulation system.
The screenshot in Figure 3 shows the Mentor ModelSim simulator running a cosimulation. More information about the cosimulation system used can be found at [wph].

Figure 1: The cosimulator hardware

The extended cosimulation system allows two modes of operation: mode 1, Cosimulation, and mode 2, Clock Acceleration.
Figure 2 shows mode 1. In this mode, the simulator can specify a request for every clock cycle. The request is sent to the DUT via the I/O Manager. After every clock cycle, the I/O Manager calculates a response and sends it back to the simulator. The simulator can then decide whether the simulation continues with mode 1 or with mode 2.

Figure 2: Structural view for mode 1: Cosimulation

Figure 3: A screenshot of the running VHDL cosimulation

Figure 4 shows mode 2. In this phase, the simulator sends the number of clock cycles to apply (NumOfClks). When the I/O Manager has received that information, it starts to issue clock cycles. The clock cycles in this mode have a higher frequency than in mode 1 (hence the name Clock Acceleration). Optionally, a breakpoint configuration can also be specified. The DUTClk is stopped as soon as NumOfClks clock cycles have been applied or, if a breakpoint configuration was specified, as soon as a breakpoint condition is met.
Figure 4: Structural view for mode 2: Clock Acceleration

After the DUTClk has stopped, the I/O Manager sends response data to the simulator. The simulator can then decide whether the simulation continues with mode 1 or with mode 2.

Timing behaviour

Figure 5 shows that mode 1 and mode 2 can be alternated as many times as needed during a simulation.

Figure 5: Timing view of a cosimulation that exploits Clock Acceleration

This timing behaviour makes it easy to switch from simulator-controlled hardware to emulated hardware. Normally, a clock generation statement like the one in listing 1 is used in VHDL.
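The interplay of the two modes and their alternation (Figure 5) can be illustrated with a small software model. The following Python sketch is purely illustrative. The DUT is a hypothetical counter, and the names CounterDut, IOManager, cosim_cycle, accelerate and AccelResult were invented for this example. They are not the HAC interface, nor the hac_clk procedure. The sketch makes two further assumptions. The DUT inputs are held constant during an accelerated phase, and a breakpoint configuration can be modelled as a predicate on the DUT output.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class CounterDut:
    """Hypothetical stand-in for the FPGA design: a counter with synchronous reset."""

    def __init__(self) -> None:
        self.count = 0

    def clock(self, reset: bool) -> int:
        """Apply one DUTClk edge and return the new output value."""
        self.count = 0 if reset else self.count + 1
        return self.count


@dataclass
class AccelResult:
    cycles_applied: int
    response: int
    breakpoint_hit: bool


class IOManager:
    """Illustrative model of the two modes; not the HAC interface."""

    def __init__(self, dut: CounterDut) -> None:
        self.dut = dut
        self.held_reset = False  # assumption: inputs stay constant during mode 2

    def cosim_cycle(self, reset: bool) -> int:
        """Mode 1: apply one request for one clock cycle and return the response."""
        self.held_reset = reset
        return self.dut.clock(reset)

    def accelerate(self, num_of_clks: int,
                   breakpoint: Optional[Callable[[int], bool]] = None) -> AccelResult:
        """Mode 2: issue up to num_of_clks cycles, stopping early on a breakpoint."""
        response = self.dut.count
        for n in range(1, num_of_clks + 1):
            response = self.dut.clock(self.held_reset)
            if breakpoint is not None and breakpoint(response):
                return AccelResult(n, response, True)
        return AccelResult(num_of_clks, response, False)


if __name__ == "__main__":
    iom = IOManager(CounterDut())
    # Mode 1: hold reset for two cycles, then release it (cycle-accurate control).
    for reset in (True, True, False):
        print("mode 1 response:", iom.cosim_cycle(reset))
    # Mode 2: request up to one million fast cycles, stop when the count reaches 5000.
    print("mode 2:", iom.accelerate(1_000_000, breakpoint=lambda count: count == 5000))
    # Back to mode 1 to inspect the design cycle by cycle.
    print("mode 1 response:", iom.cosim_cycle(False))
```

In this run, the accelerated phase stops after 4999 cycles because the breakpoint fires. The simulator then resumes in mode 1, mirroring the alternation shown in Figure 5.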
To use the “clock acceleration” feature, a foreign procedure called hac_clk is provided. That procedure takes two parameters:
Benchmark & Conclusion

We ran several tests with the clock acceleration extension and achieved substantial speedups compared with both the RTL simulation and the cosimulation. Table 1 shows the results for two workloads. The first is a design that calculates a 256x256 Mandelbrot image with 10 iterations. The second is the leon3 processor calculating the prime numbers up to 1000.
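The following Python sketch uses a simple Amdahl-style model to show why the gain depends so strongly on how much of a run can be clock accelerated. The model is an assumption for illustration, not the authors' analysis. Every clock cycle costs 1/rate seconds in its mode, one test pattern corresponds to one clock cycle, and mode switching and response transfers are treated as free. The rates used, 200 kHz for mode 1 and 100 MHz for mode 2, are hypothetical values chosen from the ranges quoted in the introduction.

```python
"""Amdahl-style model of a run that alternates cosimulation (mode 1) and
clock acceleration (mode 2). Simplifying assumptions, not from the paper:
- every clock cycle costs 1/rate seconds in its mode,
- switching between modes and transferring response data is free.
"""


def mixed_runtime(total_cycles: int, accel_fraction: float,
                  cosim_rate_hz: float, accel_rate_hz: float) -> float:
    """Seconds needed when accel_fraction of all cycles run in mode 2."""
    if not 0.0 <= accel_fraction <= 1.0:
        raise ValueError("accel_fraction must lie in [0, 1]")
    mode1_cycles = total_cycles * (1.0 - accel_fraction)
    mode2_cycles = total_cycles * accel_fraction
    return mode1_cycles / cosim_rate_hz + mode2_cycles / accel_rate_hz


def speedup_over_cosim(accel_fraction: float, cosim_rate_hz: float,
                       accel_rate_hz: float) -> float:
    """Speedup relative to running every cycle in mode 1."""
    cycles = 1_000_000  # cancels out; any positive count gives the same ratio
    baseline = mixed_runtime(cycles, 0.0, cosim_rate_hz, accel_rate_hz)
    return baseline / mixed_runtime(cycles, accel_fraction,
                                    cosim_rate_hz, accel_rate_hz)


if __name__ == "__main__":
    cosim, accel = 200e3, 100e6  # hypothetical: 200 kHz pattern rate, 100 MHz clock
    for f in (0.0, 0.9, 0.99, 0.999, 1.0):
        print(f"accelerated fraction {f:>5}: "
              f"speedup {speedup_over_cosim(f, cosim, accel):7.1f}")
```

Under these assumptions, accelerating 99% of the cycles yields only about 83x over pure cosimulation, because the remaining 1% of mode 1 cycles dominates the runtime. This fits the observation that mainly clock-driven workloads with few testbench interactions benefit most.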
Table 1: Benchmarks comparing RTL simulation, cosimulation and clock acceleration

Our tests showed that the clock acceleration feature of our cosimulation extension is easy to exploit for computation-intensive tasks that are mainly clock driven. Fortunately, this is true for many designs that include microprocessors, so we see a broad range of applications for the new technique.

The FPGA for the accelerated design is an Altera Stratix EP2S180 device, which can hold designs of up to 1.8 million ASIC gates. We see a strong need for larger designs, e.g. for ASIC prototyping. We are therefore working on an extension board that can hold four EP2S180 devices. That system will provide an emulation/cosimulation solution for designs of up to 7.2 million ASIC gates.

Additionally, we have specified a mechanism that allows cosimulation/emulation with any kind of digital hardware. This includes ASICs, microcontrollers, CPUs or even complete boards with a digital interface, e.g. a PCI board. This is a functional enhancement to the normal HDL simulation capabilities, because it allows cosimulation with designs for which no simulation model is available (e.g. ARM processors).

We are currently working on a cosimulation board that holds a large FPGA device, a large memory device and Ethernet interfaces. That board will allow the cosimulation/emulation of, for example, a Linux system running on a leon3 processor. The planned system will combine full-speed execution capabilities with simulator coupling for debugging purposes.

References

[Lip96] J. Lipman. Chip hardware and software: Why can't they just get along? EDN, 1996.
[Pfa99] Markus Pfaff. Verfahren zur beschleunigten Systemsimulation mit VHDL durch Integration von externen Hardware/Software-Komponenten. Dissertation, Johannes Kepler Universität Linz, Linz, Austria, October 1999.
[Rei04] Stefan Reichör. Entwurf und Implementierung einer HW/SW-Cosimulationsumgebung mit Schwerpunkt auf der Einbindung von interaktiven User-Interfaces. Dissertation, Johannes Kepler Universität Linz, Linz, Austria, July 2004.
[Rei05] Stefan Reichör. Simulationsbeschleunigung durch Cosimulation und Hardware-in-the-loop. Mentor Graphics User Conference 2005, 2005.
[Row94] J. Rowson. Hardware/Software Cosimulation. Proceedings of the 31st Design Automation Conference, pages 439-440, 1994.
[RZP05] Stefan Reichör, Martina Zeinzinger and Markus Pfaff. Speed Up the Digital Design Development by Means of Using the Hardware Accelerator and Cosimulator (HAC). FH Science Day, Oberösterreich, pages 67-73, 2005.
[wph] HAC2 product homepage. http://www.ger-fae.com/HAC_2.html.
[Zei04] Martina Zeinzinger. Realisierung eines kostengünstigen stark beschleunigenden Hardware-Cosimulators. Diploma thesis, Hardware/Software Systems Engineering diploma programme, Fachhochschule Hagenberg, Hagenberg, July 2004.