ESC: Real-time analysis provides transport support for scan-based emulation
By Debbie Keil, Systems Software Architect, Prithvi Rao, Systems Software Developer, Texas Instruments Inc., Dallas, Texas, EE Times
March 7, 2002 (7:57 p.m. EST)
URL: http://www.eetimes.com/story/OEG20020307S0086
In an embedded system, the ability to verify that real-time applications execute properly is critical to their development and deployment. This applies to real-time applications ranging from mission critical to multimedia. Real-time analysis (RTA) can involve dedicated hardware and software together with an end-to-end methodology that supports lossless, reliable transfer of data between the host and the target. Specifically, the RTA encompassed by this methodology consists of capturing data from a target application using dedicated hardware, transferring it through layers of software dedicated to creating a real-time path, and making it available to a host application for analysis. Analysis includes determining whether applications meet the requirements of both timing and logical correctness.
Scan-based emulation is a pervasive method for debugging, developing and analyzing real-time applications running on DSPs. Basically, the JTAG boundary scan specification permits multiple devices to be connected in a serial daisy-chained arrangement. Today there is an end-to-end methodology that is predicated on support in hardware and software across several families of DSPs. Special emulation hardware is architected into the DSP core, and emulation drivers together with target-side and host-side RTA software permit the user to perform RTA.
Fundamentally, this methodology involves using a development environment to develop and download a target application to a DSP. The application running on the DSP interfaces with the RTA software to send and receive data. The data is scanned out using JTAG boundary scan and is received on the host by the emulation driver, which interfaces with the host-side RTA software. The data is then presented to the host client application for analysis. The figures of merit used to determine the success of this methodology are performance, scalability, ease of use and reliability.
An important consideration in providing a methodology for multiprocessor RTA is performance. The performance problem has been partially addressed in hardware by dedicating hardware for scan-based emulation. Data is transferred between target and host using dedicated emulation hardware to improve performance. One complication that arises in a heterogeneous multiprocessor arrangement of DSPs is that of varying scan lengths. Each design of DSP has its own emulation hardware, which results in scan lengths that vary within a family of DSPs and between Instruction Set Architectures (ISAs). The result of this variance is that longer scan chains require greater disassembly time for the scanned data, resulting in lower throughput and lower performance. Data can be streamed between target and host by using peripherals such as DMA and by performing real-time memory write operations.
In a multiprocessor JTAG scan, a special JTAG boundary scan bypass instruction obviates the need to scan any device set to bypass mode. This results in less time to disassemble data being transferred between host and target.
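The effect of bypass on scan length can be illustrated with a rough sketch. It relies on the fact that a device in BYPASS contributes a single-bit bypass register to the chain; the device names and data-register lengths below are made-up values for illustration, not figures for any real DSP.

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    dr_bits: int            # hypothetical length of the emulation data register
    bypassed: bool = False  # True when the device is loaded with BYPASS

def chain_dr_bits(chain):
    """Total bits shifted per data scan through the daisy chain.

    A bypassed device contributes only its 1-bit bypass register.
    """
    return sum(1 if dev.bypassed else dev.dr_bits for dev in chain)

# Illustrative three-device chain; only dsp0 is of interest.
chain = [
    Device("dsp0", 38),
    Device("dsp1", 64, bypassed=True),
    Device("dsp2", 38, bypassed=True),
]
print(chain_dr_bits(chain))
```

With all three devices active the same chain would shift 140 bits per scan, so the host has far less data to take apart when the devices it is not interested in are bypassed.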
An RTA solution for a multiprocessor environment must be able to identify the processor from which data originates. This introduces the need to mark the data with a processor identifier; the decision then becomes where in the system to do this. If we examine the host, we see that there is a one-to-one correspondence between the emulation software drivers and the processors in the system.
Since there is an emulation software driver for each target in the system, these drivers can stamp the data with a processor identifier. Note that from a performance perspective, it is better to mark the data on the host side than on the target side. If a unique processor identifier were sent down to the target and the data were tagged there, more data would be sent from the target to the host and would consume precious bandwidth.
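A minimal sketch of host-side stamping follows. The class and function names are hypothetical and do not correspond to the actual emulation driver interface; the one-byte identifier size is likewise an assumption chosen only to make the bandwidth comparison concrete.

```python
class EmulationDriver:
    """Hypothetical per-processor driver on the host."""

    def __init__(self, processor_id):
        self.processor_id = processor_id

    def on_scan_data(self, payload):
        # The identifier is attached on the host, after the scan,
        # so it never travels over the JTAG link.
        return {"processor": self.processor_id, "data": payload}

def scanned_bytes_host_tagging(payloads):
    # Only the payload crosses the scan chain.
    return sum(len(p) for p in payloads)

def scanned_bytes_target_tagging(payloads, id_bytes=1):
    # Each packet would also carry the identifier over the scan chain.
    return sum(len(p) + id_bytes for p in payloads)

driver = EmulationDriver(processor_id=3)
print(driver.on_scan_data(b"\x10\x20"))
```

For many small packets the per-packet identifier becomes a noticeable fraction of the scanned data, which is the bandwidth cost the text refers to.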
At the processor level, it is possible to allow finer-grain identification of data. Virtual data paths that extend from the target application to the host application are used to segregate data. For target-to-host data transfer the segregation policy is determined by the target application writer, whereas for host-to-target data transfer the segregation policy is determined by the host application writer. In either case, the corresponding application (host or target) must be aware of how the data is segregated according to virtual paths. Therefore, both the target API and the host API must contain methods to identify the virtual path on which the data is flowing. The introduction of virtual data path identification has ramifications on performance because this identifier must be carried with the data.
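The cost of carrying a virtual path identifier with the data can be seen in a small framing sketch. The header layout here (a one-byte path identifier followed by a two-byte payload length) is an assumption for illustration only; the article does not describe the actual wire format.

```python
import struct

# Hypothetical header: 1-byte virtual path id, 2-byte payload length,
# little-endian with no padding.
HEADER = struct.Struct("<BH")

def frame(path_id, payload):
    """Prefix the payload with its virtual path header."""
    return HEADER.pack(path_id, len(payload)) + payload

def unframe(buf):
    """Return (path_id, payload) from a framed buffer."""
    path_id, length = HEADER.unpack_from(buf)
    start = HEADER.size
    return path_id, buf[start:start + length]

packet = frame(2, b"abcd")
print(len(packet), unframe(packet))
```

Every transfer grows by the header size, so short, frequent transfers pay proportionally more for path identification than long ones.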
A key aspect of this methodology is scalability. It is addressed in both hardware and software.
The JTAG specification permits the daisy chaining of hardware. The limit on the number of devices that can be daisy chained is set by signal strength limitations rather than by the JTAG specification.
In software, data from each target is tagged with a unique identifier so that data being transferred between host and target can be attributed to the processor it belongs to.
Further, the RTA architecture is software scalable; writing the target application does not depend on the number of processors, and the application does not have to be altered if processors are added to or removed from the system configuration. There is no requirement that the target application have any knowledge of the type or number of processors in a scan chain at the time of development.
The emulation drivers and the RTA host software are architected to manage the data from the different processors.
The host application should be able to select the processor to which it sends data or from which it receives data. This is accomplished by incorporating this functionality into the host API, which proves to be very favorable with respect to scalability. By allowing the host application to select the processor, the same target application can be replicated without change on multiple processors to exploit parallel computing power.
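As a rough sketch of processor selection in a host API, the class below keeps one queue per processor and lets the client name the processor it wants to read from. `HostRta`, `deliver` and `read` are hypothetical names, not the actual host interface.

```python
from collections import defaultdict, deque

class HostRta:
    """Hypothetical host-side RTA layer with per-processor queues."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def deliver(self, processor_id, data):
        # Called by the emulation driver for the given processor.
        self._queues[processor_id].append(data)

    def read(self, processor_id):
        # The client selects the processor; returns None when nothing is queued.
        queue = self._queues[processor_id]
        return queue.popleft() if queue else None

host = HostRta()
# The same target application runs unchanged on processors 0 and 1.
host.deliver(0, b"frame from dsp0")
host.deliver(1, b"frame from dsp1")
print(host.read(1))
```

Because selection happens entirely on the host, adding a processor in this sketch means adding a queue, with no change to the target code.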
Ease of use is an important but often difficult figure of merit to sustain. A software debugging environment is provided that permits the user to easily configure the hardware in the system.
A trend in DSP emulation hardware is to support device registers that are mapped at fixed addresses. This permits the source code porting of applications. Further, a trend in more contemporary DSP emulation logic is to replicate the logic on all DSPs, which further simplifies the deployment of RTA tools.
At setup, the user selects the type of target and loads the system with an emulation driver for that target. The user also specifies the number of targets of each type and their position in the scan chain. Without this capability users would have to add code in their host applications that performed the same function, resulting in messy and unnecessarily complex code.
The debugging support software permits the setting of devices on a scan chain to be bypassed. In the absence of this support, the application might have to disassemble unwanted scans.
Host-side support is provided in the way of object-oriented interfaces based on the Component Object Model (COM), which is a de facto industry standard. This permits the host application developer to write client programs that are not tightly coupled to a specific DSP.
The JTAG specification has been long established as a reliable standard. It has been adopted and extended and an extensive set of target libraries has been developed for various flavors of DSPs based on boundary scan. Reliability is achieved through reuse of the same register set in different versions of emulation hardware across ISAs and within ISAs.
The use of unidirectional virtual paths for both target-to-host and host-to-target data transfers assists in ensuring that there is no data corruption. Further, host applications synchronize on data buffers connected to virtual paths, so there is no data loss.
Another feature of the RTA architecture is congestion control. With this capability buffers are guaranteed not to overflow.
During host-to-target data transfer, the RTA architecture signals the end of data transfer through a virtual path using callbacks, which notify target applications that data sent by the host has to be read. The virtual paths through which data is passed cannot be reused unless previously written data has been consumed.
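The reuse rule and the callback notification can be sketched with a single-slot path. This is a simplified model under the assumption that each virtual path holds one outstanding buffer; the class and method names are hypothetical.

```python
class VirtualPath:
    """Hypothetical unidirectional path holding one outstanding buffer."""

    def __init__(self, on_data):
        self._slot = None
        self._on_data = on_data  # callback that notifies the reader

    def write(self, data):
        # Refuse the write while earlier data is unconsumed, so the
        # buffer can never be overwritten or overflow.
        if self._slot is not None:
            return False
        self._slot = data
        self._on_data(self)
        return True

    def read(self):
        # Consuming the data frees the path for the next write.
        data, self._slot = self._slot, None
        return data

notified = []
path = VirtualPath(on_data=lambda p: notified.append("data ready"))
print(path.write(b"first"), path.write(b"second"), path.read(), path.write(b"second"))
```

In this model the writer learns immediately that the path is busy and must retry later, which is one simple way to obtain the no-overflow guarantee described above.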
There are several challenges in supporting a uniform multiprocessor RTA capability on various families of DSPs.
Each family has its particular variant of emulation hardware. This has an impact on the RTA protocol that is used to transfer data between host and target. For instance, some of the emulation capabilities on some DSPs use interrupts to signal the flow of data between host and target. In the absence of emulation interrupt support, the application must poll the emulation hardware for the presence of data.
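The difference between the two signaling styles can be modeled in a few lines. `EmulationPort` below is a hypothetical stand-in for the receive side of the emulation hardware, written only to contrast interrupt-driven and polled reception.

```python
from collections import deque

class EmulationPort:
    """Hypothetical model of the emulation hardware's receive side."""

    def __init__(self, supports_interrupts):
        self.supports_interrupts = supports_interrupts
        self._fifo = deque()
        self._handler = None

    def set_handler(self, handler):
        self._handler = handler

    def host_sends(self, word):
        self._fifo.append(word)
        if self.supports_interrupts and self._handler:
            self._handler(self)  # models the emulation interrupt firing

    def has_data(self):
        return bool(self._fifo)

    def read(self):
        return self._fifo.popleft()

def drain(port, sink):
    """Move all pending words from the port into sink."""
    while port.has_data():
        sink.append(port.read())

received = []
irq_port = EmulationPort(supports_interrupts=True)
irq_port.set_handler(lambda port: drain(port, received))
irq_port.host_sends(0x11)    # handled as soon as it arrives

polled_port = EmulationPort(supports_interrupts=False)
polled_port.host_sends(0x22) # sits in the FIFO until the application polls
drain(polled_port, received)
print(received)
```

The application-visible data is the same in both cases; what changes is who initiates the read, which is why the protocol layer has to account for the hardware variant.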
Another problem is the support for DSPs with varying word sizes (16 bit and 32 bit). RTA must also be supported in the presence of various memory hierarchies. Specifically, RTA must run whether the application is loaded into on-chip or off-chip memory.
These concerns have been addressed on the target side by developing the RTA target software libraries that get linked with the application. These libraries comprise the software that is responsible for programming the emulation and peripheral device registers and for effecting data transfer.
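To make the word-size issue concrete, the sketch below packs the same byte stream into 16-bit or 32-bit words. The little-endian byte order and zero padding are assumptions for illustration; the article does not say how the RTA libraries lay out data.

```python
def pack_words(payload, word_bits):
    """Pack bytes into words of the target's native size.

    Assumes little-endian order and zero-pads the final word.
    """
    step = word_bits // 8
    padded = payload + b"\x00" * (-len(payload) % step)
    return [
        int.from_bytes(padded[i:i + step], "little")
        for i in range(0, len(padded), step)
    ]

data = b"\x01\x02\x03\x04\x05"
print([hex(w) for w in pack_words(data, 16)])
print([hex(w) for w in pack_words(data, 32)])
```

The same five bytes occupy three words on a 16-bit target and two on a 32-bit target, so any length or offset exchanged between host and target has to be interpreted with the target's word size in mind.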
On the host side, the RTA host software is a target independent layer that can filter data in a multiprocessor environment to send and receive the data from a particular target unambiguously.
This article is based on excerpts taken from ESC paper #543, Multiprocessor real-time analysis for scan-based emulation.