Fitting DSP to app no easy task
By David Katz and Rick Gentile, Senior DSP Applications Engineers, DSP and System Products Group, Analog Devices Inc., Norwood, Mass., EE Times
December 2, 2002 (2:37 p.m. EST)
URL: http://www.eetimes.com/story/OEG20021202S0061
Selecting a digital signal processor for wired and wireless networked multimedia applications is a complex endeavor. First, a thorough analysis of the processor's core architecture and peripheral set must be prepared, in the context of both present and near-term industry interface needs. Next, it is crucial to understand how multimedia data (video, images, audio and packet data, for example) flows through a DSP-based system in order to prevent bandwidth bottlenecks.
Among the first measures that system designers should analyze when selecting a DSP are the number of instructions performed each second, the number of operations accomplished in each processor clock cycle and the efficiency of the computation units. The merits of each of these metrics can be determined by running a representative set of benchmarks (for instance, video and audio compression algorithms) on the DSPs under evaluation.
The results will indicate whether the real-time processing requirements exceed the DSP's capabilities and, equally important, whether there will be sufficient capacity available to handle new or evolving system requirements. Many standard benchmarks assume that the data to be processed already resides within internal memory. This technique allows a more direct comparison among DSPs from different suppliers, as long as the designer reconciles the I/O considerations separately.
The right peripheral mix saves time and money by eliminating external circuitry to support the needed interface. Networked multimedia devices (NMDs) draw from a universe of standard peripherals. Prominent among these is, of course, connectivity to the network interface. In wired applications, Ethernet (IEEE 802.3) is the most popular choice for networking over a LAN, whereas IEEE 802.11b/a is emerging as the prime choice for wireless LANs. Many Ethernet solutions are available as a direct extension to the DSP. In addition, on DSPs that can support microcontroller functionality equally well, a TCP/IP stack can be managed right on the DSP.
Also necessary for linking the DSP to the multimedia system environment are synchronous and asynchronous (UART) serial ports. In NMD systems, audio codec data often streams over synchronous 8- to 32-bit serial ports, whereas audio and video codec control channels are managed via a slower serial interface such as SPI.
DSPs suitable for the NMD market will include an external memory interface that provides both asynchronous and SDRAM memory controllers. The asynchronous memory interface facilitates connection to flash, E2PROM and peripheral bridge chips, whereas SDRAM provides the necessary storage for computationally intensive calculations on large data frames.
A new peripheral that has started to appear on high-performance DSPs is the parallel peripheral interface (PPI). This port can gluelessly decode ITU-R-656 data as well as act as a general-purpose 8- to 16-bit I/O port for high-speed A/D and D/A converters or ITU-R-601 video streams. It can also support a direct connection to an LCD panel.
Additional features are available that can also reduce system costs and improve data flow within the system. For example, the PPI can connect to a video decoder and automatically ignore everything except active video, effectively reducing a National Television Standards Committee (NTSC) input video stream rate from 27 Mbytes/s to 20 Mbytes/s and sharply reducing the amount of off-chip memory needed to handle the video.
Before reaching a final decision on the choice of DSP for a networked multimedia design, wired or wireless, it is imperative to understand the system-level data flow and how that flow can be implemented on the DSP.
Specifically, can data be brought in and out of the processor without falling behind on data and signal processing? Can the processor be kept fed with data, and can the data be accessed as needed during any given processing interval? These questions are crucial in a multimedia, network-centric system, where running algorithms efficiently is not enough by itself; the DSP must also handle the complete bidirectional system data flow.
Consider the case of a security system where an NTSC camera streams video and audio into a DSP at about 20 Mbytes/s, where it is compressed and sent out over a 100-Mbit/s Ethernet connection to be stored and archived to a remote disk drive. Moreover, the uncompressed video is routed from the DSP to a local display (an LCD or monitor, for example). Because the video memory requirement far exceeds available on-chip memory, data must be staged and manipulated via some larger-capacity, off-chip memory such as SDRAM.
Since many video compression algorithms operate on one block of data at a time, each block-a 16 x 16 pixel "macroblock," for example-can be transferred as needed from external memory. Some algorithms require multiple image or video frames to complete the desired processing, resulting in multiple bidirectional data transfers between internal and external memory. Often, an input buffer streams into SDRAM concurrent with the DSP core compressing the data in the previous buffer. It is likely these buffers will be on different pages within SDRAM. This can result in costly latencies unless the DSP allows more than one SDRAM page to be open at a time.
Security scenario
The security system scenario is a realistic depiction of the daunting data transfer rates that must occur between several subsystems to support networked multimedia applications; there are at least five sets of simultaneous data movements involved in the above example.
When considering the overall data flow, it is not sufficient to simply verify that the total byte traffic moving through the system does not exceed the DSP's theoretical internal bandwidth (obtained by multiplying the bus speed by the bus width).
For example, in parts with high core clock rates, the buses between the core processor and the peripherals will typically be run at a rate of 133 MHz. With bus sizes of 32 bits, the throughput should ideally approach 532 Mbytes/s. In reality, this peak number can be achieved only if exactly one transfer is active and no other transfers are pending. As individual peripherals are added to the application, they must each arbitrate for the internal DSP bandwidth. System designers typically allow for arbitration delays by assuming that only 50 percent of the internal bandwidth is available.
It is clear that DSPs suitable for networked multimedia applications must have a direct memory access engine that is independent of the core processor. That is, the total number of DMA channels available must support the wide range of peripherals. Additionally, a flexible DMA controller can save extra data passes in computationally intensive algorithms such as MPEG or JPEG processing.
DSPs with two-dimensional DMA capability can facilitate transfers of macroblocks to and from external memory, allowing data manipulation as part of the actual transfer. This is a very handy feature for interleaving/de-interleaving color space components. To achieve the maximum benefit from DMA, a prioritized interrupt controller is needed to ensure that the core is interrupted only when data is ready to be processed or when processed data has been successfully transferred out.