Embedding FPGAs in DSP-driven Software Defined Radio applications
With the advent of software defined radio (SDR) platforms in military aerospace, and more recently in some consumer radio and electronics segments, field programmable gate arrays (FPGAs) are taking on increased importance as reprogrammable digital signal processing (DSP) engines.
Field programmable logic has long been the circuitry of choice for connecting high-speed peripherals like wideband A/D and D/A converters, digital receivers and communication links to programmable processors in embedded real-time systems. FPGAs are especially well suited to handle the clocking, synchronization, and other diverse timing circuitry needed to tame these specialized devices. In addition, FPGAs are excellent for data formatting tasks like serial-to-parallel conversion, data packing, time stamping, multiplexing, and packet formation.
But DSP has become one of the most significant capabilities of FPGAs, as evidenced by sharp increases in engineering and marketing investments in this technology on the part of FPGA vendors over the last few years.
Digital Signal Processing Tasks
Signal intelligence receivers typically classify a signal by first performing a spectral analysis to estimate what type of modulation was used, and then applying demodulation algorithms to determine whether useful information, such as intelligible speech or meaningful data, can be extracted. Other significant tasks for the DSP include decryption, data storage, channel switching, signal routing to other systems, activity logging, and sending audio or digital data to an operator for listening or display.
In a cell phone base station, the number of digital signal processing tasks grows with each new communications standard. The proliferation of sophisticated digital voice and data protocols requires decoding, convolution, framing, error correction, and vocoding.
Compounding the processing load for these additional tasks is the steady increase in sampling rate requirements. To support new applications such as wideband CDMA, for example, DSPs are being moved closer and closer to the antenna. To meet these needs, DSP clock rates have increased to over 200 MHz and many of the new devices feature two or more hardware multipliers. Nevertheless, because the DSP is one of the most expensive and power-hungry resources in the system, minimizing its substantial workload can be quite important.
The role of FPGAs in SDR
FPGA synthesis tools now support "parameterizable" cores that accept bit width definitions and automatically generate core structures to match the signal processing accuracy requirements without wasting gates. A wide range of front end design tools is now available to suit the various input preferences of both hardware and software systems engineers. These include block diagram system generators, schematic processors, and high-level input language compilers for Verilog and VHDL. The speed, accuracy and ease-of-use of new simulators simplify the testing of new designs and minimize the time spent debugging applications.
Third party vendors are now offering high-level IP cores to complement the standard cores supplied by the FPGA vendors. These range from complete DSP processors to application specific blocks like high-speed internet modems. With these new commercial "off-the-shelf" functions, FPGAs are now able to penetrate both the general-purpose ASIC market and the DSP market.
Of even greater significance, the digital signal processing capabilities of FPGAs can often outperform general purpose DSPs. For example, if a wideband FIR digital filter requires 32 MACs (multiply/accumulate operations) within a single clock cycle, a general purpose DSP with only two multipliers will fall far short of the mark. On the other hand, FPGAs can easily incorporate 32 MAC cores to handle the task.
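The 32-MAC comparison can be made concrete with a small sketch. The function names are hypothetical, and the cycle model is a simplifying assumption (each hardware multiplier completes one MAC per clock, with no other overhead); it is meant only to illustrate the arithmetic, not to model any particular DSP or FPGA.

```python
import math


def fir_output(recent_samples, taps):
    """Compute one FIR output sample: one multiply/accumulate (MAC) per tap.

    recent_samples[k] is the input sample delayed by k clocks.
    """
    acc = 0.0
    for x, h in zip(recent_samples, taps):
        acc += x * h  # one MAC
    return acc


def cycles_per_output(num_taps, num_multipliers):
    """Clocks per output sample, ASSUMING one MAC per multiplier per clock."""
    return math.ceil(num_taps / num_multipliers)


# A 32-tap filter on a two-multiplier DSP versus 32 parallel MAC cores.
dsp_cycles = cycles_per_output(32, 2)    # 16 clocks per output sample
fpga_cycles = cycles_per_output(32, 32)  # 1 clock per output sample
print(dsp_cycles, fpga_cycles)
```

Under this simplified model, the two-multiplier device needs 16 clocks for every output sample, so it can keep up only if its clock runs 16 times faster than the sample rate, while the 32 parallel MAC cores produce one output per clock.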
Flexible and Reusable
As new software radio algorithms are developed, they can first be tested on the DSP, taking advantage of a wider range of code generation, simulation and optimization tools. When complete, the algorithm can be ported to the FPGA for better real-time operation or to support the processing burden of many parallel channels. Finally, for transition to high-volume production, most FPGA designs can be easily converted into mask tooling for custom ASICs.
While reprogramming the FPGAs to handle new functions can be somewhat more complicated than writing new algorithms for a DSP, this level-of-effort gap appears to be closing. No longer the exclusive domain of the hardware designer, FPGA design tools are now being used more and more extensively by software engineers, ensuring that this major technology shift will represent the mainstream paradigm for future embedded system design.
Software Radio Module Application
An on-board FPGA accepts real outputs from both A/D converters as well as complex baseband outputs from both of the digital down converters. The FPGA implements the VIM (Velocity Interface Mezzanine) interface to deliver data directly into each DSP or PowerPC on the processor board, where FIFO buffers support DMA block data transfers at rates up to 400 MB/sec.
With an eye towards adding DSP capability, a natural choice for the FPGA in this kind of platform is the Xilinx Virtex-II family. With 96 dedicated 18x18 multiplier blocks and over 200 kBytes of block RAM, the XC2V3000 offers a generous mix of signal processing resources, even for some of the more substantial applications.
In the basic factory configuration of the module, the FPGA still performs the traditional tasks of timing, formatting, and glue logic for the various devices on board. Because these functions are relatively simple, they consume only 6% of the programmable logic. This leaves 94% of the logic blocks, all 96 multipliers and virtually the entire block RAM available for adding DSP algorithms.
To help demonstrate the power of these untapped resources, an engineering project was launched to implement a high-performance FFT engine. Since communications, radar, and signal intelligence systems all utilize FFTs for tracking, tuning and image processing operations, the FFT remains one of the most popular algorithms for benchmarking processor performance.
In a nutshell, the FFT accepts a block of input time-domain samples and converts them into a block of output frequency-domain samples. Because the calculation is rather complex, it consumes a significant share of DSP processing resources and becomes a prime candidate for FPGA implementation.
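As a minimal illustration of the time-to-frequency conversion described above, the sketch below uses a direct DFT, which produces the same result as an FFT but with far more arithmetic. The block size, the test tone, and the function name are assumptions chosen for the example, not details of the module.

```python
import cmath


def dft(block):
    """Direct DFT: same output as an FFT, but O(N^2) multiplies instead of O(N log N)."""
    n = len(block)
    return [
        sum(block[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# A 16-sample block holding a complex tone that completes 3 cycles per block.
n = 16
tone_bin = 3
block = [cmath.exp(2j * cmath.pi * tone_bin * t / n) for t in range(n)]

spectrum = dft(block)
peak_bin = max(range(n), key=lambda k: abs(spectrum[k]))
print(peak_bin)  # the tone's energy lands in frequency bin 3
```

The direct form needs on the order of N squared complex multiplies per block, which hints at why even the more efficient FFT remains a heavy load at wideband sample rates.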
Constructing the FFT
One of the benefits of using an FPGA over a conventional programmable processor for computing FFTs is the large number of multipliers available for simultaneous calculation.
In the 4,096-point FFT example, a total of 60 multipliers is needed to implement all six FFT butterfly stages in parallel. Since the XC2V3000 has 96 multipliers available, it becomes obvious why FPGAs can often dramatically outperform a standard DSP having only two or four hardware multipliers, especially for algorithms like the FFT.
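The six-stage figure follows from 4,096 being the sixth power of 4, since each radix-4 stage reduces the problem size by a factor of four. The sketch below checks that and shows the core radix-4 butterfly, which is simply a 4-point DFT. The multiplier accounting at the end is an assumption offered as one breakdown that happens to reproduce the figure of 60; the article does not state how the count is reached, and the function names are hypothetical.

```python
def radix4_stages(points):
    """Number of radix-4 stages for an FFT whose size is a power of 4."""
    stages = 0
    while points > 1:
        if points % 4:
            raise ValueError("size must be a power of 4")
        points //= 4
        stages += 1
    return stages


def radix4_butterfly(a, b, c, d):
    """4-point DFT of (a, b, c, d). Multiplying by +/-j only swaps and negates
    real and imaginary parts, so the butterfly core needs no hardware multipliers."""
    return (
        a + b + c + d,
        a - 1j * b - c + 1j * d,
        a - b + c - d,
        a + 1j * b - c - 1j * d,
    )


stages = radix4_stages(4096)  # 6

# ASSUMPTION (not stated in the article): three of the four butterfly inputs
# need a twiddle-factor multiply, each complex multiply uses four real
# multipliers, and one of the six stages has only trivial twiddles.
twiddles_per_butterfly = 3
real_mults_per_complex = 4
stages_with_twiddles = stages - 1
multipliers = twiddles_per_butterfly * real_mults_per_complex * stages_with_twiddles
print(stages, multipliers)
```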
Since the FFT is inherently a block-oriented algorithm, it operates most efficiently when a freely addressable RAM supports quick access to all input and output samples. However, this ideal model of random data availability is at odds with the sequential stream of input samples arriving from the A/D converter.
Fortunately, the configurable block RAM resources of the FPGA can be retooled to form a memory structure that feeds the appropriate samples into four input data memory ports of the butterfly engines in parallel, thus solving the data availability problem. This proprietary memory architecture allows subsequent input blocks to be processed in a continuous, systolic manner so that all of the multipliers in all six stages can be productively engaged all the time.
For every FPGA clock cycle, each radix-4 butterfly processes four input samples. Therefore, when the FPGA processing clock is equal to the A/D clock, the architecture above is capable of running four times faster than real-time. With suitable hardware multiplexing schemes, this same FFT engine can be used to handle four streams of input data instead of just one.
In this example, with two A/D converters and the FPGA all clocking at 100 MHz, the FPGA is only working at half capacity. But with a little extra effort, the engine can be set to handle 50% input overlap processing of both channels to fully utilize the hardware. In this case, the pipelined execution time is an amazing 10.24 microseconds for each FFT! This is four times faster than the time it takes to collect the 4,096 input points at a 100 MHz sampling rate, consistent with performing four FFTs in real time.
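The timing figures above can be reproduced with a few lines of arithmetic. The variable names are illustrative, and the only inputs are the numbers quoted in the text: a 4,096-point FFT, four samples consumed per clock, and a 100 MHz clock.

```python
FFT_POINTS = 4096
SAMPLES_PER_CLOCK = 4   # each radix-4 butterfly takes four samples per clock
CLOCK_HZ = 100e6        # A/D converters and FPGA all clocked at 100 MHz

clocks_per_fft = FFT_POINTS // SAMPLES_PER_CLOCK   # 1024 clocks
fft_time_us = clocks_per_fft / CLOCK_HZ * 1e6      # pipelined time per FFT
collect_time_us = FFT_POINTS / CLOCK_HZ * 1e6      # time to acquire one block
speedup = collect_time_us / fft_time_us            # engine vs. real time

# Two A/D channels, each producing twice as many FFT blocks with 50% overlap
# (a new block starts every half block of input).
ad_channels = 2
overlap_factor = 2
streams = ad_channels * overlap_factor             # 4 equivalent input streams

print(f"{fft_time_us:.2f} us per FFT, {collect_time_us:.2f} us to collect a block")
print(f"{speedup:.0f}x real time, {streams} equivalent streams")
```

Two channels without overlap account for two of the four available streams, which is the half-capacity case; adding 50% overlap on both channels doubles the block rate and fills the engine.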
FFT Enhancements
Eight more multipliers are used to perform an optional power calculation at the FFT output, in which the real and imaginary components of each of the four outputs are squared and then added together. Finally, an averager stage adds the two outputs of the 50% input overlap FFTs to improve signal-to-noise characteristics.
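A minimal sketch of these two output stages follows. The function names are hypothetical, and dividing the sum by two in the averager is an assumption about scaling; the text says only that the two outputs are added.

```python
def power(sample):
    """Power of one complex FFT output: two real multiplies and one add."""
    return sample.real * sample.real + sample.imag * sample.imag


def average_spectra(spec_a, spec_b):
    """Combine the power spectra of two overlapped FFTs, bin by bin.

    ASSUMPTION: the sum is scaled by 1/2; the article only says the outputs are added.
    """
    return [(a + b) / 2 for a, b in zip(spec_a, spec_b)]


# Four outputs per clock, two squarings each, matches the eight multipliers cited.
outputs_per_clock = 4
multipliers_for_power = outputs_per_clock * 2  # 8

print(power(3 + 4j))                                 # 25.0
print(average_spectra([1.0, 3.0], [3.0, 5.0]))       # [2.0, 4.0]
```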
At the output of the FPGA, a multiplexer allows the results of each signal processing stage to be directed to the processor interface. Figure 2 below shows all of the basic function blocks inside the FPGA of the daughter card module shown in Figure 1.
Conclusion
In order to achieve a calculation dynamic range of better than 90 dB, several techniques were employed to reduce the rounding and truncation errors inherent in FPGA integer arithmetic. After optimization for execution speed by deploying the available FPGA resources, the entire design utilized 76 of the 96 multipliers, 99% of the logic slices, and 97% of the block RAM of the XC2V3000 device.
Although this particular FPGA component is still expensive because of its recent introduction, two concentric subsets of the ball grid array footprint pattern accommodate two smaller devices in the same family, to save costs for less demanding applications.
Rodger Hosking is vice president, and Richard Kuenzler is Senior Design Engineer at Pentek, Inc.
Copyright 2005 © CMP Media LLC
All material on this site Copyright © 2017 Design And Reuse S.A. All rights reserved.