Implementing floating-point DSP on FPGAs
For developers using FPGAs for the implementation of floating-point DSP functions, one key challenge is how to decompose the computation algorithm into sequences of parallel hardware processes while efficiently managing data flow through the parallel pipelines of these processes.
In this article, we'll discuss our experiences exploring architectures with Xilinx PicoBlaze controllers, and present a design strategy employing the ESL techniques of model-based and C-based design to demonstrate how you can rapidly integrate highly parameterizable DSP hardware primitives into power-efficient high-performance implementations in Spartan devices.
Hardware Acceleration and Reuse
High-performance implementations of floating-point DSP algorithms in FPGAs require single-cycle parallel memory accesses and effective use of pipelined arithmetic operators. Many common DSP vector and matrix operations can be split into batch calculations fulfilling these requirements. Our architectures comprise of Xilinx PicoBlaze worker processors, each with a dedicated DSP hardware accelerator (Figure 1). Each worker can do preparatory tasks for the next batch in parallel with its hardware accelerator. Once the DSP hardware accelerator finishes the computation, it issues an interrupt to the worker. The worker's job is to combine the accelerated parts of the computation into a complete DSP algorithm.
Figure 1. PicoBlaze-based architecture for floating-point DSP. The DSP hardware accelerators are modeled and implemented using Celoxica DK.
It is ideal if you limit implementations to the batch operations of each worker starting in a block RAM, performing a relatively simple sequence of pipelined operations at the maximum clock speed and returning the result(s) back to another block RAM. You can effectively map these primitives to hardware, including the complete autonomous data-flow control in hardware. You can also code the related dedicated generators of address counters and control signals in Handel-C, using several synchronized do-while loops. Simulink is effective for fast derivation of bit-exact models of the batch calculations in DSP hardware accelerators.
E-mail This Article | Printer-Friendly Page |
|
Related Articles
New Articles
- Quantum Readiness Considerations for Suppliers and Manufacturers
- A Rad Hard ASIC Design Approach: Triple Modular Redundancy (TMR)
- Early Interactive Short Isolation for Faster SoC Verification
- The Ideal Crypto Coprocessor with Root of Trust to Support Customer Complete Full Chip Evaluation: PUFcc gained SESIP and PSA Certified™ Level 3 RoT Component Certification
- Advanced Packaging and Chiplets Can Be for Everyone
Most Popular
- System Verilog Assertions Simplified
- System Verilog Macro: A Powerful Feature for Design Verification Projects
- UPF Constraint coding for SoC - A Case Study
- Dynamic Memory Allocation and Fragmentation in C and C++
- Enhancing VLSI Design Efficiency: Tackling Congestion and Shorts with Practical Approaches and PnR Tool (ICC2)