FPGA use in DSP products is skyrocketing, and one need only look at the products into which DSP is deployed to understand why. DSP is becoming ubiquitous, finding application not only in a myriad of consumer, automotive and telephony products, but also in an increasingly advanced class of devices. Applications such as wireless base stations, radar processing, fingerprint recognition and software-defined radio demand very high-performance processing. This new class of high-performance DSP applications, however, pushes the performance envelope of standalone processors, and hardware solutions are evolving to boost performance.

In the early 1990s, designers met the need for more horsepower by deploying multiple processors. However, system-level design becomes very difficult when the functionality of multiple processors must be coordinated, and the approach is costly and wasteful of resources. When the first DSP-empowered FPGAs appeared on the scene, DSP designers began using these devices to bolster the capabilities of their processors. With this approach, an FPGA complements the processor by accelerating the performance-critical portions of the DSP algorithm.

Today's specialized FPGAs, such as Xilinx's Virtex-4 or Altera's Stratix II, offer tremendous potential to improve performance through parallelization. Indeed, DSP-specific FPGA technology has been shown to offer up to a 100-fold performance advantage over other implementation options (figure 1). Thus, it is increasingly common to find a standard DSP accompanied by an FPGA that executes high-performance functionality, and the use of FPGAs in this fashion is forecast to grow rapidly.

Figure 1: FPGAs provide fast MACOPS (multiply/accumulate operations per second), computed as clock frequency times the number of multipliers.
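The relationship in figure 1 can be sketched in a few lines of Python. This is only an illustration: the function name `peak_macops` and the clock and multiplier figures below are hypothetical and are not taken from any device datasheet. The model also assumes that every multiplier completes one multiply/accumulate per clock cycle, which real designs rarely sustain.

```python
def peak_macops(clock_hz: float, multipliers: int) -> float:
    """Return peak multiply/accumulate operations per second.

    Assumes (for illustration) that every multiplier completes one
    MAC on every clock cycle, with no stalls or idle units.
    """
    if clock_hz < 0 or multipliers < 0:
        raise ValueError("clock_hz and multipliers must be non-negative")
    return clock_hz * multipliers


# Hypothetical figures, chosen only to show the arithmetic.
fpga_peak = peak_macops(clock_hz=400e6, multipliers=256)
processor_peak = peak_macops(clock_hz=1e9, multipliers=4)

print(f"FPGA peak:      {fpga_peak:.3e} MACOPS")
print(f"Processor peak: {processor_peak:.3e} MACOPS")
print(f"Ratio:          {fpga_peak / processor_peak:.1f}x")
```

The point of the sketch is that a device with a lower clock rate can still come out ahead when it applies many multipliers in parallel.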
Meeting the design challenge

Along with this powerful hardware capability, however, comes the challenge of implementing these FPGA-based DSP systems efficiently. Such large and complex designs tax conventional DSP design methodologies, largely because the conventional FPGA design flow in the DSP space does not take advantage of two critical elements of an efficient and effective design flow: synthesis technology and portable intellectual property (IP).

Those who have designed ASICs using synthesis technology are aware of its merits. For FPGA-based DSPs, this technology is essential, enabling design entry at a high level of abstraction and the automated exploration of area and performance trade-offs. The combination of rapid design entry at a high level of abstraction and automation provides not just a single instantiation of a design, but a range of possible outcomes from which to choose. For an application in which performance takes priority over area, an implementation that consumes hundreds of multipliers might be required; it would be very fast, but would also consume a lot of area. Likewise, for a more area-sensitive application, an implementation that shares a smaller number of multipliers would yield a more compact result at lower performance. These types of trade-offs require powerful tools, and are vital to the optimal development of advanced FPGA-based DSPs.

The other key element in efficient DSP development is having the right building blocks, or IP. IP suited for these applications has two primary attributes: extensibility and portability. In contrast to its less adaptable counterparts, extensible IP enables the designer to build up custom IP functions without loss of efficiency. The new functional blocks are highly efficient because unused or unnecessary portions are optimized away in the subsequent synthesis process. Portability also assures efficiency.
DSP designers need to be able to design their algorithms once and then run them in any FPGA vendor's product without modification. Such portability affords great efficiency, as well as the freedom to choose an optimum implementation with ease.

DSP verification can also present a challenge. When verifying a DSP, signal debugging and analysis involve more than examining time- and frequency-domain plots and scatter diagrams. Since digital signals are characterized by their sample time and discrete amplitude, DSP verification tools must efficiently define and manipulate time in multi-rate DSP applications. In addition, they must readily make the transition from full-accuracy floating-point simulation to finite-word-length fixed-point simulation. Also needed is a language for modeling DSP algorithms that includes native support for concepts such as time, fixed-point resources and parallelism.

Bringing a methodology together

Recent advances in design technology offer intriguing solutions to DSP designers' unique challenges. Simulink from The MathWorks, a mathematical model-based system design environment, provides a powerful modeling and simulation capability for DSP designers. The environment natively handles DSP issues such as multi-rate discrete-time definition and management and single-source, floating-point simulation. For FPGA implementation, DSP synthesis is the key innovation that links DSP verification with an optimal DSP implementation. With capabilities such as those embodied in Synplicity's Synplify DSP tool, designers have an automated, device-independent means to examine implementation trade-offs and achieve target mapping. Using DSP synthesis in conjunction with Simulink brings the expertise of both system architect and hardware designer together in a common environment.
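As a brief aside, the move from floating-point to finite-word-length fixed-point simulation mentioned above can be sketched as follows. This is a minimal illustration, not how Simulink or any synthesis tool implements it. The helper names (`to_fixed`, `to_float`, `quantization_error`) are hypothetical, and the format is assumed to be signed two's complement with saturation on overflow.

```python
def to_fixed(x: float, word_length: int, frac_bits: int) -> int:
    """Quantize x to a signed two's-complement integer code.

    Assumed format: word_length total bits, frac_bits of them fractional.
    Out-of-range values saturate instead of wrapping. Python's round()
    resolves exact ties to the nearest even integer.
    """
    scaled = round(x * (1 << frac_bits))
    lowest = -(1 << (word_length - 1))
    highest = (1 << (word_length - 1)) - 1
    return max(lowest, min(highest, scaled))


def to_float(code: int, frac_bits: int) -> float:
    """Convert a fixed-point integer code back to a real value."""
    return code / (1 << frac_bits)


def quantization_error(x: float, word_length: int, frac_bits: int) -> float:
    """Absolute error introduced by one round trip through fixed point."""
    return abs(x - to_float(to_fixed(x, word_length, frac_bits), frac_bits))


# Compare two hypothetical word lengths for the same coefficient.
coefficient = 0.3
for bits in (8, 16):
    err = quantization_error(coefficient, word_length=bits, frac_bits=bits - 1)
    print(f"{bits}-bit word: error = {err:.2e}")
```

Running the same stimulus through both representations and comparing the results is one simple way to judge whether a chosen word length is adequate.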
The system architect creates a vendor-independent model in Simulink, keeping the entry point at the purely algorithmic level and thereby maintaining focus on the high-level functionality of the design. When the model is handed off to the hardware designer, the specification has no architectural implications. As long as the modeling environment's DSP verification infrastructure allows for seamless integration of a synthesis engine, the hardware designer can examine architectural trade-offs without modifying the verification source. Since the source is preserved, the system architect need not worry about hardware implementation issues, and the hardware designer does not have to wrestle with the DSP algorithm specification. At the same time, the integrity and optimization of the design are assured, and the productivity of both team members is improved.

Vital to this methodology is the use of a generic DSP library. Vendor-specific IP bogs down algorithm design with unnecessary implementation details. With a library of generic DSP functions free of architectural parameters, input signals are processed and output is produced based on a high-level specification. With a high-level library, even the latency associated with DSP functions can be deferred to the architecture optimization phase. It is through DSP synthesis that hardware functionality and implementation are achieved.

Innovations such as DSP synthesis, Simulink and a portable library are key elements in improving DSP design, but it is also critical to bring these capabilities into an overall methodology that links the RTL and implementation design domains. An optimal DSP design flow augments existing capabilities with a generic library and the combined capabilities of DSP synthesis and Simulink (figure 2).

Figure 2: DSP FPGA design flow

During design specification, the system architect operates purely at the level of algorithmic abstraction.
By using a functional blockset, the designer can capture the algorithm using familiar DSP concepts. Later in the flow, algorithm verification is greatly eased by the features of Simulink's DSP verification environment. Capabilities such as visualization and debugging, as well as built-in accelerators, facilitate rapid simulation of discrete-time designs.

The engine of this design methodology, and the determinant of system-level goals for area and performance, is DSP synthesis. This step crafts an architecture that consumes the minimum resources required to achieve the needed performance. By applying appropriate system-level optimization techniques such as folding, system-wide retiming, and latency addition, DSP synthesis meets system-level performance goals. The resulting architecture is produced as vendor-independent, synthesizable RTL code. Because the design remains vendor independent at this point, the full power of RTL synthesis tools can be applied for further design optimization.

Impressive results

When compared to a conventional flow, the described DSP design methodology reveals significant advantages. As designs get larger, the DSP synthesis flow is likely to outpace its traditional counterpart simply because its algorithm specification is latency-free and no time is required to synchronize multiple paths. Comparing the design results for DSP synthesis and conventional flows reveals consistent improvement for the former, even under differing optimization scenarios. When high-level optimization is not performed during DSP synthesis, any optimization that results is largely attributable to RTL synthesis alone. Even without DSP synthesis optimization, the number of logic units deployed decreases consistently across all test circuits, and performance improves as well.

Several different optimization scenarios should be considered. When resource sharing is allowed, it is normal to expect significant improvements in resource utilization, at some performance penalty.
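The resource-sharing trade-off can be illustrated with a deliberately simple folding model of an FIR filter. This is a sketch under stated assumptions, not the algorithm any DSP synthesis tool uses: the function name `folded_fir` is hypothetical, each multiplier is assumed to perform one multiply per clock, and a folding factor of F means each multiplier is time-shared across up to F taps, so one output sample takes F clock cycles.

```python
import math


def folded_fir(taps: int, folding: int, clock_hz: float) -> tuple[int, float]:
    """Estimate multiplier count and sample rate for a folded FIR filter.

    Simplified model: with folding factor F, each multiplier serves up to
    F taps in turn, so ceil(taps / F) multipliers are needed and one
    output sample is produced every F clock cycles.
    """
    if taps < 1 or folding < 1:
        raise ValueError("taps and folding must be at least 1")
    multipliers = math.ceil(taps / folding)
    sample_rate_hz = clock_hz / folding
    return multipliers, sample_rate_hz


# Hypothetical 64-tap filter on a 200 MHz clock.
for factor in (1, 4, 16, 64):
    mults, rate = folded_fir(taps=64, folding=factor, clock_hz=200e6)
    print(f"folding={factor:>2}: {mults:>2} multipliers, {rate / 1e6:g} Msamples/s")
```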
The test circuits have borne this out, revealing significant reductions in resources consumed, at the cost of a rather significant decline in performance. This optimization technique is best applied when resources are limited and performance degradation can be tolerated.

Retiming optimization techniques offer another option for enhancing DSP synthesis results. When retiming is enabled, a significant performance boost over both DSP synthesis alone and conventional design can be observed, though it can come at the expense of consuming more resources. Some DSP synthesis solutions redistribute registers and introduce pipelines at the architectural level in order to achieve timing. Complementing this high-level retiming with gate-level retiming enables further optimization by shifting registers around with a specific FPGA device in mind. This combination of high-level and gate-level retiming yields the most highly optimized result, with significant performance improvement at no cost in additional resources.

Performance-hungry DSP applications today are driving the use of high-powered DSP-specific FPGAs that, in turn, present new and significant design challenges. In response, an automated FPGA-based DSP development flow that builds upon existing methodologies has been shown to go well beyond traditional design methods in delivering highly optimized design results. DSP-specific modeling and simulation innovations, automated synthesis and optimization tools, and generic, portable DSP libraries are the critical elements in the flow. With these capabilities, DSP developers have a solution to the productivity and design quality issues that have previously plagued them, and can now take full advantage of all that the new and powerful FPGA technology has to offer.

Dirk Seynhaeve is director of DSP corporate applications engineering at Synplicity Inc.
He has 20 years of experience in the ASIC design and EDA industries. He joined Synplicity to help define and roll out DSP solutions to add to the synthesis portfolio. Previously, as Director of Technical Marketing at Tera Systems, Seynhaeve focused on defining the product line strategies for RTL hand-off. Prior to Tera Systems, Seynhaeve was Director of Technical Services at Tharas Systems, following a position as Director of Applications at Escalade.

Andrew Dauman is vice president of corporate applications engineering at Synplicity, where he is responsible for all technical support, product verification, product training and technical publications. Since joining Synplicity in 1994, Dauman has initiated and grown Synplicity's applications engineering team from a concept into a worldwide organization. Prior to joining Synplicity, Dauman was a member of the AutoLogic ASIC synthesis team at Mentor Graphics Corporation.