How to maximize FPGA performance

By Michelle Fernandez, Xilinx
January 15, 2007 -- pldesignline.com

The more that can be done upfront with good coding style, timing-constraint definition, and resource planning, the easier it will be for the downstream tools to meet timing requirements.

As FPGAs push the envelope of performance, designing for maximum performance requires knowledge of both the device architecture and the design software. Today's FPGAs resemble a true System-on-a-Chip (SoC), with many more sophisticated features than the glue-logic FPGAs of the past. To maximize system performance, designers need to use proper design techniques, such as defining timing constraints and selecting the synthesis and implementation options that work best for their design. This article describes how to achieve faster timing in the fewest design iterations.

Understanding the architecture

When evaluating a new FPGA architecture, it is important to understand the hardware features and the tradeoffs that can be made in the architecture. Datasheets, user guides, and technical papers on the architectural features should be thoroughly reviewed before moving forward with a design.

The first thing to learn about any FPGA is what makes up the basic fabric of logic. For example, each of the configurable logic blocks (CLBs) in a Xilinx Virtex-5 FPGA contains two slices; and each slice contains four 6-input look-up tables (LUTs), four registers, and dedicated carry logic. For maximum utilization of each slice, it is important to take into consideration the width of the LUTs, the connectivity between the basic elements, and any shared resources.
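To make the point about LUT width concrete, the following Python sketch counts how many LUTs, and how many levels of logic, a wide AND or OR reduction needs when it is built as a plain tree of k-input LUTs. It is a simplified model under stated assumptions: every LUT is assumed to absorb k inputs, carry chains, wide-function multiplexers, and packing rules are ignored, and the slice figure is therefore only a lower bound. The helper names `lut_tree` and `min_slices` are hypothetical and are not part of any Xilinx tool.

```python
import math

LUTS_PER_SLICE = 4  # Virtex-5: four LUTs per slice, as stated in the text


def lut_tree(n_inputs: int, lut_width: int = 6) -> tuple[int, int]:
    """Estimate (LUT count, logic levels) for an n-input AND/OR reduction
    built as a tree of lut_width-input LUTs.

    Simplified, hypothetical model: ignores carry chains, wide-function
    muxes, and slice packing rules.
    """
    if n_inputs < 1 or lut_width < 2:
        raise ValueError("need n_inputs >= 1 and lut_width >= 2")
    luts, levels, signals = 0, 0, n_inputs
    while signals > 1:
        stage = math.ceil(signals / lut_width)  # LUTs needed at this level
        luts += stage
        levels += 1
        signals = stage  # each LUT output feeds the next level of the tree
    return luts, levels


def min_slices(luts: int) -> int:
    """Lower bound on slices, assuming perfect packing of LUTs into slices."""
    return math.ceil(luts / LUTS_PER_SLICE)


if __name__ == "__main__":
    for width in (4, 6):
        luts, levels = lut_tree(36, width)
        print(f"{width}-input LUTs: {luts} LUTs, {levels} levels, "
              f">= {min_slices(luts)} slices")
    # 4-input LUTs: 13 LUTs, 3 levels, >= 4 slices
    # 6-input LUTs: 7 LUTs, 2 levels, >= 2 slices
```

In this model a 36-input reduction fits in two levels of 6-input LUTs but needs three levels of 4-input LUTs. This illustrates why LUT width affects both area and logic depth, and therefore timing.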

Many FPGA architectures also contain hard-IP blocks, such as embedded memory and blocks dedicated to DSP functions. If a hard-IP block repeatedly shows up as the source or destination of your critical path, there are a few things to analyze to improve performance. First, check that the design is making the most of the block's features and that the synthesis tool is inferring the features you expected from your RTL code. Second, use the dedicated pipeline registers inside the block to reduce setup and clock-to-out times. Finally, evaluate the tradeoff between using the dedicated blocks and implementing the same function in slices, which allows more placement flexibility. This is especially important when a design uses a high percentage of the available hard-IP blocks.
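To illustrate why a block's internal pipeline registers help, the following Python sketch computes the maximum clock frequency of a path as the reciprocal of its slowest register-to-register stage. All delay values are invented placeholders chosen only to show the effect. They are not Virtex-5 timing data, and the breakdown into stages is a simplified assumption. Real numbers come from the static timing report. The function name `fmax_mhz` is hypothetical.

```python
def fmax_mhz(stages: list[list[float]]) -> float:
    """Maximum clock frequency (MHz) of a pipelined path.

    Each stage is one register-to-register path, listed as delays in ns
    (clock-to-out, routing, logic, setup). The slowest stage sets the period.
    """
    period_ns = max(sum(stage) for stage in stages)
    return 1000.0 / period_ns


# All delays are made-up placeholders, NOT Virtex-5 datasheet values.
# Block used purely combinationally: one long fabric-to-fabric path
# (fabric FF clock-to-out, route in, block logic, route out, fabric FF setup).
unpipelined = [[0.4, 1.2, 3.0, 1.1, 0.3]]

# Same function with the block's internal input/output registers enabled:
# the fabric routing now sits in shorter stages on either side of the block.
pipelined = [
    [0.4, 1.2, 0.5],  # fabric FF clock-to-out, route in, block input-register setup
    [3.0],            # block logic between its own registers (overhead folded in)
    [0.6, 1.1, 0.3],  # block output-register clock-to-out, route out, fabric FF setup
]

if __name__ == "__main__":
    print(f"unpipelined: {fmax_mhz(unpipelined):.1f} MHz")  # 166.7 MHz
    print(f"pipelined:   {fmax_mhz(pipelined):.1f} MHz")    # 333.3 MHz
```

The gain is not free. Each enabled register stage adds a clock cycle of latency, and the surrounding logic must be designed to account for it.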

The clocking resources used in a design can also affect its performance. For example, Xilinx Virtex-5 FPGAs have I/O, regional, and global clocking resources. These devices are divided into clock regions, each of which can contain at most 4 regional clocks and 10 global clocks. During design planning, it is important to analyze how many clock regions will be used and which clocks each region requires. Placing your I/Os so that their interface logic does not require all the clock resources in a clock region gives the implementation tools greater placement flexibility.
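During clock planning it can help to keep a simple per-region budget. The following Python sketch records which global and regional clocks each clock region uses, flags regions that exceed the per-region limits quoted above, and reports the remaining headroom. The region names, clock names, `RegionUsage` class, and helper functions are all hypothetical. A real plan would come from the vendor's floorplanning and clock reports rather than from a hand-written table.

```python
from dataclasses import dataclass, field

# Per-region limits quoted in the text for Virtex-5 clock regions.
MAX_GLOBAL_PER_REGION = 10
MAX_REGIONAL_PER_REGION = 4


@dataclass
class RegionUsage:
    """Clocks needed by logic placed in one clock region (hypothetical plan)."""
    global_clocks: set[str] = field(default_factory=set)
    regional_clocks: set[str] = field(default_factory=set)


def check_clock_budget(plan: dict[str, RegionUsage]) -> list[str]:
    """Return one message for each per-region clock limit that is exceeded."""
    problems = []
    for region, use in sorted(plan.items()):
        if len(use.global_clocks) > MAX_GLOBAL_PER_REGION:
            problems.append(f"{region}: {len(use.global_clocks)} global clocks "
                            f"(max {MAX_GLOBAL_PER_REGION})")
        if len(use.regional_clocks) > MAX_REGIONAL_PER_REGION:
            problems.append(f"{region}: {len(use.regional_clocks)} regional clocks "
                            f"(max {MAX_REGIONAL_PER_REGION})")
    return problems


def headroom(plan: dict[str, RegionUsage]) -> dict[str, tuple[int, int]]:
    """Spare (global, regional) clock slots per region; negative means over budget."""
    return {
        region: (MAX_GLOBAL_PER_REGION - len(use.global_clocks),
                 MAX_REGIONAL_PER_REGION - len(use.regional_clocks))
        for region, use in plan.items()
    }


if __name__ == "__main__":
    # Hypothetical plan: region and clock names are illustrative only.
    plan = {
        "X0Y0": RegionUsage({"sys_clk", "ddr_clk"}, {"rx_clk0"}),
        "X0Y1": RegionUsage({f"gclk{i}" for i in range(11)}),
    }
    print(check_clock_budget(plan))  # ['X0Y1: 11 global clocks (max 10)']
    print(headroom(plan))            # {'X0Y0': (8, 3), 'X0Y1': (-1, 4)}
```

A region with zero headroom leaves the placer no choice about which clocks logic in that region can use. Keeping some slack in each region is the point of the I/O-placement advice above.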
