How to implement double-precision floating-point on FPGAs
By Danny Kreindler, Altera Corporation
October 03, 2007 -- pldesignline.com

Floating-point arithmetic is used extensively in many applications across multiple market segments. These applications often require a large number of calculations and are prevalent in financial analytics, bioinformatics, molecular dynamics, radar, and seismic imaging, to name a few. Many of them demand more precision than integer or single-precision 32-bit floating-point math can provide, forcing the use of double-precision 64-bit operations.

This article demonstrates the double-precision floating-point performance of FPGAs using two different approaches. First, a theoretical "paper and pencil" calculation is used to demonstrate peak performance. This type of calculation may be useful for raw comparisons between devices, but it is somewhat unrealistic: it assumes data is always available to feed the device, and it does not take into account memory interfaces and latencies, place-and-route constraints, and other aspects of an actual FPGA design. Second, the real results of a double-precision matrix multiply core that can easily be extended to a full DGEMM benchmark are presented, and the real-world constraints and challenges of achieving such results are discussed in detail.

Introduction

In a recent article titled FPGA Floating-Point Performance – A Paper and Pencil Evaluation, the author, Dave Strenski, discusses how to estimate the double-precision (64-bit) peak floating-point performance of an FPGA. In this article, his method is evaluated and, more importantly, extended with "real-world" considerations for estimating the sustained floating-point performance of an FPGA. These considerations are validated using a matrix multiplication design running in an Altera Stratix II FPGA.

The double-precision general matrix multiply (DGEMM) routine is referenced here. DGEMM is a common building block for many algorithms and is the most important component of the scientific LINPACK benchmark commonly used on CPUs. The Basic Linear Algebra Subprograms (BLAS) include DGEMM in the Level 3 group. The DGEMM routine calculates the new value of matrix C from the product of matrix A and matrix B and the previous value of matrix C using the formula C = αAB + βC (where α and β are scalar coefficients). For this analysis, α = β = 1 is used, though any scalar values can be used, since they can be applied during the data transfer in and out. As can be seen, this operation results in a 1:1 ratio of adders and multipliers; a minimal code sketch of the computation appears after the list below. This analysis also takes into account the logic required for a microprocessor interface protocol core and adds the following considerations:
The FPGA benchmark focuses on the performance of an implementation of the AB matrix multiplication with data from a locally attached SRAM. Extending this core to include the accumulator that adds the old value of C is a relatively minor effort.
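To make the paper-and-pencil style of estimate concrete: peak performance is essentially the number of floating-point operators the device can hold multiplied by the clock rate those operators can sustain. As a purely hypothetical illustration (these resource counts are not taken from any particular device), a design that fits 40 double-precision multipliers and 40 double-precision adders running at 250 MHz would peak at (40 + 40) x 250 MHz = 20 GFLOPS. An actual estimate must be derived from the target device's DSP-block and logic resources.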
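For reference, here is a minimal C sketch of the C = αAB + βC computation, written as a naive triple loop over square n-by-n matrices in row-major storage. The function name, square-matrix restriction, and storage layout are illustrative assumptions; this is neither the standard BLAS interface nor the FPGA core's implementation:

    /* Naive DGEMM: C = alpha*A*B + beta*C for n-by-n matrices in
       row-major storage. Each inner-loop iteration performs one
       multiply and one add: the 1:1 operator ratio noted above. */
    void dgemm_naive(int n, double alpha, const double *A,
                     const double *B, double beta, double *C)
    {
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int k = 0; k < n; k++)
                    sum += A[i * n + k] * B[k * n + j]; /* 1 mul, 1 add */
                C[i * n + j] = alpha * sum + beta * C[i * n + j];
            }
        }
    }

With α = β = 1, the final line reduces to adding the accumulated product to the old value of C, which is exactly the accumulation step described in the consideration above.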