A 4-MHz parameterized Logarithm-Square Root IP-Core

Nelson Campos, Roberto Costa, Elton Costa, Gutemberg Junior and Elmar Melcher
Federal University of Campina Grande - UFCG
Center of Electrical Engineering and Informatics - CEEI

Abstract

Logarithms and square root are non-elementary operations frequently used in digital signal processing. In this work, implementation and design of an IP-Core to compute square root and multibase logarithm is presented. The design is parameterized in fixed point notation achieving a low arithmetic error even when irrational numbers are being calculated. The module was synthesized in ASIC using FSC0G_D_GENERIC_CORE from UMC and in FPGA occupying 518 logic elements and two DSP blocks for multiplication.

Keywords

Digital Signal Processing, FPGA, Logarithm, Square Root, VLSI.

I. INTRODUCTION

The logarithm operation is widely used in digital signal processing applications such as speech recognition [4] [10] and 3D graphics [5]. The square root operation is also present in many modern applications such as Cholesky decomposition, LU factorization and in the solution of quadratic equations [11]. This operation requires expensive computational resources and high energy consumption, and its efficient implementation may be accomplished with the use of dedicated hardware. This paper demonstrates the hardware implementation of an arithmetic module called SQRTLOG comprising four modules: sqrt, log₂ , log₁₀ and log_e. First, the paper describes some common algorithms to compute the square root and the binary logarithm. Next, the paper explains the proposed architecture of the SQRTLOG and finally discusses the results of the synthesized hardware in ASIC and FPGA.

II. SQUARE ROOT COMPUTATION METHODS

There are many ways to compute the square root operations for VLSI and FPGA implementations. Three algorithms will be discussed here.

A. The Newton-Raphson Algorithm

The Newton-Raphson method is widely used in order to calculate , by approximation. The reciprocal square root using this method may be computed iteratively according to equation 1, where qi is the approximate value of . After n iterations the square root of X is [6].

q_i+1 = q_i(3 - xq²_i ) / 2 (1)

B. The Restoring Algorithm

The restoring algorithm will calculate square root and its remainder value in iterative process. Consider that and D = S²+ R, where D is the radicand, S is the square root result, and R is the remainder. The restoring algorithm will guess the value of S and R iteratively. If there is a wrong guess of S, the previous value is restored[9].

C. The Non-Restoring Algorithm

The non-restoring algorithm is similar to the restoring one and it will calculate square root and its remainder value in iterative process. However, unlike the restoring algorithm, in case of a wrong guess of S, it does not change the bits of S more than once [9].

III. LOGARITHM COMPUTATION METHODS

Some techniques have been used in hardware implementations of Logarithms.

A. Look-Up Table

According to the implementation described in [10] any number can be represented in the form of a = N x 2^p, where p is integer and . Taking the binary logarihtm of a, the equation 2 is obtained. Since p is the most significant bit of a, and log₂(N) can be found using a LUT, the computation of the log_e(a) can be done mutliplying log₂(a) by a constant as shown in equation (2).

B. Cordic Method

The CORDIC is a simple and efficient algorithm to calculate hyperbolic and trigonometric functions setting one bit per iteration [2][7][8]. The logarithm computation may be accomplished using the hyperbolic expression shown in equation 3.

C. Floor-Shift Method

The method presented in [7] computes the binary logarithm through the following steps:

first calculate log₂(x) by searching for the leading one to get the integer part of log₂(x)
then calculte the fractional part of the logarithm with equation 4:

IV. PROPOSED ARCHITECTURE

This section discusses the proposed architecture of the SQRTLOG, which is divided in two parts: the sqrt module and the logarithm module. The sqrt module computes the square root of N = (a₁+a₂+a₃+...+a_n)² guessing one bit at a time according to equations 5 to 7, where a_m is the m-th bit being guessed.

The steps to compute the square root algorithm can be seen in the flowchart depicted in Figure 1.

Fig. 1. Square Root Algotihm Flowchart

Similarly, the algorithm to compute the binary logarithm proposed by [12] was adapted to a finite state machine for hardware implementation setting one bit per iteration according to equation 8. The algorithm referenced by [12] takes the logarithm of x (where ) and the finite state machine scales a number out of this range according to equation 9. To compute logarithms in base 10 and natural, SQRTLOG simply takes the binary logarithm output multiplying its result by 1=log2(10) and 1=loge(10), respectively.

The algorihtm flowchart to compute the binary logarithm can be seen in Figure 2.

Fig. 2. Binary Logarithm Algotihm Flowchart

The square root and logarithm modules are interconnected using an interface Valid/Ready present in AMBA AXI Protocol [1] and its architecture is described in Figure 3. The signals of its interface are described in Table I and depicted in Figure 4.

Fig. 3. The SQRTLOG architecture

TABLE I SQRTLOG: DESCRIPTION OF THE INTERFACE SIGNALS

Signal name	Port type	Size in bits	Signal description
clock	input	1	clock signal
reset	input	1	reset signal
op	input	2	operation code that selects one computation (sqrt, log2, log10 or loge)
data_in	input	N_BITS	input number generated by the source to stimulate the SQRTLOG
data_out	output	N_BITS	output number computed by the SQRTLOG
iReady	output	1	handshake signal that indicates that the destination is ready to receive the number
oReady	input	1	handshake signal that indicates that the SQRTLOG is ready to receive the number
iValid	input	1	handshake signal that indicates that the source is ready to send the number
oValid	output	1	handshake signal that indicates that the LOG is ready to send the number
done	output	1	indicates that the computation is done

Fig. 4. SQRTLOG: Interface signals

V. RESULTS

This section discusses the synthesis results of the SQRTLOG in ASIC and in FPGA.

A. ASIC Synthesis

The module was simulated using dc shell from Synopsys [3] with the standard cell library FSC0G_D_GENERIC_CORE from UMC [13] with 0:13µm CMOS technology using a 4MHz clock. During the experiment, the number of ports and cells are plotted as a function of the number of bits of N (see Figure 5).

Fig. 5. SQRTLOG: Number of ports and cells vs NBITS

The area and the power consumption as a function of the number of bits of the SQRTLOG are presented in Figure 6.

Fig. 6. SQRTLOG: Area and Power vs NBITS

The arithmetical percentage error of the module was analyzed to compute and and the results can be seen in Figure 7.

Fig. 7. Arithmetic error vs NBITS

The module was parameterized in fixed point representation with QI:F, with I integer bits and F fractional bits and NBITS = I +F. The experiment setup measured the module with NBITS, ranging from 4 to 24 bits, with a 4-bit step for each measurement. Table II shows that when NBITS = 4, it presents the lower area and power consumption. However, the Higher arithmetic error is obtained. For NBITS = 24, the arithmetic error decreases significantly, increasing power and area. For NBITS=16, there is a good trade-off between power and area vs % error, as can be seen on Table II.

TABLE II: ASIC SYNTHESIS RESULTS

NBITS	POWER	AREA	%error sqrt(pi)	%error ln(pi)
4	4.21mW	7010mm2	1.267	12.643
16	9.65mW	18513mm2	0.165	0.700
24	17.67mW	35683mm2	0.0134	0.017

B. FPGA Synthesis

The design was also synthesized (with NBITS = 20) in the altera DE1-SoC board using 2% of its logic utilization. The synthesis report was generated using the Quartus Prime from Altera and the results can be seen in Table III.

Using the Netlist Viewer from Quartus the circuit of the Figure 8 was obtained. Due to the square part of the logarithm algorithm, two multipliers were inferred from a total of 87 of the FPGA device.

Fig. 8. Quartus Netlist of the SQRTLOG

TABLE III: FPGA SYNTHESIS RESULTS

FPGA: Cyclone V
Device: 5CSEMA5F31C6

Logic utilization (in ALMs)	518 / 32,070 ( 2 % )
Total registers	432
Total DSP Blocks	2 / 87 ( 2 % )

VI. CONCLUSIONS

This paper introduces some methods to compute the square root and logarithms in hardware implementation. It is proposed the architecture of a module called SQRTLOG, which basically computes four operations: sqrt, log₂, log₁₀ and log_e. Results are available for both implementations, ASIC and FPGA. It is shown that the SQRTLOG presents a good trade-off between arithmetic error and the number of bits of its input, providing a fast and accurate algorithm for digital signal processing and scientific applications.

ACKNOWLEDGMENT

The authors would like to thank the PEM (Projects for Excellence on Microelectronics) initiative for the financial support.

REFERENCES

[1] AXI AMBA. Protocol specification. ARM, June, 2003.

[2] Liu Bangqiang, He Ling, and Yan Xiao. Base-n logarithm implementation on fpga for the data with random decimal point positions. In Signal Processing and its Applications (CSPA), 2013 IEEE 9th International Colloquium on, pages 17–20. IEEE, 2013.

[3] Design Compiler. Synopsys inc, 2016.

[4] Aidong Deng, Li Zhao, and Yan Zhao. Recognition of acoustic emission signal based on mae and propagation theory. In Management and Service Science, 2009. MASS’09. International Conference on, pages 1–4. IEEE, 2009.

[5] Hyejung Kim, B-G Nam, J-H Sohn, J-H Woo, and H-J Yoo. A 231-mhz, 2.18-mw 32-bit logarithmic arithmetic unit for fixed-point 3-d graphics system. IEEE journal of solid-state circuits, 41(11):2373–2381, 2006. [6] Yamin Li and Wanming Chu. Parallel-array implementations of a nonrestoring square root algorithm. In Computer Design: VLSI in Computers and Processors, 1997. ICCD’97. Proceedings., 1997 IEEE International Conference on, pages 690–695. IEEE, 1997.

[7] AM Mansour, AM El-Sawy, MS Aziz, and AT Sayed. A new hardware implementation of base 2 logarithm for fpga. International Journal of Signal Processing Systems, 3(2):171–181, 2015.

[8] Pramod K Meher, Javier Valls, Tso-Bing Juang, K Sridharan, and Koushik Maharatna. 50 years of cordic: Algorithms, architectures, and applications. IEEE Transactions on Circuits and Systems I: Regular Papers, 56(9):1893–1907, 2009.

[9] Rachmad Vidya Wicaksana Putra. A novel fixed-point square root algorithm and its digital hardware design. In ICT for Smart Society (ICISS), 2013 International Conference on, pages 1–4. IEEE, 2013.

[10] VB Saambhavi, SSSP Rao, and P Rajalakshmi. Design of feature extraction circuit for speech recognition applications. In TENCON 2012- 2012 IEEE Region 10 Conference, pages 1–5. IEEE, 2012.

[11] Shashank Suresh, Spiridon F Beldianu, and Sotirios G Ziavras. Fpga and asic square root designs for high performance and power efficiency. In 2013 IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors, pages 269–272. IEEE, 2013.

[12] C Turner. A fast binary logarithm algorithm. IEEE Signal Processing Mag, 27(5):124–140, 2010.

[13] UMC. United Microelectronics Corporation. www.umc.com.

If you wish to download a copy of this white paper, click here