New DSP core can be embedded in SoC devices for multimedia applications; it provides high speed, low power consumption, and a smaller core area
Tokyo -- June 22, 2006 -- Renesas Technology Corp. today announced the development of a high-speed, low-power-consumption synthesizable DSP (Digital Signal Processor) core*1 for SoC devices. The DSP core uses a new saturation processing method*2 with a saturation anticipator circuit, as well as a layout technique that implements a hierarchical structure optimized for operation speed. These technology advances enable the core to achieve speeds that are approximately 20% faster than the previous Renesas DSP design.
Test chips for the new Very Long Instruction Word (VLIW)*3-type synthesizable DSP core were fabricated using a 90 nm CMOS process. The core achieved a maximum operating frequency of 1.047 GHz at a supply voltage of 1.2 V. Power consumption for performing a 128-tap FIR filter operation at that speed was only 0.10 mW/MHz, and the silicon area of the core was extremely compact: about 0.5 mm².
This DSP core will be embedded in various Renesas SoC devices for the next-generation multimedia processing applications of electronic products and systems.
Test chip micrograph
Background
In recent years the quality and resolution of multimedia data such as audio and video have increased, and the trend is continuing. This makes it necessary to process large volumes of multimedia data at extremely high speed. For example, using the AAC (Advanced Audio Coding) compression format for audio data and a processing speed 12 times the normal rate, it is possible to encode (convert to compressed form) one hour of audio data in five minutes, so audio content can be recorded in a short time. Video contains much more data, however, and high-definition TV (HDTV) video is more data-intensive than standard TV (SDTV) video. For the H.264/AVC (Advanced Video Coding) video-compression format used in HDTV applications, for instance, the encoding workload imposed by the HDTV screen size (1,920 x 1,080 pixels) is six times that of the SDTV screen size (720 x 480 pixels).
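The two figures quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: it treats pixels per frame as a rough proxy for encoding workload, which is an assumption made here, and the variable names are not from the announcement.

```python
# Screen sizes quoted in the text, in pixels (width, height).
HDTV = (1920, 1080)
SDTV = (720, 480)

def pixel_count(size):
    """Number of pixels in one frame of the given size."""
    width, height = size
    return width * height

# Pixels per frame, used here as a rough proxy for encoding workload.
pixel_ratio = pixel_count(HDTV) / pixel_count(SDTV)

# Time to encode one hour of audio at 12 times the normal rate.
audio_minutes = 60
speed_factor = 12
encode_minutes = audio_minutes / speed_factor

print(pixel_ratio)     # 6.0
print(encode_minutes)  # 5.0
```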
DSPs can process multimedia data very efficiently and are now used in many different applications. As the processing load imposed by multimedia data continues to grow, so does the demand for faster DSPs. Specifically, processors performing bit-rate control for HDTV-class video have to operate at speeds in excess of 1 GHz. Yet, at the same time, high-speed DSPs suitable for being embedded into SoC (system-on-a-chip) devices for digital home appliances and other electronic products must be very small and have low power consumption. That combination of characteristics is difficult to obtain.
Details of the Technology
Against this background, Renesas Technology has developed a new technique for increasing the speed of its DSPs and has applied it in a VLIW-type synthesizable DSP core. The new core operates at 1 GHz and can be embedded in SoC devices. The technique it employs to achieve this faster operating speed has two main aspects, which are described below.
(1) New Saturation Processing with Saturation Anticipator Circuit
DSPs perform a large number of multiply-and-accumulate loop operations. They use guard bits to prevent overflows during arithmetic operations and provide efficient data processing. If an overflow occurs when the DSP is converting data with a guard bit to data with no guard bit, the data is converted to a specified maximum or minimum value. The saturation circuit performs the important function of detecting overflows. Renesas Technology has developed a new type of saturation circuit.
In a conventional saturation circuit, a saturation operation is performed after the addition operation has been completed. If saturation is not detected, the saturation circuit instructs the final stage of the arithmetic circuit to output the result produced by the adder. If saturation is detected, though, the saturation circuit instructs the final stage to output the maximum or minimum value, as appropriate. Since these operations must be performed one after another in sequence, they constitute an obstacle to achieving high-speed processing.
In contrast, the newly developed technique operates as follows:
(a) At the same time that the data is being input to the adder, the checker circuit uses leading-zero anticipation (LZA)*4 to anticipate whether or not saturation will occur.
(b) Anticipation takes place in parallel with addition. Based on the anticipated result, the anticipator circuit instructs the final stage of the arithmetic circuit to output either the result produced by the adder or the specified maximum or minimum value.
The fact that the adder and saturation anticipator circuit operate in parallel increases the processing speed. This technique provides a speed boost of 10.5% over conventional designs.
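The behavior described in this section can be illustrated with a small functional model. The sketch below is not the Renesas circuit: it assumes a 32-bit output format with 8 guard bits (a 40-bit accumulator), all function names are hypothetical, and the anticipator is a software stand-in that derives its flag from the adder inputs by ordinary arithmetic rather than by LZA logic. Software cannot show the timing benefit; the model only shows that the final stage selects between the adder result and a limit value, using a flag that does not depend on the adder output.

```python
OUT_BITS = 32    # assumed width of the format with no guard bits
GUARD_BITS = 8   # assumed number of guard bits (40-bit accumulator)

OUT_MAX = (1 << (OUT_BITS - 1)) - 1
OUT_MIN = -(1 << (OUT_BITS - 1))
ACC_MAX = (1 << (OUT_BITS + GUARD_BITS - 1)) - 1
ACC_MIN = -(1 << (OUT_BITS + GUARD_BITS - 1))

def mac(samples, coeffs):
    """Multiply-and-accumulate loop; guard bits absorb intermediate growth."""
    acc = 0
    for x, c in zip(samples, coeffs):
        acc += x * c
        assert ACC_MIN <= acc <= ACC_MAX, "accumulator exceeded its guard bits"
    return acc

def final_stage(adder_result, flag):
    """Output the adder result, or a limit value if saturation is flagged."""
    if flag > 0:
        return OUT_MAX
    if flag < 0:
        return OUT_MIN
    return adder_result

def saturate(acc):
    """Conventional order: inspect the finished result, then select."""
    flag = 1 if acc > OUT_MAX else -1 if acc < OUT_MIN else 0
    return final_stage(acc, flag)

def anticipated_flag(a, b):
    """Stand-in for the anticipator: works from the adder inputs only.
    The hardware uses LZA here; this model simply compares ranges."""
    if b > 0 and a > OUT_MAX - b:
        return 1
    if b < 0 and a < OUT_MIN - b:
        return -1
    return 0

def add_saturating(a, b):
    """New structure: adder path and flag path are independent."""
    adder_result = a + b            # adder path
    flag = anticipated_flag(a, b)   # anticipator path
    return final_stage(adder_result, flag)

print(add_saturating(OUT_MAX, 1))       # 2147483647
print(add_saturating(100, 23))          # 123
print(saturate(mac([32767] * 3, [32767] * 3)))  # 2147483647
```

In the last line, the three products sum to 3221028867, which exceeds the 32-bit range but fits in the guarded accumulator, so the value is clamped only at the final conversion.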
(2) Layout Technique with Hierarchical Structure Optimized for Operation Speed
Conventional layouts have a hierarchical structure organized around the function modules. This results in "critical paths" for which speed becomes problematic as the wiring length gets longer. When developing the new DSP, Renesas Technology analyzed the critical paths for which speed is most important, then created a hierarchical structure that is optimized for operation speed. The optimization was aimed at shortening the wiring lengths of the critical paths. Its main features are as follows:
(a) Critical paths are not routed via multiple modules.
(b) The arithmetic unit and bypass circuits, such as the control lines connected to it, are bundled into a single module.
Simulations show that the optimized structure achieves a speed increase of 9.3% over conventional designs.
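As a rough consistency check, the two gains quoted in this section can be combined. The sketch below assumes the two improvements are independent and compound multiplicatively, which the announcement does not state; under that assumption the result is in line with the overall figure of approximately 20%.

```python
saturation_gain = 0.105  # 10.5% from the saturation anticipator circuit
layout_gain = 0.093      # 9.3% from the speed-optimized hierarchy

# Assumption: the gains are independent and multiply.
combined = (1 + saturation_gain) * (1 + layout_gain) - 1

print(f"{combined:.1%}")  # 20.8%
```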
Details of the technique were revealed in a technical paper presented at the 2006 Symposium on VLSI Circuits held in Honolulu, Hawaii from June 15, 2006.
Notes:
1. Synthesizable core: A type of intellectual property (IP) using register transfer level (RTL) description that provides support for synthesized logic. Synthesizable cores make design assets highly reusable.
2. Saturation processing: Processing in which data in a format with guard bits is converted to data in a format with no guard bits. Data values for which an overflow occurred are converted to a specified maximum value, while data values for which an underflow occurred are converted to a specified minimum value.
3. VLIW (Very Long Instruction Word): A technology for boosting the processing performance of microprocessors by executing multiple independent instructions at the same time, as if they were a single instruction. The new DSP can execute two instructions at once.
4. LZA (Leading-Zero Anticipation): A method used in floating-point processing units to anticipate the position of the leading one (the first significant bit of the absolute value).
* Names of products, companies, and brands mentioned in this document are the property of their respective owners.
Supplementary Figure : 40-bit ALU (Arithmetic Logical Unit)