Meeting MPEG-4 advanced audio coding requirements
By Paul Master, Vice President, Technology, and Fu-Huei Lin, Senior Member of the Technical Staff, QuickSilver Technology Inc., San Jose, Calif., EE Times
December 18, 2001 (1:10 p.m. EST)
URL: http://www.eetimes.com/story/OEG20011218S0048
MPEG-4 audio does not specify a single highly efficient compression scheme, or even a small set of them, but rather a complex toolbox that performs a wide range of operations, from low-bit-rate speech coding to high-quality audio coding and music synthesis. The MPEG-4 audio coding algorithm family spans the range from low-bit-rate speech coding (down to 2 kbits/second) to high-quality audio coding (at 64 kbits/s per channel and higher).

Advanced audio coding (AAC), the successor to MP3 (ISO/MPEG Audio Layer-3), performs generic audio coding at medium to high bit rates. AAC supports sampling frequencies between 8 kHz and 96 kHz and any number of channels between 1 and 48. In contrast to MP3's hybrid filterbank, AAC uses the modified discrete cosine transform (MDCT) with an increased window length of 2,048 points. AAC can be switched dynamically between block lengths of 2,048 points and 256 points: if a signal change or transient occurs, the short 256-point window is chosen for better time resolution; otherwise, the longer 2,048-point window is used to improve coding efficiency.

The psychoacoustic model is used only in the encoding stage, and it plays the major role in audio coding, much as the vocal-tract model does in speech coding. In audio coding the signal receiver is the human auditory perception system, so better coding efficiency and improved perceptual quality are achieved by taking advantage of its properties. The sensitivity of the human auditory system varies across the frequency domain: it is highest for frequencies between 2.5 and 5 kHz and decreases above and below this band. This sensitivity is represented by the "threshold in quiet," which serves as the minimum threshold of audibility; any tone below it will not be perceived. The threshold is raised further by the masking effect of tonal or noise-like components in the audio signal. This masking effect of spectral sound elements, such as tones and noise, is the most important psychoacoustic fact. For every tone in the audio signal a masking threshold can be calculated; if another tone lies below this masking threshold, the louder tone masks it and it remains inaudible. Bit allocation in the encoder therefore assigns the bit resource to the audible elements, while the inaudible elements can be eliminated during the encoding process.

MPEG-4 AAC comprises MPEG-2 AAC plus a perceptual noise substitution tool, a long-term predictor tool and tools for bit-rate scalability. It is also important to note that an MPEG-4 AAC main decoder can decode MPEG-2 AAC files.
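To make the masking idea concrete, the sketch below shows how an encoder-side bit allocator can drop spectral lines that fall under the effective audibility threshold. It is a toy illustration only, not the MPEG-4 psychoacoustic model: the function name, the threshold values and the rough 6-dB-per-bit rule are all invented for the example.

```python
def allocate_bits(line_energy_db, quiet_threshold_db, masking_threshold_db,
                  db_per_bit=6.0):
    """Toy per-line bit allocation for one frame (not the MPEG-4 model).

    line_energy_db       -- energy of each spectral line, in dB
    quiet_threshold_db   -- threshold in quiet for each line, in dB
    masking_threshold_db -- masking threshold raised by neighboring
                            tones/noise for each line, in dB
    db_per_bit           -- assumed SNR gain per allocated bit (~6 dB/bit)
    """
    bits = []
    for energy, quiet, masked in zip(line_energy_db,
                                     quiet_threshold_db,
                                     masking_threshold_db):
        threshold = max(quiet, masked)   # effective audibility threshold
        if energy <= threshold:
            bits.append(0)               # inaudible line: spend no bits
        else:
            # Audible line: spend enough bits to push the quantization
            # noise just below the threshold.
            bits.append(int((energy - threshold) / db_per_bit) + 1)
    return bits

# The second line lies below its masking threshold and gets no bits.
print(allocate_bits([40.0, 18.0, 55.0], [5.0, 5.0, 5.0], [30.0, 25.0, 20.0]))
# -> [2, 0, 6]
```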
Among the more salient points for designers to understand is that AAC is compute-intensive. The three most processing-intensive AAC decoding algorithms are prediction, intensity/coupling and the filterbank, which includes the inverse MDCT (IMDCT). These three algorithms consume 70 percent of the processing power required to run the total AAC decoding algorithm.

The most computationally complex is prediction, which requires about 31 percent of the total computational power. Prediction is used for improved redundancy reduction and is particularly effective on more or less stationary parts of a signal, which are among the most demanding parts in terms of required bit rate. Prediction can be applied to every channel using an intra-channel, or mono, predictor that exploits the autocorrelation between the spectral components of consecutive frames. For each channel, prediction is applied to the spectral components resulting from the spectral decomposition of the filterbank. For each spectral component up to a limit there is one corresponding predictor, resulting in a bank of predictors, where each predictor exploits the autocorrelation between the spectral-component values of consecutive frames.

The intensity/coupling tool falls into two categories: intensity stereo (IS) and coupling. The first is used to implement joint IS coding between the two channels of a channel pair, so that both channel outputs are derived from a single set of spectral coefficients after the inverse quantization process. This is done selectively, on a scale-factor-band basis, when the IS flag is active. The coupling-channel tool provides two functions. First, coupling channels may be used to implement generalized intensity stereo coding, in which channel spectra can be shared across channel boundaries. Second, coupling channels may be used to dynamically perform a downmix of one sound object into the stereo image.

The filterbank tool performs the final processing and requires about 18 percent of the total processing power of AAC. The filterbank is based on the IMDCT with sine-shaped or Kaiser-Bessel-derived windowing and overlapped sequences, in which the first half of every sequence is added to the second half of the previous sequence.
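A minimal sketch of that overlap-add step, assuming equal-length windowed IMDCT output blocks with 50 percent overlap (the windowing and the IMDCT itself are omitted, and the function name is our own), might look like this:

```python
import numpy as np

def overlap_add(windowed_blocks):
    """Overlap-add 50%-overlapping windowed IMDCT output blocks.

    Each block has length N; the first half of every block is added to
    the saved second half of the previous block, producing N/2 output
    samples per block. Toy sketch only; window shapes and the IMDCT
    are not modeled.
    """
    n = len(windowed_blocks[0])
    half = n // 2
    prev_tail = np.zeros(half)
    output = []
    for block in windowed_blocks:
        block = np.asarray(block, dtype=float)
        output.append(block[:half] + prev_tail)  # first half + previous second half
        prev_tail = block[half:]                 # kept for the next block
    return np.concatenate(output)

# Two 8-sample placeholder "windowed" blocks yield 8 output samples.
rng = np.random.default_rng(0)
print(overlap_add([rng.standard_normal(8) for _ in range(2)]))
```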
Conventional digital signal processing is generally used in MPEG audio decoder designs. For systems powered from electrical outlets, this is an acceptable design approach. However, DSPs are power-inefficient for wireless applications, and now, with compute-intensive MPEG-4 AAC audio being factored into next-generation wireless and mobile communications designs, it is prudent for designers to carefully consider IC options for higher performance and lower power consumption. Typically, an ASIC is factored in to accelerate parts of a compute-intensive algorithm so that the DSP can run at a slower clock rate, thus reducing power consumption. However, this design approach proves costly because the ASIC uses more silicon area to perform the necessary acceleration.

One solution is QuickSilver's Adaptive Computing Machine (ACM) technology, which can rapidly adapt the physical structure of the accelerator through a rapid spatial and temporal segmentation (SATS) process. It is 10 times more efficient at implementing the same AAC algorithms because the ACM's SATS process targets the exact hardware required, algorithm by algorithm, for that moment in time.

Running an algorithm on a conventional DSP or microprocessor normally incurs a considerable number of clock cycles. It takes a conventional microprocessor about 1,013 clock cycles to perform the addition of 27 floating-point numbers for a 27-input adder. First, the microprocessor issues an address to memory and then fetches an instruction. At this point, 40 or so different optimization techniques are applied to improve instruction-execution performance. Depending on the brand, the microprocessor goes through four to 25 pipeline stages to pipeline the instruction process. All these steps occur just to issue an address to memory, fetch the first data element and load it into a register. The steps are performed 27 times and require 1,013 clock cycles.

A DSP implementation with dual multiply-accumulate units (MACs) performs the same benchmark in 107 clock cycles. A multiply-accumulate operation completes in a single clock cycle, and two multiply-accumulates can be performed simultaneously. The DSP is a modified Harvard architecture with two independent data streams and an independent instruction stream, so, compared with a conventional microprocessor, the engineer can take advantage of this coarse-grain parallelism. The result is 107 clock cycles, an order of magnitude faster than the microprocessor.

With adaptive computing, the system engineer has the ability to implement in silicon the exact hardware required at any given moment. Thus, an algorithm that implements the 27-input adder can be downloaded into the ACM circuitry, and the computation completes in seven clock cycles, an order of magnitude improvement over the DSP.
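As a rough illustration of where that improvement comes from, compare a purely sequential accumulation of 27 values, which touches one input per step, with a balanced pairwise reduction tree, which needs only ceil(log2 27) = 5 levels of independent additions that spatial hardware can execute level by level. The sketch below counts only addition steps; the specific cycle totals quoted above also include memory, instruction-fetch and pipeline overheads that the sketch does not model, and it is our own illustration rather than a description of the ACM circuitry.

```python
def sequential_sum(values):
    """Accumulate one value per step, as a single-ALU datapath would."""
    total, steps = 0.0, 0
    for v in values:
        total += v        # each addition depends on the previous result
        steps += 1
    return total, steps

def tree_sum(values):
    """Reduce pairwise, level by level, as a spatial adder tree would.

    All additions within a level are independent, so hardware that
    instantiates the whole level finishes it in one step.
    """
    values, levels = list(values), 0
    while len(values) > 1:
        values = [values[i] + values[i + 1] if i + 1 < len(values) else values[i]
                  for i in range(0, len(values), 2)]
        levels += 1       # one level of parallel additions
    return values[0], levels

data = [float(i) for i in range(27)]
print(sequential_sum(data))  # (351.0, 27): 27 sequential steps
print(tree_sum(data))        # (351.0, 5): 5 levels for 27 inputs
```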