
Flexible compression key to audio
By Chris Hanna, Principal DSP Software Engineer, Lexra Inc., Waltham, Mass., EE Times
March 12, 2001 (1:40 p.m. EST)
URL: http://www.eetimes.com/story/OEG20010312S0090

Just about all of the living-room audio-video electronics and PC multimedia products being designed today will incorporate some form of compressed digital-audio processing. Three popular audio compression algorithms dominate: MP3, AC-3 and AAC. The three are easily confused and are best distinguished by the applications they suit and the computing resources they demand.

Audio compression reduces the bit rate required to represent an analog audio signal while maintaining the perceived audio quality. Most audio codecs being designed today are called "lossy," meaning that they throw away information that most listeners cannot hear. The decision about what to discard is based on psychoacoustics, which uses a model of human auditory perception to determine which parts of the audible spectrum most people can detect.

First, an audio encoder divides the frequency spectrum of the signal into many bands and analyzes a block of audio to determine what's called a "masking threshold": the level below which sound in a band is hidden by louder nearby sounds. The number of bits used to represent a tone depends on this threshold. The quantization noise that comes with using fewer bits is kept below it so that the noise will not be heard. Tones that are completely masked may not have any bits allocated to them.

Discarding inaudible data reduces the storage, transmission and compute requirements of handling high-quality audio. Consider the example of CD-quality audio. A CD player produces two channels of audio. The analog signal in each channel is sampled at 44.1 kHz, and each sample is represented as a 16-bit digital word. Producing both channels therefore requires a data rate of 1.4 Mbits/s (44,100 × 16 × 2). With audio compression, this data rate can be reduced by around an order of magnitude, so a player of compressed CD-quality audio needs to read data at a rate of just over 100 kbits/s.

The availability of this degree of audio compression, in the form of the MP3 algorithm, was the catalyst for widespread distribution of audio files over the Internet and for the emergence of companies such as Napster that accelerated this distribution. MP3 is the MPEG-1 Layer III standard, not to be confused with MPEG-3. Developed in Germany in 1991 by the Fraunhofer Institute, MP3 uses the perceptual audio coding described earlier to compress CD-quality sound by an order of magnitude while providing almost the same fidelity.

The MPEG standard defines a hierarchy of audio layers. Layer I has the lowest complexity and lowest quality, Layer II is middle of the road, and Layer III has the highest complexity and produces the highest quality. MP3 supports mono or stereo operation at three different sample rates and a range of bit rates up to 320 kbits/s. A bit rate of 128 kbits/s represents "near CD-quality audio" with an 11:1 compression ratio. A 10-Gbyte hard drive storing uncompressed audio at the normal CD data rate would hold only about 12 CDs; with the 11:1 compression ratio, about 132 CDs could be stored as MP3.

In a typical MP3 decoder, bits arrive at the coded bit rate and decoded audio samples are produced. Every MP3 decoder performs a mix of data processing and signal processing. The data processing consists of parsing the bit stream, extracting fields, doing table lookups and the like. Huffman decoding is one example: several tables are used to look up the actual values that were encoded.

All of this data processing is ideally handled by a RISC processor and requires no digital signal processor (DSP). A RISC processor is better suited than a DSP to being programmed in a high-level language like C. DSP code, by contrast, tends to be compact but spends most of its time executing tight loops many times over. In MP3, the real DSP computation occurs later in the process, and it requires a large number of multiply-accumulates, which are best coded as optimized assembly routines. Thus, to decode MP3 most effectively, a mix of C and assembly code is required.

AC-3 was created by Dolby Labs and is now known as Dolby Digital. It is a multichannel standard for surround sound. AC-3 is the audio format used on DVDs and is also the format specified by the Advanced Television Systems Committee (ATSC) for the U.S. HDTV standard. Because AC-3 content is available from multiple program sources, it has many applications in consumer products, ranging from DVD players and home theater equipment to HDTVs and set-top boxes.

AC-3 supports up to six coded channels at three different sample rates and a range of bit rates up to 640 kbits/s. The six channels are usually referred to as 5.1 channels, where the ".1" refers to the limited-bandwidth low-frequency effects (LFE) channel. AC-3 also specifies how to down mix, so that the number of output channels can differ from the number of coded channels and the difference is handled consistently. As for compression ratio, six channels of 16-bit samples at a 48-kHz sample rate amount to a 4.6-Mbits/s data rate. At an AC-3 bit rate of 384 kbits/s, this represents a 12:1 compression ratio.

An AC-3 decoder decompresses incoming bits arriving at the coded bit rate into audio samples for up to six coded channels. If the desired number of output channels is different from the number of coded channels, the decoder down mixes the channels into the available number of outputs. As with MP3, there is a mix of data processing, which can be done with optimized C code, and DSP operations that require optimized assembly routines for efficiency.

AAC is the latest audio coding standard, developed mainly by AT&T, Fraunhofer, Dolby and Sony. It is defined in the MPEG-2 standard and supports the coding of multichannel audio, with up to 48 main channels and 16 low-frequency channels. AAC supports a wide range of both sample rates and bit rates; support for 96-kHz sample rates allows coding of "better than CD-quality" audio. Three complexity profiles are specified, with the low-complexity profile being the most widely used. AAC provides higher quality than MP3 at the same bit rate or, for the same audio quality, uses a 30 percent lower bit rate. It is intended to be the successor to MP3, but since MP3 has become the de facto standard, it remains to be seen whether AAC will supplant it.

AAC is specified as the audio coding method for the Japanese digital television standard. It is also specified for use in digital radio broadcasting in the United States and Europe, and a major music company has selected AAC for commercial music downloads via the Internet. Recent press releases indicate that it will be supported in the next generation of portable music players, and it is also the only audio coding method specified in the latest MPEG standard.

Because each of these algorithms mixes RISC and DSP computing, a RISC processor alone is inadequate for the DSP operations, while programming a DSP to perform the RISC processing functions lengthens the development cycle. A new breed of processor combines both types of computing architectures in the same execution engine. The result is the best of both with the disadvantages of neither.

Intellectual-property cores
Lexra Inc. has developed such a processor in its LX5x80 family of intellectual-property cores. The IP integrates a 32-bit RISC CPU with a high-performance DSP that includes a dual 32-bit MAC. The DSP executes Lexra's Radiax instructions, a set of 36 DSP instructions that extend the MIPS R3000-class instruction set architecture. The DSP provides all of the features found in other high-performance DSPs.

The R3000-class MIPS RISC CPU also supports MIPS-16 code compression, which can reduce code size significantly. A range of memory configurations is supported, and the internal instruction and data memory provides for deterministic execution of time-critical DSP applications.

The processor has two instruction execution units, one for RISC and one for DSP, but both are fed from the same program counter. In quantitative terms, the combined unit achieves 95 percent of the performance of a Texas Instruments C621 DSP while requiring only a fifth of the power and silicon area. The LX5180 is a uniscalar version with a single instruction execution unit.

Using 16 bits throughout is insufficient for a high-quality audio decoder. Extra precision is necessary, for example, to pass the Dolby class A certification requirements. The architecture of the RISC/DSP core allows trade-offs between 16-bit dual-MAC operations and 32-bit operations, maintaining audio quality while reducing CPU time.

The memory hierarchy of a CPU running a demanding application like audio decoding has a significant impact on CPU performance. The architecture of the RISC/DSP core also allows memory trade-offs, reducing memory at the expense of CPU time or vice versa, which provides the flexibility to meet the needs of a given system.

For example, audio decoders that require deterministic operation can place time-critical routines in on-chip instruction memory (IMEM) and data memory (DMEM), at the cost of larger IMEM and DMEM. For memory-limited applications, MIPS-16 code compression and/or compiler optimizations can be used to reduce code memory at the expense of increased CPU time.

See related chart