Integrating High-Quality Audio into Mobile Design

Dave Sparks, Sonic Networks

Multimedia functionality is becoming more important in handheld products, and consumers are demanding higher-fidelity audio in their wireless devices, PCs, games, software applications, and music synthesizers. To meet this demand and enable high-quality music playback on a wide range of consumer devices, product designers must trade off performance, memory, and power consumption to optimize for user preferences and expectations. As a result, there are some key considerations when integrating audio synthesis into an embedded platform, including the functional components of the system, related standards, design options, memory issues, and performance trade-offs.

Differentiating to Compete

MIDI audio synthesis is one feature that has provided an opportunity to both differentiate products and increase revenue by way of polyphonic ringtone capabilities. It is therefore becoming more important for embedded designers to understand the major issues and design techniques involved in integrating a high-quality audio synthesis solution into a mobile platform.

MIDI: The Format Standard

The MIDI standard originated as a protocol to transmit a musical performance over a 31.25 kbit/s serial cable. With limited bandwidth available, the protocol comprises primarily control data such as "Note-On," signifying the moment when a performer presses a note on a musical keyboard, and "Note-Off," signifying when the note is released.

The Standard MIDI File (SMF) format was created in the 1980s as a means to capture the MIDI data stream into a file for editing on a computer. Events are stored in a stream format with the addition of delta timestamps that mark the amount of time since the last event. The file format is very compact, with a typical file size between 10 and 100 KB. A similar file stored in a perceptual audio coder format such as MP3 would be about 4 MB.
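
The delta timestamps described above can be turned into a playback timeline with a running sum. The sketch below is illustrative only: it assumes a single constant tempo (real files may change tempo mid-stream), and the function name `absolute_times`, the event labels, and the resolution of 480 ticks per quarter note are hypothetical choices, not taken from any particular parser.

```python
from typing import Iterable, List, Tuple

# SMF default tempo: 500,000 microseconds per quarter note (120 BPM).
DEFAULT_TEMPO_US = 500_000


def absolute_times(
    events: Iterable[Tuple[int, str]],
    ticks_per_quarter: int,
    tempo_us: int = DEFAULT_TEMPO_US,
) -> List[Tuple[float, str]]:
    """Convert (delta_ticks, label) pairs to (seconds_from_start, label).

    Assumption: the tempo stays constant for the whole track.
    """
    seconds_per_tick = tempo_us / 1_000_000 / ticks_per_quarter
    now_ticks = 0
    timeline = []
    for delta_ticks, label in events:
        # Each delta is relative to the previous event, so accumulate.
        now_ticks += delta_ticks
        timeline.append((now_ticks * seconds_per_tick, label))
    return timeline


# Hypothetical track: two notes played back to back.
track = [
    (0, "Note-On C4"),
    (480, "Note-Off C4"),
    (0, "Note-On E4"),
    (240, "Note-Off E4"),
]
timeline = absolute_times(track, ticks_per_quarter=480)
for seconds, label in timeline:
    print(f"{seconds:6.3f} s  {label}")
```

Storing only the gap since the previous event keeps most timestamps small, which is part of why the file stays compact.
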
While MIDI behaves similarly to a codec at the receiver end, an encoder that can encode an arbitrary audio stream is impractical today.

Synthesizing Audio

FM synthesis uses a purely algorithmic technique of modulating a carrier signal with a modulator. The resulting output is a rich spectrum of sound created by the sums and differences of the two frequencies. By varying the amount of modulation applied to the carrier over time, the spectrum can be manipulated to imitate real instruments or to create new synthetic sounds. This is the synthesis technique popularized by Yamaha and incorporated into its MA series of MIDI synthesis ICs.

In contrast, a sampling synthesizer uses recordings of actual instruments as well as synthetic sounds. By varying the playback speed through interpolation, a single recording can be used to synthesize a range of frequencies. The sound is often further manipulated using filters to dynamically vary the output spectrum.

Generally, sampling synthesizers produce more realistic-sounding instruments, but the realism comes at the cost of additional read-only memory (ROM) for the sample library. FM synthesizers require less memory, since they store only the algorithm parameters for their sounds, but their signal processing requirements tend to be much higher.

Audio Synthesis Components

The file parser reads MIDI data from a file or input stream and reconstructs the timeline from the delta timestamps stored in the file. Timestamps are generally specified relative to the tempo of the musical piece, although they can also be specified relative to Society of Motion Picture and Television Engineers (SMPTE) time code. The file parser converts the relative timestamps in the file to absolute time so that events can be fed to the MIDI interpreter at the appropriate time.

The MIDI interpreter acts on the performance data in the MIDI stream.
For example, when a "Note-On" event is received, the MIDI interpreter must locate the algorithm parameters that characterize the musical instrument to be synthesized, allocate resources (a "voice") to synthesize the note, and start synthesizing it. The performance data may occasionally request more voices than are available, in which case the MIDI interpreter must determine which notes have priority. "Voice stealing" occurs when an active voice is reallocated to synthesize a new note.

The synthesis engine receives control data from the MIDI interpreter and synthesizes the audio based on the supplied parameters (the "program") and, in the case of a sampling synthesizer, the sample data. The outputs of all the voices are mixed together based on the MIDI controls to render the final audio output.

Sweetening the Audio Output

Audio filters are used to vary the spectrum of the synthesizer to simulate changes in brightness, such as the natural decay of a piano string. A chorus is a delay line with a variable tap used to simulate multiple voices, providing a richer tone to brass and string section sounds. A reverb is a combination of delay lines and all-pass filters used to simulate the reverberation of different environments such as a concert hall or stadium. All of these effects are normally controlled at the individual instrument level. For example, the brass section can have chorus applied without affecting the piano.

An audio exciter brightens the audio by adding harmonics to fill in the upper frequency range, an effect that can help make up for harmonics that may be lacking in the original samples. A compressor/limiter maximizes the output signal level by increasing the output gain when the overall volume of the synthesizer drops, which is useful for a ringtone that needs to be heard in a noisy environment. EQ can be used to compensate for the characteristics of the transducer and the acoustics of the mobile device itself.
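
As a rough illustration of the chorus described above (a delay line with a variable tap), the following sketch sweeps the tap position with a low-frequency sine oscillator and mixes the delayed signal with the dry input. It is a floating-point model for clarity, not an embedded implementation; the function name `chorus`, the default parameter values, and the choice of a sine LFO with linear interpolation are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def chorus(
    samples: Sequence[float],
    sample_rate: int,
    base_delay_ms: float = 20.0,  # assumed centre of the tap sweep
    depth_ms: float = 5.0,        # assumed sweep depth around the centre
    lfo_hz: float = 0.5,          # assumed sweep rate
    mix: float = 0.5,             # 0.0 = dry only, 1.0 = delayed only
) -> List[float]:
    """Mono chorus: circular delay line read at a slowly moving tap."""
    if depth_ms >= base_delay_ms:
        raise ValueError("depth_ms must be smaller than base_delay_ms")

    # Room for the longest delay plus one extra sample for interpolation.
    size = int((base_delay_ms + depth_ms) * sample_rate / 1000.0) + 2
    line = [0.0] * size
    write = 0
    out = []

    for n, x in enumerate(samples):
        line[write] = x

        # The LFO moves the tap between base - depth and base + depth.
        lfo = math.sin(2.0 * math.pi * lfo_hz * n / sample_rate)
        delay = (base_delay_ms + depth_ms * lfo) * sample_rate / 1000.0

        # Linear interpolation between the two samples around the tap.
        whole = int(delay)
        frac = delay - whole
        newer = line[(write - whole) % size]
        older = line[(write - whole - 1) % size]
        delayed = newer + frac * (older - newer)

        out.append((1.0 - mix) * x + mix * delayed)
        write = (write + 1) % size

    return out


# With the sweep depth set to zero the effect reduces to a fixed echo,
# which makes the delay-line behaviour easy to inspect on an impulse.
impulse = [1.0] + [0.0] * 29
echoed = chorus(impulse, sample_rate=1000, base_delay_ms=20.0, depth_ms=0.0)
```

On a mobile target the same structure would normally be written in fixed-point arithmetic, and the length of the delay line is a direct RAM cost to weigh against the richer tone.
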

Performance Optimization

Processor architecture has a strong influence on synthesizer performance. The number of registers, the availability of zero wait-state memory such as cache or tightly-coupled memory (TCM), and signal processing capability (such as multiply-accumulate [MAC] pipelines and saturating arithmetic) can all significantly influence performance.

The bulk of the code in a software-based synthesizer is the control logic in the file parser and MIDI interpreter. This code represents 5 to 20% of the overall execution time of the synthesizer, runs well on a 32-bit general-purpose processor, and benefits from both instruction and data cache.

The code for the synthesizer engine is usually much smaller than the control code, but it represents 80 to 95% of the overall execution time of the synthesizer. The synthesizer engine should be designed specifically for embedded applications and should consist of small loops of a few hundred bytes that execute for tens to hundreds of cycles at a time. Because it is so small, it neither suffers significantly from cache pollution nor contributes to it. If no cache is available, locating the synthesizer engine code in TCM will likely double the performance of the synthesizer engine. Due to the nature of the signal processing in the engine code, it will also benefit from a MAC pipeline and saturating arithmetic.

If DSP bandwidth is available, it may make sense to offload this code to a DSP, which is usually more efficient at executing signal processing algorithms. If the control code is to run on a separate general-purpose processor, some consideration will have to be given to moving the processed control data down to the DSP to control the synthesis engine.

Sampling synthesizers also access a large amount of sample data, which is typically stored in ROM, from inside the synthesizer engine's inner loop. Access tends to occur in periodic sequential reads.
Making the sample data cacheable can result in a significant performance increase, as a typical 32-byte cache line holds enough data to keep the inner loop running at zero wait-states for many iterations. Assuming that instructions and read-write data are already cached, enabling cache for sample data may nearly double the performance. While sample data is not very susceptible to cache pollution, it does contribute to it, because sample data is typically used once or twice in a loop and then may not be used again for many more iterations.

Performance Trade-offs

It is possible to get more voices for the same processor bandwidth by reducing the complexity of a voice. As usual, this comes at a cost to quality, but that point may be moot if the user is listening to the audio through an 8 mm transducer. Here are some trade-offs to examine when playing the numbers game.

Sample rate is probably the single biggest contributor to audio quality. However, if the transducer is specified at 300 Hz to 3 kHz ±3 dB, there is little point in running the synthesizer at 48 kHz. There is a direct relationship between the sample rate and the processor bandwidth used by the synthesizer engine. Of course, as the sample rate drops, other parts of the synthesizer become larger contributors to the overall execution time.

Some synthesizer architectures feature a low-pass filter that can be controlled by the sound designer. This can be used to increase the overall quality of the instrument sounds. The filter uses considerable processor bandwidth in the synthesizer engine, and eliminating it may reduce execution time by as much as 35%. However, dropping the filter may require additional sample memory to properly synthesize certain types of sounds.

Stereo output can also be costly. While most of the signal path in a mobile synthesizer is monophonic, the final output stage uses a stereo pan control to steer the audio output to the left and right channels.
Eliminating the stereo pan control reduces execution time by removing the control logic and MACs from the inner loop, reduces the memory footprint by cutting the buffer size in half, and reduces cache pollution as well.

Size is Important

Since sample-based audio synthesis has at its core a wavetable of recorded sounds to drive the oscillators, the size and quality of the wavetable are crucial to the quality of the synthesized sound. The process of wavetable creation or selection is therefore considered by many to be the most important aspect of a successful MIDI solution. After all, you can have the most elegant synthesizer design possible, but if the samples you are playing back are of poor quality, the entire solution will sound bad.

The samples must be free of any background or player noise, have adequate dynamic range, and be consistent in loudness and timbre across the range of notes sampled. It is not enough to have balance across one instrument's scale; all instruments must balance with each other when played together in the context of a musical piece. This requires more than engineering finesse and is often a process undertaken by professionally trained musicians with highly discerning ears.

Once the instruments have been sampled, the resulting recordings need to be key-mapped. In this process, each individual sample is assigned the range of notes for which it is played back. After key mapping is completed, the task of "voicing," or adding the synthesizer control structures, is done. This involves musical decisions and programming to take the final set of recordings to a playable state. Time- and velocity-variant filters, amplitude envelopes that modulate the volume over time, pitch modulation, layering of sounds for synthesizer voices, and similar controls are all added at this stage.
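
The key-mapping step described above can be pictured as a lookup table from note ranges to recordings, plus a playback-speed ratio for notes other than the one that was recorded. The sketch below is a simplified illustration: the `KeyZone` structure, the sample names, and the note ranges are hypothetical, and the ratio assumes standard twelve-tone equal temperament.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KeyZone:
    sample_name: str  # hypothetical identifier for a recording
    root_note: int    # MIDI note number at which the sample was recorded
    low_note: int     # lowest MIDI note this sample is used for
    high_note: int    # highest MIDI note this sample is used for


def find_zone(zones: List[KeyZone], note: int) -> Optional[KeyZone]:
    """Return the zone whose note range covers `note`, if any."""
    for zone in zones:
        if zone.low_note <= note <= zone.high_note:
            return zone
    return None


def playback_ratio(zone: KeyZone, note: int) -> float:
    """Speed-up factor needed to pitch the recording to `note`.

    Assumes equal temperament: each semitone scales frequency by 2^(1/12).
    """
    return 2.0 ** ((note - zone.root_note) / 12.0)


# Hypothetical two-sample piano map: one recording per half octave or so.
zones = [
    KeyZone("piano_c3", root_note=48, low_note=42, high_note=53),
    KeyZone("piano_c4", root_note=60, low_note=54, high_note=65),
]

zone = find_zone(zones, 64)  # E4 falls in the second zone
ratio = playback_ratio(zone, 64)
print(zone.sample_name, round(ratio, 4))
```

Wider zones need fewer recordings and so less ROM, but the further a note lies from the root, the more the interpolated playback changes the character of the sound, which is one reason the mapping involves musical judgment.
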
For the final wavetable to sound correct with the standard MIDI files available, careful attention should be given to volume balancing and "mixing" the instrument set so that it plays well in a multi-timbral, musical setting.

Small-footprint wavetables for mobile handsets and audio players require additional work to reduce the size of the wavetable while maintaining high-quality output. This work may involve pitch and time compression techniques, specialized looping, sampling rate reduction, equalization, and many other techniques. To ensure the best results, small-footprint wavetables should be optimized for the playback synthesizer and the final product application.

Related Standards

General MIDI (GM) is a joint MIDI Manufacturers Association (MMA)/Association of Musical Electronics Industry (AMEI) standard that defines a common set of 128 instruments and 47 percussion sounds and the means to select them on any platform that supports it. This gives the author of a music file some assurance that when his or her composition requires a violin, the platform will attempt to reproduce a violin sound. General MIDI 2 increases the number of sounds available and further defines the behaviors of a compliant platform. However, even the combination of General MIDI and SMF files still cannot assure the quality of the sound that will be reproduced on a particular platform.

To address this limitation, the Downloadable Sounds (DLS) standard was jointly created by the MMA and AMEI to allow content authors to create files of instrument sounds that can be downloaded to a compliant synthesizer. DLS gives the author a standardized method to control the sound of the instruments used to reproduce a musical performance. DLS-2 increases the capability of DLS-compatible synthesizers and provides for both forward and backward compatibility.
DLS-2 (under the moniker SASBF) was adopted by the MPEG standards body in a joint effort with the MMA as part of MPEG-4 Structured Audio.

Shortly after the DLS-1 standard was ratified, MMA/AMEI released the eXtensible Music Format (XMF) file format, which combines an SMF music file with a DLS file into a single encapsulated file. This format gives the author a way to deliver an audio performance in a single compact file that gives the listener a consistent playback experience on compatible platforms.

Given the push to open the mobile platform to more content, we can expect to see standards-based formats make significant inroads in the near future. Indeed, the 3rd Generation Partnership Project (3GPP) has been working with the MIDI organizations to standardize a new musical file format for mobile devices. To address this issue, a joint task group from the MMA, AMEI, and 3GPP approved the Mobile-DLS standard (mDLS) in September 2004. Mobile-DLS is a subset of the DLS-2 standard, intended for mobile applications, that provides for different profiles based on the capabilities of the device.

A Mobile-DLS file can be combined with a MIDI music file into a Mobile-XMF file, creating a single file that can be accurately reproduced on a compatible synthesizer. While Mobile-XMF does not fully specify the audio output of the synthesizer down to the bit level, it represents a big step towards giving users a consistent playback experience across different mobile platforms.

Finally, JSR-135 is a Java MP specification that provides a way for Java applications running on a mobile device to access the music synthesizer. Through predefined transport controls, this interface can be used in games to play audio soundtracks, or in music applications that allow the user to compose or "remix" audio.

Copyright © 2003 CMP Media, LLC