Streaming audio needs interoperability
By Mark R. Banham, Director of Engineering, PacketVideo Corp., San Diego, EE Times
January 4, 2002 (7:29 p.m. EST)
URL: http://www.eetimes.com/story/OEG20011112S0045
Commercial radio networks with enough bandwidth to support streaming audio and video services are now a reality. General Packet Radio Service (GPRS), cdma2000 1X and wideband code-division multiple access (W-CDMA) systems are either already in use or quickly coming online in countries around the world.
The existence of these networks, however, is not enough to enable wireless audio and video services and applications. For that, devices and platform delivery technologies are needed. To help the market develop these devices and platforms, the international community has voluntarily created a number of important standards.
The relevant standards include a large group of audio and video codecs, as well as systems and network protocols. Examples of such core standards are MPEG-4 Video, Adaptive Multi-Rate (AMR) audio, the Real-Time Streaming Protocol (RTSP) and the Real-Time Transport Protocol (RTP) for carrying media data. Given the wide set of relevant standards, how are devices and platforms actually being defined?
The answer lies with the carriers. Wireless carriers require systems that guarantee interoperability across multiple handsets, thereby guaranteeing support from multiple suppliers. For this reason they work through widely recognized wireless industry bodies, such as 3GPP, 3GPP2 and the Wireless Multimedia Forum, to bring together core standards and produce device and system recommendations. For example, 3GPP high-level recommendations include the Packet-Switched Streaming Services specification and the 3G-324M recommendation for two-way video communication. A critical part of most such high-level recommendations is MPEG-4.
The MPEG-4 standard is a wide-ranging and rich recommendation for the coding of audio-visual objects. The six parts of the standard cover all aspects of audio and video compression, systems and conformance. The video part of MPEG-4, known as MPEG-4 Visual (ISO/IEC 14496-2), is quite broad in terms of the bit rates and functionalities supported.
The concept of profiles segments the standard into tool sets that are appropriate for different applications. For example, wireless video communication is most often associated with the Simple and Simple Scalable Profiles of MPEG-4 Visual.
The Simple Visual Profile provides efficient, error-resilient coding of rectangular video objects, especially applicable to wireless video communication and transmission. The syntax of the Simple Profile also permits interoperability with the H.263 Baseline standard. The Simple Scalable Visual Profile adds support for coding of temporal and spatial scalable objects to the Simple Visual Profile. This is especially appropriate for delivering video over multiple types of communication channels that may have variable throughput, such as wireless data channels.
Crucial resilience
Error resilience within MPEG-4 is crucial for successful deployment of wireless multimedia services. Because of the hostile nature of wireless networks, errors inevitably are introduced into a transmitted bit stream. Errors may come from propagation effects, such as Rayleigh fading and shadowing, coverage problems due to receiver sensitivity, and multicell interference and handoff characteristics.
These errors can take the form of physical-layer bit errors, link-layer frame errors and network-layer errors. Additionally, packet losses may result from network buffer overflows. The treatment of these errors differs depending on the nature of the transmitted data. For example, it's possible to deliver standard nonreal-time data traffic, such as e-mail, error free to a device by trading off delay for error control. In this case, network protocols such as TCP and Radio Link Protocol retransmission schemes are used to guarantee the accuracy of the data. Real-time applications, such as videoconferencing and live-video streaming, cannot accept delay. In this case, network protocols such as UDP can be used for best-effort transmission. With this approach, errors must be passed up to the application.
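With RTP running over UDP, the application typically detects losses itself by watching the 16-bit sequence number carried in each RTP header. The following C sketch (socket handling omitted; the packet bytes are illustrative, not from a real stream) shows the gap check a decoder might perform before handing a loss indication to its error-concealment logic.

    /* Sketch: application-layer loss detection for RTP over UDP.
     * Only the sequence-number check is shown; reading packets from
     * a UDP socket is omitted. */
    #include <stdint.h>
    #include <stdio.h>

    /* RTP headers carry a 16-bit sequence number in bytes 2-3. */
    static uint16_t rtp_seq(const uint8_t *pkt)
    {
        return (uint16_t)((pkt[2] << 8) | pkt[3]);
    }

    /* Number of packets lost between two consecutive arrivals,
     * using 16-bit wraparound arithmetic; 0 means no loss. */
    static int detect_loss(uint16_t prev_seq, uint16_t cur_seq)
    {
        return (uint16_t)(cur_seq - (uint16_t)(prev_seq + 1));
    }

    int main(void)
    {
        uint8_t pkt_a[4] = { 0x80, 0x60, 0x00, 0x0A }; /* seq 10 */
        uint8_t pkt_b[4] = { 0x80, 0x60, 0x00, 0x0D }; /* seq 13 */
        printf("packets lost: %d\n",
               detect_loss(rtp_seq(pkt_a), rtp_seq(pkt_b))); /* 2 */
        return 0;
    }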
MPEG-4 video provides a number of tools to help the application localize and conceal errors. Errors are localized in the MPEG-4 video bit stream through the use of resynchronization markers. These markers can be inserted at any macroblock boundary in a video object plane, which is much the same as a picture frame, at the discretion of the person encoding the content. The markers divide the bit stream into "video packets," making it easier to contain and conceal any errors that occur within them.
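To see how resynchronization works in practice, consider how a decoder hunts for the next marker after detecting an error. The C sketch below scans a bit buffer for a run of at least 16 zero bits followed by a one; in the actual standard the required marker length varies with the VOP coding type, so the fixed threshold here is a simplifying assumption.

    /* Sketch: locating a resynchronization marker in a bit stream so
     * the decoder can skip a damaged video packet and resume parsing. */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    static int get_bit(const uint8_t *buf, size_t bitpos)
    {
        return (buf[bitpos >> 3] >> (7 - (bitpos & 7))) & 1;
    }

    /* Returns the bit position where the marker's zero run begins,
     * or -1 if no marker is found within nbits bits. */
    static long find_resync(const uint8_t *buf, size_t nbits, size_t start)
    {
        size_t zeros = 0;
        for (size_t i = start; i < nbits; i++) {
            if (get_bit(buf, i) == 0) {
                zeros++;
            } else {
                if (zeros >= 16)
                    return (long)(i - zeros);
                zeros = 0;
            }
        }
        return -1;
    }

    int main(void)
    {
        /* Eight one-bits, then 16 zero bits, then a one bit. */
        uint8_t buf[] = { 0xFF, 0x00, 0x00, 0x80 };
        printf("marker at bit %ld\n", find_resync(buf, 32, 0)); /* 8 */
        return 0;
    }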
Another MPEG-4 tool useful for error localization is "data partitioning." With this syntax, the motion information is separated from the texture information, which consists of the DCT coefficients of the coded picture. When errors are detected in the texture component, it is possible to use only the bits associated with the motion vectors to create the decoded image.
When errors are detected in the motion data, estimation techniques can be employed to recover lost motion vector information. This recovered information is then used for concealment.
This "motion-compensated concealment" approach results in decoded video that hides serious artifacts from the viewer.
Varying bit rates
In addition to being error-prone, the packet data channels of 2.5G and third-generation wireless networks exhibit highly variable user bit rates. Packet delays are often present due to limited lower-layer retransmissions, and instantaneous user data throughput varies in time as a result.
MPEG-4 temporal scalability involves adding enhancement information to a base layer of video in the form of additional encoded frames.
These enhancement frames serve to increase the frame rate, or temporal quality, of the scene when they are transmitted. MPEG-4 spatial scalability, another part of the Simple Scalable Visual Profile, involves encoding enhancement information in the form of differential images that increase the spatial quality of given base-layer frames.
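To make the scalable reconstruction concrete, the C sketch below adds an enhancement-layer differential (residual) image to base-layer samples and clips the result to the 8-bit range, which is the essence of spatial enhancement. The upsampling and residual-decoding stages are omitted, and the sample values are illustrative.

    /* Sketch: reconstructing a spatially enhanced frame by adding an
     * enhancement-layer residual to base-layer samples. */
    #include <stdint.h>
    #include <stdio.h>

    static uint8_t clip8(int v)
    {
        return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
    }

    static void add_residual(const uint8_t *base, const int16_t *residual,
                             uint8_t *out, int n)
    {
        for (int i = 0; i < n; i++)
            out[i] = clip8(base[i] + residual[i]);
    }

    int main(void)
    {
        uint8_t base[4]     = { 100, 120, 140, 160 };
        int16_t residual[4] = { 5, -10, 0, 30 };
        uint8_t out[4];
        add_residual(base, residual, out, 4);
        for (int i = 0; i < 4; i++)
            printf("%d ", out[i]); /* 105 110 140 190 */
        printf("\n");
        return 0;
    }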
In the case of audio codecs, the wireless multimedia standardization community has focused on existing speech codecs for cellular communications, such as AMR and the Enhanced Variable Rate Coder. For higher-bandwidth and scalable audio communication, the recommended wireless audio codec is MPEG-4 AAC.
From the perspective of application developers, these new MPEG-4-based systems will provide an opportunity to create experiences for mobile users built on both live and on-demand multimedia content.
MPEG-4 is the tool to provide the best overall quality for those applications, and thus help guarantee their successful deployment.