Gearing Up For Wireless Video with Compression
By Sebastien De Gregorio, Madhukar Budagavi, and Jamil Chaoui,
January 3, 2002 (4:20 p.m. EST)
URL: http://www.eetimes.com/story/OEG20011025S0071
Hype has been building quickly around wireless video for the past few years. With 2.5G and 3G systems on the way, many have started to view the delivery of video content to mobile phones as one of the killer apps. The challenge, however, is making this work. Streaming video to a mobile phone places huge strains on the processing engine within these systems. Designers looking to add applications like MPEG-4 to a mobile product will face a tough power consumption vs. performance choice when building baseband solutions.

One solution to this dilemma could emerge in the form of compression technologies. Sophisticated video compression techniques, such as MPEG-4, are needed to make wireless video happen. But these compression techniques also bring their share of challenges. This article explores the benefits and challenges that compression brings to a mobile phone architecture.

The truth about compression

Sophisticated video compression standards, like MPEG-4 and H.263, make real-time video streaming on wireless handsets a reality, but not without some complexity. Video compression standards implement several computationally demanding techniques: motion estimation (ME) between frames, which exploits temporal redundancy; the energy-compacting discrete cosine transform (DCT), which exploits spatial redundancy; quantization; and entropy encoding.

Although these compression techniques help fit video streams into the bandwidth available in wireless communication channels, they also raise a number of issues that affect the memory, computational capabilities, and internal data transfer channels of wireless communication devices. In turn, how these issues are resolved will have an effect on the cost-effectiveness of the device and the useful life of its battery.

In addition, video compression techniques make a video image highly sensitive to errors in the bit stream.
In fact, the wireless communication environment is highly prone to various types of interference that can introduce errors into digital bit streams. Video compression algorithms remove much of the redundancy in video data, and, as a result, the effects of channel interference ripple through not just the current image being displayed, but also successive images. The predictive coding techniques used in MPEG and other compression algorithms cause any errors in a reconstructed video frame to propagate through time into future frames. Errors can also cause the video decoder to lose synchronization.

New solutions

Newer compression standards like MPEG-4 have devised a number of techniques to compensate for and overcome many of the errors encountered in a typical digital video bit stream. These error resilience tools enable detection, containment, and concealment of errors. Among these tools are resynchronization markers (RMs) and the header extension code (HEC).

With MPEG-4, a new technique is used that places RMs evenly throughout the bit stream. The data between two RMs is known as a video packet (VP), and it can correspond to approximately one row of the image. In this way, an error in a VP can only cause the loss of the bits in that one VP.

An HEC is a single bit that is placed in every VP. When the HEC is set to one, the header data is repeated in the VP, meaning that the VP can be decoded independently. As a result, an error in a header would not cause the loss of an entire image.

In addition to these error resilience tools, MPEG-4 Annex E also provides some guidelines for error detection. For example, a semantic error is detected when more than 64 DCT coefficients are decoded for a block (a block is 8 x 8 pixels and hence has exactly 64 DCT coefficients) or when there is an inconsistency in resynchronization header information (for example, the quantization parameter in a video packet header is out of range).
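
The two semantic checks just described, together with the containment that video packets provide, can be sketched as follows. This is a minimal illustration, not a decoder: the `VideoPacket` structure and function names are hypothetical, the packets are assumed to be already parsed, and the legal quantizer range of 1 to 31 is an assumption corresponding to a 5-bit quantizer field.

```python
from dataclasses import dataclass

MAX_COEFFS_PER_BLOCK = 64  # an 8 x 8 block holds exactly 64 DCT coefficients
QP_MIN, QP_MAX = 1, 31     # assumed legal quantizer range (5-bit field)


@dataclass
class VideoPacket:
    """Hypothetical, already-parsed view of one video packet."""
    quant_param: int        # quantization parameter from the packet header
    coeffs_per_block: list  # number of DCT coefficients decoded for each block


def find_semantic_error(packet):
    """Return a description of the first semantic error, or None if clean."""
    if not QP_MIN <= packet.quant_param <= QP_MAX:
        return f"quantization parameter {packet.quant_param} out of range"
    for index, count in enumerate(packet.coeffs_per_block):
        if count > MAX_COEFFS_PER_BLOCK:
            return f"block {index} decoded {count} coefficients"
    return None


def decodable_packets(packets):
    """Return the indices of packets that pass the semantic checks.

    Each packet is bounded by resynchronization markers, so a failed check
    discards only that packet and decoding resumes at the next marker.
    """
    return [i for i, p in enumerate(packets) if find_semantic_error(p) is None]
```

A packet that fails either check is dropped on its own, which is the containment property the video packet structure is meant to provide; the lost macroblocks are then left to error concealment.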
Error concealment techniques

The error resilience techniques described in the previous section help isolate transmission errors in video bitstreams. Once the errors have been isolated, error concealment has to be applied to estimate the macroblocks lost due to transmission errors. Error concealment techniques are outside the scope of the MPEG-4 specification. They are more properly thought of as post-processing techniques that follow the activity of the video decoder. The combination of error resilience tools and various error concealment techniques is improving the quality of streaming video transmission over wireless communications channels. A highlight of some of the commonly adopted error concealment techniques is given below:

Concealing these errors would mean that the MIPS load placed on the system's processor would rise and fall sharply as errors are encountered. Concealment techniques will also increase data transfers from memory because the erroneous macroblocks that are used to interpolate an image are stored in memory that is external to the processor.

Migrating to 2.5/3G

The vast majority of today's voice-only (2G) wireless communications devices were originally based on a dual-processor architecture. A digital signal processor (DSP) handled many of the communications tasks, such as modulating and demodulating the bit stream, coding and decoding to maintain the robustness of the communications link despite transmission bit errors, encrypting and decrypting for security, and compressing and decompressing the signal. The second processor was a general-purpose processor, which handled the user interface and the upper layers of the communication protocol stack. The basic dual-processor architecture of 2G will migrate to data-centric 2.5G and 3G devices, but it will require some significant enhancements to handle demanding multimedia applications like streaming video.
As the computational and other capabilities of a wireless system increase drastically to meet the requirements of streaming video applications, the partitioning of tasks between the two processors becomes increasingly critical for several reasons. System throughput is higher when tasks are assigned to the processor that is best suited to them. But, just as important, an effective partitioning of tasks will reduce power consumption and extend the system's battery life. The most effective way to reduce power consumption is to limit the number of processor cycles devoted to every task. If more processor cycles are needed for a particular task, power consumption increases.

New 2.5G and 3G applications, such as streaming video, will change the nature of wireless communication devices. Designers of wireless platforms should be concerned about maintaining a high degree of flexibility, as consumers will seek to download applications from the Internet onto their new handheld wireless systems. The handset, in a sense, will become an open application platform. It is incumbent upon designers to take this need for flexibility into account when designing next-generation platforms.

Meeting the processing requirements

Figure 1 illustrates a proposed baseband architecture for a 2.5/3G mobile phone. The processing involved in streaming video applications can be divided into roughly two types of functions: control and transport (CT), which involves real-time streaming protocol (RTSP) session control and real-time transport protocol (RTP) media transport; and media decode (MD), which involves media decoding, error concealment, and other ancillary signal processing steps such as echo cancellation. The CT and MD functions have different processing requirements. CT is not computationally intense and mainly involves string parsing, data packet manipulation, and finite state machine implementation.
An MCU is best suited for these types of tasks. The MD functionality is much more computationally intense because of the sophisticated signal processing required by audio and video coding algorithms. A high-performance, low-power DSP is better suited for MD functions. Figure 2 shows an efficient way to partition a streaming video application on a dual-processor platform.

Note: Both RTSP and RTP are Internet proposed standards.

In Figure 2, RTSP and RTP are layered on TCP/UDP/IP. RTSP handles the description, setup, control, and teardown of streaming sessions. RTP manages the transport of media and provides sequencing information that is helpful in detecting packet losses. In addition, RTP supplies timestamps and payload identification information, and it is accompanied by the RTP control protocol (RTCP), which is used for QoS feedback and inter-media synchronization information. RTSP can be layered over both TCP and UDP, while RTP is almost always layered only over UDP.

The data flow

The data flow in a streaming video application is as follows: The streaming data enters the architecture by way of a 2.5G or 3G modem. The MCU runs the protocol layers (RTP/RTSP and TCP/IP) and demultiplexes the audio and video data. The compressed audio and video bitstreams are extracted from their respective RTP packets and are then forwarded to the DSP's internal RAM. The DSP then decodes the images for display. The DSP also stores a copy of the reconstructed frame for use in the decoding of the next frame, and so on.

In a video streaming application, previous images are used to predict the current image. The previous image is moved macroblock by macroblock from the video buffer into the DSP's internal RAM, where it is combined with other information and sent to the display screen as the current image. Because streaming video involves moving a tremendous amount of data in real time, I/O issues are critical considerations.
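
Moving the previous image macroblock by macroblock amounts to a two-dimensional, strided copy out of a row-major frame buffer. The sketch below shows that access pattern in software, purely as an illustration: the function name and parameters are hypothetical, the frame is assumed to hold one byte per luminance pixel, and a real system would perform this transfer in hardware rather than in a loop.

```python
MB_SIZE = 16  # a luminance macroblock is 16 x 16 pixels


def fetch_macroblock(frame, stride, mb_x, mb_y, size=MB_SIZE):
    """Copy one size x size block out of a row-major frame buffer.

    frame      -- bytes-like object, one byte per luminance pixel (assumed)
    stride     -- number of bytes per frame row
    mb_x, mb_y -- macroblock column and row indices
    """
    block = bytearray(size * size)
    src = mb_y * size * stride + mb_x * size  # top-left pixel of the block
    for row in range(size):
        # One contiguous run of `size` bytes, then jump a full frame row.
        block[row * size:(row + 1) * size] = frame[src:src + size]
        src += stride
    return bytes(block)
```

For a QCIF luminance plane (176 x 144 pixels) the stride is 176 and there are 11 x 9 = 99 macroblocks, so `fetch_macroblock(frame, 176, 2, 1)` returns the 256 bytes of the third macroblock in the second row. Each call touches 16 short, non-adjacent runs of memory, which is the kind of access pattern that benefits from dedicated transfer support.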
At least two direct memory access (DMA) channels, and possibly more, will be needed to avoid I/O bottlenecks, which would slow down the system and undercut the effective computational speeds of the DSP and MCU. It is also important to include specific DMA capabilities that simplify two-dimensional pixel transfers, byte alignment, and byte-by-byte transfers.

Managing memory

As benign as a dual-processor architecture may seem, below the surface is a myriad of challenging design issues. One of the biggest challenges is memory management. Shared memory must be managed to avoid conflicts in which both processors access the same memory location at the same time. Memory access requests must also be ordered consecutively in time, while ensuring a predetermined access time for both processors.

In addition, to make the most efficient use of the two processors, designers may choose to implement two OSes, because an OS well suited to a DSP will not function effectively on an MCU and vice versa. If the designer decides to implement two OSes, he or she must determine how to reconcile the differences between the OSes, as they will certainly handle memory addressing, memory accesses, and housekeeping chores differently.

The structure and size of processor cache memory will also have a decided effect on system performance. For example, with a wireless communication device running the GSM protocol and MPEG-4 video encoding/decoding, simulations have been done to determine the optimum size of cache memory for maximizing cache hits and minimizing processor wait states. The following cache sizes were derived for the system's MCU:

Research has shown that these sizes and types of memory would have a cache miss-to-hit ratio of just 3.4% for the instruction cache and only 9% for the data cache.
When this type of simulation was performed for the DSP, it was found that a 16-KB instruction cache organized as two-way set-associative with a 16-byte line would result in an instruction miss-to-hit ratio of less than 1%.

Wireless video streaming

The architecture described above can be used effectively to perform video decoding in a streaming video application. In a wireless streaming video application, one of the issues that most concerns designers is the demand placed on the processor in terms of cycles. This relates back to power consumption, because reducing the number of processor cycles required for a task will reduce the power expended on that task.

Designers must examine the capabilities of the DSP core to determine its suitability for video encoding/decoding. Some high-performance DSPs have reduced the processor cycles needed to perform inverse DCTs (IDCTs) and half-pixel interpolations (HPI) on a macroblock from 1,200 and 350 cycles, respectively, for previous-generation DSPs to 147 and 70 cycles.

The size of the display image also affects processor cycles. Using a high-performance, low-power DSP to display a streaming video application in the larger common intermediate format (CIF) at 45 frames per second takes at least 108 million DSP processor cycles per second. But if the smaller quarter CIF (QCIF) were used at 15 frames per second, the processor load would fall to 12 million cycles per second. For many wireless handheld devices the smaller QCIF format will be appropriate. Shifting to QCIF can then lower power consumption and lengthen battery life.

In addition, newer DSPs consume less power no matter which image format is used. For example, a new DSP processing CIF video images at 45 frames per second will consume as little as 110 mW, while QCIF video images at 15 frames per second will consume only 12 mW. Vendors also have developed new instructions that further reduce the number of processor cycles needed for streaming video.
Instead of the classical DSP instructions, these new instructions accomplish more of the video decoding task in real time. These instructions accelerate the video decoding process by a factor of two and reduce power consumption by requiring fewer processor cycles.

Sebastien De Gregorio joined Texas Instruments (TI) in 1996 in the Wireless Communications Business Unit, where he was involved in DSP algorithms and advanced speech processing. Since 1999, he has served as audio/video project lead. He is based in Nice, France. He can be contacted at s-de-gregorio@ti.com.

Madhukar Budagavi has been with TI since 1995 as a Member of Technical Staff in the DSP Solutions R&D Center. He works on MPEG-4 and wireless video communications. Prior to TI, he was first a software engineer and then a senior software engineer at Motorola India Electronics Ltd., developing DSP software and algorithms for the Motorola DSP chips. He can be contacted at madhukar@ti.com.

Jamil Chaoui joined TI in 1995 and is a Member of the Technical Staff in the European Wireless Application Group of Texas Instruments. Prior to TI, he was with Alcatel as a DSP software and system engineer in Alcatel's mobile phones group. He can be contacted at j-chaoui1@ti.com.
System designers must be aware that error concealment techniques can have deleterious effects on performance unless the demands of these techniques are taken into account when the architecture of the system is developed. For example, concealment requires significant processing power. In addition, errors in wireless communication channels typically occur in bursts.
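
To make the cost of a burst concrete, the sketch below applies one simple concealment approach: replacing each lost macroblock with the co-located macroblock from the previous reconstructed frame. This approach is chosen here only for illustration and is not taken from the text; the function name, the one-byte-per-pixel luminance layout, and the way lost macroblocks are reported are all assumptions.

```python
MB = 16  # a luminance macroblock is 16 x 16 pixels


def conceal_from_previous(current, previous, width, lost_mbs):
    """Overwrite lost macroblocks with co-located pixels of the previous frame.

    current  -- bytearray, row-major luminance plane being reconstructed
    previous -- bytes-like, previous reconstructed frame with the same layout
    width    -- frame width in pixels
    lost_mbs -- iterable of (mb_x, mb_y) macroblock indices flagged as lost
    Returns the number of bytes read from the previous frame.
    """
    copied = 0
    for mb_x, mb_y in lost_mbs:
        for row in range(MB):
            start = (mb_y * MB + row) * width + mb_x * MB
            current[start:start + MB] = previous[start:start + MB]
            copied += MB
    return copied
```

A burst that wipes out three adjacent macroblocks of a QCIF frame forces 768 extra bytes to be read from the reference frame within a single frame period, on top of the normal decoding traffic. Even with this simplest of schemes, the extra work arrives all at once rather than being spread evenly over time.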