Wireless home multimedia networks require multiple design strategies
By Noel Hurley, Consumer Entertainment Segment Manager, ARM Ltd., Maidenhead, UK, EE Times
December 2, 2002 (2:33 p.m. EST)
URL: http://www.eetimes.com/story/OEG20021127S0032
Implementing bandwidth-hungry and processing-intensive applications in a wireless network presents numerous challenges to design engineers. Until recently, real-time media and networking performance was achieved by using multi-chip solutions, with discrete processors to handle such things as MPEG-2 decode or the DOCSIS cable modem. But today, to meet the requirements of newer standards such as MPEG-4, WMV, and JVT (H.26L) and maintain consumer-level manufacturing costs, vendors are looking to ever-larger-scale SoCs that contain several processors.
There are four technologies that, when used together, will optimize an SoC for maximum performance, power economy, and low cost in a wireless home multimedia network. These are single-instruction, multiple-data (SIMD) processing; an efficient and fast on-chip interconnect bus, such as the AMBA standard; effective use of symmetric multiprocessing (SMP); and reliance on vectored interrupt controller (VIC) mechanisms to ensure real-time deterministic response in a networked multimedia environment.
Presently, wireless device makers, service providers, and software developers use a variety of operating systems and applications based on disparate standards that can make it hard for devices and networks to interact.
There is a plethora of media standards: MPEG-2, MP3, and MPEG-4; Advanced Audio Coding (AAC); Bluetooth; H.263+; and the upcoming JVT for high-rate, high-resolution video, to name just a few, many of which define multiple profiles.
In addition, a new standards group, the Open Mobile Alliance, formed just this year, has replaced the WAP Forum, whose Wireless Application Protocol is the most widely used platform for Web browsers on cell phones. It is too early to know what the new group will come up with in the way of new versions of WAP (the latest version is WAP 2.0, launched in January of this year). At the same time, however, Apple Computer, Ericsson, and Sun Microsystems are working on what they call the Ericsson content-delivery solution for multimedia, based on 3rd Generation Partnership Project (3GPP) standards.
What all this means to the wireless network system developer is that any multimedia codec that is hardwired into the system is likely to be superseded by improved standards in the near future, meaning early product obsolescence. For this reason, hardwiring these codecs doesn't make any sense in today's market.
Single Instruction Multiple Data (SIMD) is a method of processing data in which a single instruction is applied to multiple pieces of data simultaneously rather than to each piece of data individually. Repetitive tasks are effectively consolidated into one, reducing the code size and greatly increasing the speed of data processing. Instructions of this nature are often associated with graphics and video.
SIMD instructions are especially helpful when processing streaming video, which lends itself well to parallel processing. For example, if you are processing RGB components in video, whatever you do to one color you do to the other two. With SIMD, a single instruction performs the operation on all three components at once. This dramatically reduces the performance load on the codec processor. The use of SIMD in wireless multimedia environments is even more important than its use in desktop multimedia, for it ensures efficient, fast processing with a minimum of compute and power resources.
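As a rough illustration of the principle (a sketch, not taken from any particular instruction set), the fragment below halves four packed 8-bit components in a single 32-bit operation instead of four separate byte operations; the shift-and-mask trick stands in for what a real SIMD extension would do in hardware on the packed lanes.

/* Minimal sketch of the SIMD idea: scale four packed 8-bit pixel
 * components with one 32-bit operation instead of four byte
 * operations. Illustrative only; a hardware SIMD instruction set
 * would provide packed (often saturating) arithmetic directly. */
#include <stdint.h>

/* Scalar version: one operation per colour component. */
static void halve_scalar(uint8_t *px, int n)
{
    for (int i = 0; i < n; i++)
        px[i] >>= 1;                         /* one shift per byte */
}

/* Packed version: treat four components as one 32-bit word. */
static void halve_packed(uint32_t *px, int words)
{
    for (int i = 0; i < words; i++)
        px[i] = (px[i] >> 1) & 0x7F7F7F7Fu;  /* halve all four lanes at once */
}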
An important element in any SoC developed for a networked multimedia environment is the use of an effective and fast on-chip bus architecture to move data around to the sharing processors with a minimum of delay. The AMBA bus architecture is emerging as the industry standard for such on-chip chores because it allows all of the processors on a chip to communicate using a "mailbox" system to send each other messages.
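A minimal sketch of such a mailbox is shown below; the register addresses and flag layout are hypothetical and would in practice come from the SoC's memory map.

/* Hedged sketch of a memory-mapped "mailbox" between two on-chip
 * processors. MBOX_BASE and the register layout are hypothetical. */
#include <stdint.h>

#define MBOX_BASE   0x40001000u              /* hypothetical AMBA slave address */
#define MBOX_DATA   (*(volatile uint32_t *)(MBOX_BASE + 0x0))
#define MBOX_STATUS (*(volatile uint32_t *)(MBOX_BASE + 0x4))
#define MBOX_FULL   (1u << 0)

/* Sender: post a message for the other processor to pick up. */
void mbox_send(uint32_t msg)
{
    while (MBOX_STATUS & MBOX_FULL)
        ;                        /* wait until the previous message is consumed */
    MBOX_DATA = msg;             /* in this sketch, the write raises an interrupt
                                    on the peer CPU */
}

/* Receiver, typically called from an interrupt handler on the peer. */
uint32_t mbox_receive(void)
{
    return MBOX_DATA;            /* read clears the FULL flag in this sketch */
}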
Taking AMBA one step further in the direction of effectively processing multimedia in real time is the AMBA Advanced High-Performance Bus (AHB) specification. It defines both the interface and the interconnect so that the maximum bandwidth from any given process technology can be utilized. While one of the interconnect elements of the AHB is a traditional shared bus with master and slave blocks, splitting the interface from the interconnect has significant implications for the interconnection of blocks on a chip. AMBA is now no longer purely a bus, but a hierarchy of interconnects with the interface block as the keystone.
The Multi-layer Advanced High-Performance Bus is an interconnection scheme based on the AHB protocol that allows for parallel access paths between multiple masters and slaves in a system. In situations where a system bottleneck is the result of limited bandwidth across the system bus, Multi-layer AHB multiplies the available bandwidth in proportion to the number of bus layers. Additional benefits arise from the reduction in bus transaction latency as a result of the increased bus capacity.
Using Multi-layer AHB, a wide variety of bus structures can be created. With a single layer the structure is identical to the conventional AHB bus structure. Full Multi-layer AHB consists of a bus layer for each of the bus masters, with each layer connected to every slave through the slave multiplexor. Typical systems are more likely to fit between these structures with slaves connected to a sub-set of the layers, or multiple bus masters on a single layer. Thus, multiple master processors can communicate regarding tasks, and they can even interrupt each other.
No idling
In the future, symmetric multiprocessing could be extremely important for very large-scale SoCs in data-intensive applications such as networked multimedia. In today's systems, each processor runs its own discrete code independently and there is no sharing of tasks and CPU processing time. This means that if one master has nothing to do, it sits idle even though another processor may have many tasks to be performed. There is also considerable overhead in continually moving data into and out of memory.
But when you add symmetric multiprocessing (the processing of programs by multiple processors that share a common operating system, I/O bus, and memory) to the equation, the AHB crossbar provides the architecture needed to allow exceptionally fast processing.
In standard SMP architectures there are two methods of coupling of processors and memory: a bus-based architecture and a crossbar. In a bus-based system, a single shared data path connects components. A crossbar interconnect is comprised of a series of single buses arranged to provide multiple paths between CPUs and memory. We recommend the crossbar architecture because, of course, multiple paths are faster than a single shared path. This is the architecture provided by the AMBA interconnect.
The greatest speed benefit, however, is that with SMP any program can utilize any resource on the system, including CPU processing. This means that tasks for a single program can be distributed among multiple processors, taking advantage of any processor that has idle cycles. Thus, every processor can be utilized 100 percent of the time, which not only means higher performance, but the higher efficiency also reduces silicon costs and results in significant power economy. Components can be matched precisely to the requirements of the system without redundancy.
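The sketch below illustrates the idea in software terms, assuming a POSIX-threads environment and a four-CPU system (both assumptions chosen for illustration): a single program's tasks sit on one shared queue and are pulled by whichever processor has idle cycles.

/* Rough sketch of SMP task sharing: one shared work queue drained by
 * one worker thread per CPU. The thread count and task body are
 * assumptions for illustration only. */
#include <pthread.h>
#include <stdio.h>

#define NUM_TASKS   16
#define NUM_WORKERS 4                       /* one worker per CPU in this sketch */

static int next_task = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void process_task(int id) { printf("task %d done\n", id); }

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        int id = (next_task < NUM_TASKS) ? next_task++ : -1;
        pthread_mutex_unlock(&lock);
        if (id < 0)
            break;                          /* queue drained: this CPU is free for other work */
        process_task(id);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[NUM_WORKERS];
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(t[i], NULL);
    return 0;
}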
Critical to the success of networked multimedia, and of the hardware used to implement it, is the ability to support the real-time deterministic responses that standards such as MPEG-4 require.
Just as important as processor speed and throughput in this context is the ability of a system to switch context quickly and efficiently. Today's processors use interrupt service routines (ISRs) to process interrupts.
In an SoC environment optimized for multimedia processing, the best solution is the use of a vectored interrupt controller (VIC) mechanism because it is the most effective and fastest mechanism in the context of networked multimedia. It works by allowing the interrupting device to identify itself to the CPU by sending a special code over the I/O bus, so that the source of the interrupt does not have to be looked up. This code is also associated with the starting address of the ISR for that device, so the processor can reach the handler without delay. Considerable time is saved.
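As a hedged example, the fragment below programs one channel of a vectored interrupt controller in the general style of ARM's PrimeCell VIC; the base address, the channel number, and the video ISR are hypothetical stand-ins for whatever the SoC actually provides.

/* Sketch of vectored interrupt setup. Register offsets follow the
 * general PrimeCell VIC pattern, but VIC_BASE, the channel number
 * and video_isr() are hypothetical. */
#include <stdint.h>

#define VIC_BASE        0xFFFFF000u                    /* hypothetical */
#define VIC_INT_ENABLE  (*(volatile uint32_t *)(VIC_BASE + 0x010))
#define VIC_VECT_ADDR   (*(volatile uint32_t *)(VIC_BASE + 0x030))
#define VIC_VECT_ADDR0  ((volatile uint32_t *)(VIC_BASE + 0x100))
#define VIC_VECT_CTRL0  ((volatile uint32_t *)(VIC_BASE + 0x200))

#define VIDEO_IRQ       5u                             /* hypothetical channel */

static void video_isr(void) { /* service the codec FIFO, etc. */ }

void vic_init(void)
{
    /* Associate the ISR address with the interrupting device so the
     * CPU can jump straight to the handler instead of polling sources. */
    VIC_VECT_ADDR0[0] = (uint32_t)(uintptr_t)video_isr;
    VIC_VECT_CTRL0[0] = (1u << 5) | VIDEO_IRQ;         /* enable slot, select source */
    VIC_INT_ENABLE   |= (1u << VIDEO_IRQ);
}

/* At the end of the ISR, a write to the vector address register
 * signals completion to the controller. */
void vic_ack(void) { VIC_VECT_ADDR = 0; }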
VICs have been around a long time, and were used long before embedded real-time systems even existed, but they have turned out to be exceptionally valuable in speeding up the processing of interrupts and context switching in applications that depend on real-time performance. Using a VIC in combination with an AMBA AHB cross-bar setup, which also eliminates delays by providing multiple paths from slave processors to masters, is a powerful performance enhancer.
Interrupts are not the only cause of delay in the system; highly pipelined processors can introduce excessive delays, or latency, unless the proper techniques are used to avoid them. For example, execution of some instructions may be delayed because they depend on the results of previous instructions, but these delays can be avoided by use of forwarding. Performance can also be lost when something interrupts the smooth flow of instructions through the pipeline, such as a branch instruction; these delays can be reduced by using branch prediction to predict the flow of instructions.
These forwarding and prediction techniques maintain good pipeline efficiency by reducing pipeline 'stalls', situations where the processor has to wait because its next instruction is still rippling through the pipeline.
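A software analogy (an assumption for illustration, not a description of the hardware mechanisms themselves) shows why dependent operations stall a pipeline: in the first loop every addition must wait for the previous one to complete, while the second keeps four independent accumulators that the pipeline can overlap.

/* Illustrative sketch of pipeline stalls caused by dependent
 * instructions. The four-way split is an arbitrary choice for
 * illustration. */
#include <stdint.h>

uint32_t sum_dependent(const uint32_t *v, int n)
{
    uint32_t s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];                          /* each add depends on the last */
    return s;
}

uint32_t sum_overlapped(const uint32_t *v, int n)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int i;
    for (i = 0; i + 3 < n; i += 4) {        /* four independent dependency chains */
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++)
        s0 += v[i];                         /* remainder */
    return s0 + s1 + s2 + s3;
}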