Cutting Costs But Not Performance for VoIP-Enabled End-Points

David Brown & Michael Ward, Trinity Convergence
Apr 07, 2005 (3:25 AM)

Widespread adoption of Voice-over Internet Protocol (VoIP) will require the proliferation of VoIP end-points, including desktop IP phones, voice over WiFi (VoWiFi) handsets, analog terminal adapter (ATA) devices, and VoIP-enabled routers. While higher volume production will help to reduce cost, OEMs and ODMs are also looking for other ways to minimize product costs without sacrificing features or quality.

VoIP end-points are usually designed using a tandem processor architecture, with both a general-purpose applications processor and a digital signal processor (DSP), as shown in Figure 1 below. Typically, the DSP handles the computationally intensive packet voice processing (voice encode/decode, tone generation and detection, echo cancellation, noise reduction, etc.), and the applications processor manages the user interface and VoIP call control protocol functions.

While this type of architecture provides a good "division of labor," it has a number of drawbacks when attempting to address the design requirements of high-volume, low-cost VoIP end-points. For instance, the need for both an applications processor and a DSP adds cost to the overall product. In addition, two discrete devices require a larger footprint and increase overall power consumption.

Silicon manufacturers have addressed space and power challenges through the development of system-on-chip (SoC) architectures in which the general-purpose applications processor and the DSP are combined within the same physical device packaging (Figure 2). While this partially solves the physical size and power issues, it does not fully address the cost challenges, and some portable devices require an even smaller footprint and lower power consumption.

A DSP-Free Approach
With continued improvements in the computing power and reductions in the size and cost of general-purpose applications processors, there is now a viable alternative to the tandem architecture for VoIP-enabled products: moving the voice processing from the DSP to the applications processor. This "DSP-free" approach (Figure 3, below) provides immediate cost, footprint, and power consumption savings by eliminating the DSP device.

Figure 1: Traditional VoIP "tandem architecture."

Figure 2: Traditional VoIP "tandem architecture" in SoC form.

In addition, since designers now only have to work with a single development environment and tool-chain, time-to-market gains can be realized through reduced design cycles and the ability to more rapidly debug the product.

The availability of highly integrated applications processors that combine the processor core with the peripherals needed for VoIP (such as an Ethernet MAC, an audio codec, a keypad input, and an LCD driver) enables the design of products with a DSP-free architecture that can achieve the highest level of cost, size, and power consumption savings.

Figure 3: VoIP "DSP-free" Architecture.

Applications Processor Considerations
Before choosing an applications processor for a DSP-free end-point design, issues such as clock speed, operating system support, power management capability, integrated peripherals and driver support, and media processing extensions need to be carefully evaluated.

Clock Speed
The processor must run fast enough to handle the worst-case VoIP scenario. This is different for each end-point design and relies heavily on the type of processing that the end-point must handle. For the VoIP component of the design, this is heavily influenced by the audio encoding and decoding standards that must be supported (more complex coders require more processing power), and the complexity of the user interface. In a multi-function device, sufficient processing power must be available to run the most complex scenario, such as a VoIP call and Internet browsing occurring simultaneously.

On the mainstream 32-bit core applications processors available today, each full duplex channel of VoIP will consume up to 100 MHz of processing power. This leaves ample cycles for executing complex applications on even entry level processors.

Operating System Support
Richly-featured operating systems, such as embedded Linux, are now available and gaining in popularity in a range of devices. With today's high-performance applications processors, there is no need to work with a primitive scheduler to develop a VoIP end-point because the development tools and capabilities of the operating systems can be leveraged to reduce development time and to open up access to third-party software.

With operating systems such as embedded Linux, a wide variety of application software is available to the product developer for integration into the end product. This allows quick development of multi-function devices (e.g. VoIP phone, MP3 player, etc.) with a much more capable user interface than what is typical in many end-points today.

Power Management Capability
To maximize battery life in a portable device, the VoIP application in an end-point should be built so that it can be placed in a low-power mode when there are no active calls. As a result, software must be written so that an incoming call request (using SIP or H.323) or keypad press will bring the device quickly out of "snooze mode." Techniques such as disabling audio peripherals and reducing clock speed can further enhance battery life when the device is in snooze mode.

Many embedded operating systems provide support for low-power operation, and this is rapidly becoming a necessity as more end-points become portable.
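The wake-and-snooze behavior described above can be sketched as a small event-driven state machine. This is only an illustration under stated assumptions: the class name, the event names, and both clock values are hypothetical, and a real end-point would drive these transitions through its operating system's power-management interface rather than plain attributes.

```python
from enum import Enum, auto

# Assumed, illustrative clock settings; real values depend on the processor.
LOW_CLOCK_MHZ = 50
FULL_CLOCK_MHZ = 200


class Event(Enum):
    INCOMING_CALL = auto()  # e.g. a SIP or H.323 call request arrives
    KEYPAD_PRESS = auto()
    CALL_ENDED = auto()
    IDLE_TIMEOUT = auto()   # hypothetical timer that fires when the user is idle


class EndpointPower:
    """Tracks whether the end-point is in snooze mode (hypothetical model)."""

    def __init__(self):
        self.active_calls = 0
        self._snooze()

    def _snooze(self):
        # Snooze mode: audio peripherals disabled, clock reduced.
        self.snoozing = True
        self.audio_enabled = False
        self.clock_mhz = LOW_CLOCK_MHZ

    def _wake(self):
        self.snoozing = False
        self.audio_enabled = True
        self.clock_mhz = FULL_CLOCK_MHZ

    def handle(self, event):
        if event is Event.INCOMING_CALL:
            self.active_calls += 1
            self._wake()
        elif event is Event.KEYPAD_PRESS:
            self._wake()
        elif event is Event.CALL_ENDED:
            self.active_calls = max(0, self.active_calls - 1)
            if self.active_calls == 0:
                self._snooze()
        elif event is Event.IDLE_TIMEOUT:
            # Never snooze while a call is still in progress.
            if self.active_calls == 0:
                self._snooze()
```

The point of the sketch is that the decision to snooze depends only on the active call count, so a timeout during a call is ignored while a call request or key press always wakes the device.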

Integrated Peripherals and Driver Support
Applications processors with suitable on-chip peripherals enhance cost savings and simplify VoIP end-point design. On-chip Ethernet MACs and audio codecs remove the need for external discrete devices, which reduces chip count and power consumption. With added support for keypad input and display, an end-point can be constructed easily around a single highly integrated applications processor.

Audio interfaces for VoIP must be capable of driving data in small blocks to minimize latency and delay through the system. Many audio drivers written for entertainment applications, such as music players, buffer data to ensure a smooth stream; this approach cannot be used for VoIP.
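A quick calculation shows why entertainment-style buffering is a problem for voice. The sketch below computes the delay added by holding audio in a driver buffer. The 8 kHz and 44.1 kHz sample rates are the standard narrowband telephony and CD audio rates; the block sizes and the buffer depth are assumptions chosen only for illustration.

```python
def block_latency_ms(samples_per_block, sample_rate_hz, blocks_buffered=1):
    """Delay, in milliseconds, added by buffering the given number of blocks."""
    return 1000.0 * samples_per_block * blocks_buffered / sample_rate_hz


# VoIP-style driver: one 80-sample block at 8 kHz (assumed block size).
voip_ms = block_latency_ms(samples_per_block=80, sample_rate_hz=8000)

# Music-player-style driver: four 4096-sample blocks at 44.1 kHz (assumed).
player_ms = block_latency_ms(samples_per_block=4096, sample_rate_hz=44100,
                             blocks_buffered=4)

print(f"VoIP driver buffering:   {voip_ms:.1f} ms")
print(f"Player driver buffering: {player_ms:.1f} ms")
```

Under these assumptions the small-block driver adds 10 ms in each direction, while the deeply buffered driver adds several hundred milliseconds before any network or codec delay is counted.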

Media Processing Extensions
VoIP media processing on applications processors can be further optimized by using processors that support media processing extensions. The approaches used generally fall into two categories: instruction set extensions and accelerators.

VoIP processing relies heavily on the numerically-intensive operations that DSPs are optimized to execute. Current generation applications processors that support key DSP instructions provide the required media processing. However, migrating this processing to an applications processor still requires careful design and experience with the processor architecture.

For example, the ARM926EJ-S core by ARM Ltd. features instruction set extensions that are highly flexible and can be used to optimize virtually any type of media processing. Using the 'E' extensions of the ARM926EJ-S reduces the bandwidth required to execute a typical voice encoder by as much as 20% (when compared to its execution on an ARM9 series core without the 'E' extensions). This saving in bandwidth translates to either a lower clock speed and increased battery life, or additional processing power for the remainder of the application. To put this in context, a typical G.729AB codec optimized with the DSP extension instructions can save approximately 5 MHz over a well-optimized implementation for an architecture without these instructions. The gain is larger still when implementing a more complex, wide-band codec such as G.722.2, where approximately 20 MHz can be saved through an implementation based on the DSP extensions.

Similar optimizations are available for designs targeted to MIPS Technologies' MIPS32 and MIPS64 cores with the DSP Application Specific Extension (ASE). The DSP ASE adds powerful DSP instructions that can reduce the bandwidth required to execute key mathematical operations (common in many voice codecs) by as much as 50%, leading to a highly efficient implementation for VoIP. The MIPS32 DSP ASE adds less than 6% to the die area for a typical device, making this a very efficient form of acceleration from a size and power consumption perspective.
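To make the "key mathematical operations" concrete, the sketch below models a saturating fractional multiply-accumulate, the kind of 16-bit fixed-point operation that voice codec reference code uses heavily and that DSP-style instruction set extensions are generally designed to speed up. It is a Python model written for illustration, not code for any particular core: the function names are hypothetical, inputs are assumed to be signed 16-bit values, and no mapping to specific instructions is implied.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


def saturate32(x):
    """Clamp x to the signed 32-bit range instead of letting it wrap."""
    return max(INT32_MIN, min(INT32_MAX, x))


def l_mac(acc, a, b):
    """Saturating multiply-accumulate: acc + (a * b * 2), clamped to 32 bits.

    a and b are Q15 fractions (assumed to fit in 16 bits); the doubled
    product and the accumulator are Q31 fractions.
    """
    product = saturate32((a * b) << 1)
    return saturate32(acc + product)


def dot_q15(xs, ys):
    """Dot product of two Q15 vectors, as used in filters and correlations."""
    acc = 0
    for a, b in zip(xs, ys):
        acc = l_mac(acc, a, b)
    return acc
```

Without hardware support, each step in `l_mac` costs a multiply, a shift, an add, and two overflow checks; processors with suitable extensions can collapse much of that work into far fewer instructions, which is where the bandwidth savings quoted above come from.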

Accelerators are discrete processing blocks specifically designed to execute media processing. Data is transferred from the applications processor to the accelerator block where all or part of the media handling task is carried out. Accelerators tend to be optimized for a single function (e.g. a particular voice encoder) and have the advantage that they usually execute in parallel with the main processing on the applications processor core. The additional processing power made available by a media accelerator enables highly bandwidth-intensive operations such as wide-band audio processing, echo cancellation, or video processing to be executed on a low-cost, low-power applications processor. These accelerators differ from SoC architectures that combine an applications processor with a DSP in that these accelerators are discrete processing blocks, controlled through the programming of the applications processor. Tasks such as complex video codecs are good candidates for an acceleration block within the applications processor architecture.

Migration of Software from a DSP to a DSP-free Environment
The voice codecs and other media processing used by VoIP have traditionally been designed for efficient execution on a DSP. Consequently, migrating these compute-intensive operations to an applications processor requires care and expertise. For example, most voice coders used in VoIP today will not even run in real time on a 100-MHz applications processor if the raw 'C' model for the vocoder is simply compiled and executed. In addition, voice processing in a tandem architecture device is physically separated from the management and user applications. When the control and media processing functions are merged onto a single device, care must be taken to control the overall system priorities to address the real-time nature of the voice processing.

Efficient coding, with an understanding of how to squeeze the best from an applications processor's individual architecture, is essential if a complex task such as VoIP is to run in real time. All elements of a processor's architecture, from best use of the pipeline and available parallelism to elimination of unnecessary instructions, must be considered to create usable VoIP software. Although code generation tools for applications processors have advanced significantly in the last five years, it is still necessary to write assembly code to create an ultra-efficient implementation. Coding in assembly language is a slow and specialized task, but it is currently the only way to implement VoIP media processing elements efficiently enough on an applications processor.

Implementing the control part of VoIP (e.g. the SIP or H.323 call control stack and user interface) does not require the same level of hand-crafted optimization, and a 'C' implementation of these elements is acceptable. By exposing a suitable 'C' Application Programmers Interface (API) to allow control over the media processing elements in VoIP, development can be partitioned into the creation of the real-time VoIP components and the user application (which creates the majority of the user experience in each end-point). This split allows developers well versed in the creation of the user experience to focus on this part of the end-point, and further allows the complex VoIP media processing software to be re-used in a range of implementations.
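The partitioning described above can be illustrated with a small interface sketch. The article describes a 'C' API; the version below is written in Python purely to show the shape of the split, and every name in it (`MediaEngine`, `open_channel`, `answer_call`, and so on) is hypothetical rather than taken from any real product.

```python
from abc import ABC, abstractmethod


class MediaEngine(ABC):
    """Hypothetical control surface over the real-time media components."""

    @abstractmethod
    def open_channel(self, codec):
        """Allocate a voice channel for the named codec; return its id."""

    @abstractmethod
    def start(self, channel):
        """Begin encoding and decoding on the channel."""

    @abstractmethod
    def stop(self, channel):
        """Halt media processing on the channel."""


class LoggingMediaEngine(MediaEngine):
    """Stand-in implementation that only records the calls it receives."""

    def __init__(self):
        self.log = []
        self._next_id = 0

    def open_channel(self, codec):
        channel = self._next_id
        self._next_id += 1
        self.log.append(("open", channel, codec))
        return channel

    def start(self, channel):
        self.log.append(("start", channel))

    def stop(self, channel):
        self.log.append(("stop", channel))


def answer_call(media, negotiated_codec):
    """Application-side code: needs no knowledge of codec internals."""
    channel = media.open_channel(negotiated_codec)
    media.start(channel)
    return channel
```

Because `answer_call` depends only on the interface, the optimized media implementation behind it could be swapped or re-used across products without touching the user application.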

Limitations of a DSP-Free Approach for VoIP Applications
The advantages of a DSP-free architecture for VoIP-enabled end-points are very appealing; however, at this time, this architecture should only be used for products supporting between one and four channels of VoIP. This is a direct result of the compute power available in the current generation of applications processors that are cost-effective for end-point devices, and the subsequent performance of the voice processing algorithms that are run on these processors.

When looking at the requirements for most VoIP end-points, however, this is not a significant limitation. Most IP phones will support only one or perhaps two channels of VoIP simultaneously. ATAs and voice-enabled residential gateways typically have only one or two POTS interfaces, with a few models supporting four POTS lines or simultaneous calls (for residential applications).

The specific number of VoIP channels that may be supported by a given processor is very dependent on the type of VoIP channel that is to be supported. Voice codecs require a range from single-digit MHz for G.711 to closer to 100 MHz for a wideband codec like G.722.2. Other features such as acoustic echo cancellation also require processing power. When these various components are added together, it is possible for a VoIP channel to require a range of 25 to 150 MHz.
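This budgeting can be worked through as simple arithmetic. The sketch below adds up the per-channel components and estimates how many channels fit once cycles are reserved for the rest of the application. The 25 and 150 MHz per-channel figures are the range quoted above; the 200 MHz clock, the 50 MHz application reserve, and the way the 150 MHz channel is split into components are assumptions for illustration only.

```python
def channel_cost_mhz(codec_mhz, echo_canceller_mhz=0, other_mhz=0):
    """Total processing cost of one VoIP channel, in MHz."""
    return codec_mhz + echo_canceller_mhz + other_mhz


def max_channels(cpu_mhz, reserved_for_apps_mhz, per_channel_mhz):
    """Whole channels that fit after reserving cycles for other software."""
    available = cpu_mhz - reserved_for_apps_mhz
    if available <= 0:
        return 0
    return int(available // per_channel_mhz)


CPU_MHZ = 200           # assumed processor clock
APP_RESERVE_MHZ = 50    # assumed budget for UI, call control, and so on

light = channel_cost_mhz(codec_mhz=25)                          # low end of range
heavy = channel_cost_mhz(codec_mhz=100, echo_canceller_mhz=50)  # assumed split

print(max_channels(CPU_MHZ, APP_RESERVE_MHZ, light))
print(max_channels(CPU_MHZ, APP_RESERVE_MHZ, heavy))
```

With these assumed numbers, the same processor fits several light channels but only a single heavy one, which is why the channel type matters as much as the clock speed.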

It is possible to offload some of the voice or media processing to other components within the system. For ATAs or voice-enabled residential gateways that connect traditional telephones to a VoIP network, there is a need for line echo cancellation. This can be performed within the media processing software on the applications processor. Alternatively, there are Subscriber Line Interface Circuit (SLIC) devices that can provide the echo cancellation in addition to terminating the analog telephone line. Component choices such as this allow the system designer to optimize the system for the best possible price/performance target.

Full-motion video-over-IP support is another challenging application. While low-frame-rate solutions can be implemented in software, the encoder processing requirements of video codecs such as H.263 or MPEG-4 exceed what today's applications processors can handle alongside a suitable voice channel and other application functions. With the addition of task-specific video acceleration IP blocks, which can be readily integrated into the applications processor design, a video-enabled solution can be implemented without an external DSP.

The Future of DSP-Free VoIP
DSP-free techniques for the development of VoIP end-points are proven and are being used by numerous OEMs and ODMs in shipping products today. Feature-rich products with sophisticated GUIs, video telephony services, and complex voice channels can be cost-effectively developed and deployed using available applications processors. As faster, smaller, and cheaper applications processors become available, even more sophisticated services will be supported, building upon the foundation of VoIP functionality offered in devices today. These services include increased channel counts for voice-enabled residential gateways, higher fidelity audio with higher levels of compression than exist today, and even the ability to move full-motion video fully onto the applications processor without the need for an acceleration IP block.

About the Authors
David Brown is co-founder and CTO of Trinity Convergence where he sets the technical direction of the VeriCall and VeriCall Edge product architectures as well as manages Trinity's Cambridge-based Research & Development team. He can be reached at dbrown@trinityconvergence.com.

Michael Ward is director of product line management at Trinity Convergence. He is responsible for the direction of the company's VeriCall and VeriCall Edge embedded software products for VoIP- and V2IP-enabled end-points. Michael can be reached at mward@trinityconvergence.com.
