Meeting the Challenges of VoIP ATA Designs

By Jeff Dionne, Arcturus Networks; and Brian Davis, Renesas Technology America, CommsDesign.com
Jun 8 2004 (10:00 AM)
URL: http://www.embedded.com/showArticle.jhtml?articleID=21402084

Voice-over-IP (VoIP) has been a shining star in a still-tough communications sector. Both cable operators and traditional telecom providers are offering or adding VoIP capabilities to their networks. And, going forward, they plan to increase their VoIP deployments to end users.

One of the reasons that VoIP has become so successful is the development of analog telephone adapter (ATA) boxes that allow consumers to tap VoIP services while still using their existing telephone sets. While a number of these systems have been deployed, there is a clear demand at the carrier level to build more ATA systems. The problem, however, is that these systems are extremely challenging to design. At the same time, they have to be cheap enough to be covered by a subsidy at retail or recovered over time from the customer's monthly bill.

These challenges require designers to re-evaluate the software and hardware design approaches they are taking during the development of an ATA. In this article, we'll look at some of the key hardware and software decisions that must be made. We'll also show potential technology choices that will allow designers to meet the performance and cost demands of today's operators.

Hardware Design Considerations
Consumer VoIP ATA hardware developers need to balance many factors to produce a cost-effective product that delivers toll-quality voice. Core processor selection is the first and perhaps most critical decision since this device will dictate the bulk of the bill of materials cost, controlling essential functions including external peripherals, non-volatile and run-time data storage.

Depending on a designer's familiarity with a particular processor, he or she may choose to use a familiar device and integrate an external digital signal processor (DSP) for voice processing operations such as compression and decompression and echo cancellation. Typically, a 32-bit RISC processor handles network and signaling protocols such as TCP/IP and the session initiation protocol (SIP). While it is possible to use a RISC-only processor to handle both voice and control functions, it would likely only support a single voice channel using a simple codec.

A dual-processor core implementation gives the engineer greater flexibility and the added benefit of using a single package. An example would be a processor with a RISC core and a DSP core. There are various low-cost devices available on the market, and this type of device currently drives much of the existing consumer VoIP product landscape. A dual-core environment does require dual-processor software development and debugging, and the management of inter-processor communication. This can add a development burden on the software front, although most semiconductor vendors offer tools to assist with this process.

Another option is to use a RISC/DSP processor with dual execution units but a single instruction pipeline. A RISC/DSP solution can support both voice processing and network protocol functions on one core. This eliminates the inter-processor communication overhead of a separate RISC and DSP, which helps ensure good voice quality.

Most voice processing algorithms require the use of multiply-accumulate (MAC) and other mathematical operations in loops. A full-featured DSP with on-chip memory, single-cycle MAC instructions, zero-overhead loops, barrel shifters, and modulo addressing improves system performance significantly. An integrated RISC/DSP processor eliminates the need and cost of a separate DSP, simplifies the overall design complexity, and can reduce time-to-market.
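
To make the role of these DSP features concrete, the following is a minimal sketch of a FIR filter written as a multiply-accumulate loop over a circular delay line. It is an illustration only, not code from any vendor library; the names `FirFilter`, `coeffs` and `process` are hypothetical. On a DSP, the inner loop would typically map to single-cycle MAC instructions inside a zero-overhead loop, and the `% n` index arithmetic would be handled by modulo addressing hardware.

```python
class FirFilter:
    """FIR filter using a circular delay line (illustrative sketch)."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.delay = [0] * len(self.coeffs)  # circular buffer of past samples
        self.head = 0                        # index of the newest sample

    def process(self, sample):
        n = len(self.coeffs)
        # Step the write index backwards with wrap-around (modulo addressing).
        self.head = (self.head - 1) % n
        self.delay[self.head] = sample
        acc = 0
        # Multiply-accumulate loop: one MAC per filter tap.
        for k in range(n):
            acc += self.coeffs[k] * self.delay[(self.head + k) % n]
        return acc


# Feeding a unit impulse returns the coefficients in order, then zero.
fir = FirFilter([1, 2, 3])
impulse_response = [fir.process(x) for x in [1, 0, 0, 0]]
```

In software, every tap costs a multiply, an add, two index computations and a loop test. The hardware features listed above are what collapse that work to roughly one cycle per tap.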

Peripherals and Memory Matter
On-chip peripherals are also key to designing a cost-effective VoIP ATA product. For example, having on-chip dual Fast Ethernet MACs eliminates the need for external Ethernet controllers and ensures the product can scale to multi-port LAN/WAN configurations.

Most dual-MAC devices include an integrated Ethernet bridge that can auto-forward Ethernet frames without CPU intervention. In a simple device without integrated router or firewall functionality, this leaves the CPU free to focus on other tasks. A processor running at 200 MIPS or so can also support network routing, network address translation (NAT) and firewall functions at near wire speed. Business-class devices require strong cryptography, so integrated hardware acceleration for symmetric ciphers and message digests will become more important over time.

A pulse code modulation (PCM) interface with transmit and receive FIFOs is also required to enable the designer to seamlessly connect a variety of audio codecs as well as audio codec/SLIC combo devices for VoIP adapters.

Like the peripherals, the memory subsystem plays an important role in the VoIP ATA product design. This subsystem can be broken down into four components: instruction and data cache, DSP memory, flash, and SDRAM. Large on-chip instruction and data caches enable the core processor to run at its full speed, and a multiway set-associative cache can improve the hit rate. Dedicated on-chip DSP memories help keep program coefficients and voice sample data on-chip, maintaining processing throughput, while external memory, including flash and SDRAM, is required for program storage, code execution and run-time data storage.

A designer should look for devices that provide a glueless interface for external SDRAM and flash memory devices and keep in mind that processors with a 16-bit fixed-length instruction set have higher code density than those with a 32-bit or variable length instruction set. This is important to note since code density ultimately determines the external memory requirement and better density means cost-savings to OEM customers.

One of the more advanced technologies available to system designers is system-in-package (SiP)-based products. This is a sophisticated packaging technology that offers designers an option to stack SDRAM and flash memory with the processor die in a single BGA package. SiP technology offers various advantages not the least of which is a reduction in PCB form factor. A decrease in PCB complexity through the elimination of external memory components and buses improves PCB reliability and ensures the minimization of EMI and switching noise. The result is a cost-effective design, which mitigates any concern over memory component availability.

DSP Algorithms
In the ATA design, the DSP's primary task is to process speech codec algorithms. Compression of voice data is necessary to conserve network bandwidth utilization. For interoperability reasons a typical VoIP product supports a few common ITU codecs, such as G.711, G.723.1, G.726, and G.729A. These codecs offer trade-offs among bit-rate, implementation complexity, and voice quality.

For example, G.711 is a simple toll-quality codec that uses less than 1 MIPS of DSP but takes up 64 kbit/s of network bandwidth, not including the overhead for RTP, UDP and IP headers. G.723.1 uses only 5.3 or 6.3 kbit/s of network bandwidth plus overhead and delivers near toll-quality voice, but consumes significantly more DSP resources (both MIPS and memory). G.726 supports multiple bit rates (16-40 kbit/s), and G.729A supports 8 kbit/s. Both of these codecs deliver near toll-quality voice and are not as demanding on DSP resources (Table 1).
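
The header overhead mentioned above is easy to underestimate, so here is a small worked calculation. It is a sketch under stated assumptions: RTP over UDP over IPv4 with no header compression and no link-layer framing, and packetization intervals of 20 ms (30 ms for G.723.1) chosen for illustration. The function name `ip_bandwidth_kbps` is hypothetical.

```python
# Header sizes in bytes for RTP over UDP over IPv4 (no header compression,
# no link-layer framing).
RTP_HEADER = 12
UDP_HEADER = 8
IPV4_HEADER = 20


def ip_bandwidth_kbps(payload_bytes, packet_interval_ms):
    """Return the IP-level bit rate in kbit/s for one direction of a call."""
    packet_bytes = payload_bytes + RTP_HEADER + UDP_HEADER + IPV4_HEADER
    packets_per_second = 1000.0 / packet_interval_ms
    return packet_bytes * 8 * packets_per_second / 1000.0


# (codec, payload bytes per packet, packet interval in ms); the packet
# intervals are assumptions chosen for illustration.
examples = [
    ("G.711 (64 kbit/s)", 160, 20),
    ("G.729A (8 kbit/s)", 20, 20),
    ("G.723.1 (6.3 kbit/s)", 24, 30),
]

for name, payload, interval in examples:
    print(f"{name}: {ip_bandwidth_kbps(payload, interval):.1f} kbit/s on the wire")
```

Under these assumptions G.711 needs 80 kbit/s per direction and G.729A needs 24 kbit/s, so for the low bit-rate codecs the 40 bytes of headers outweigh the voice payload itself.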

Codec optimization to minimize DSP loading enables the VoIP product to support more voice channels without using a faster processor or adding another processor. Codec verification for bit accuracy is required to ensure high voice quality and compliance with the ITU standards. Designed with a multiple-stage pipeline, the RISC/DSP processor running at 200 MHz offering 260 MIPS performance can support three channels of voice stream using a high complexity codec along with a real-time operating system and networking protocols.

In addition to codecs, line and acoustic echo cancellation algorithms are needed to remove at least near end echo resulting from the unmatched SLIC hybrid impedance to inexpensive consumer-grade phones, room or handset feed through echo, and in some cases far end echo.
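
As an illustration of what a line echo canceller does, the sketch below uses a normalized least-mean-squares (NLMS) adaptive filter, one common textbook approach. It is not presented as the method used in any particular product. The function name, tap count, step size and the synthetic two-tap echo path are all assumptions, and a production canceller would also need double-talk detection and a much longer filter.

```python
import random


def nlms_echo_cancel(far_end, near_end, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the far-end echo from near_end.

    Returns (residual, weights).
    """
    weights = [0.0] * taps
    history = [0.0] * taps          # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, near_end):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate       # what remains after cancellation
        energy = sum(h * h for h in history) + eps
        step = mu * e / energy      # step size normalized by input energy
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights


# Synthetic check: the "hybrid" is a hypothetical two-tap echo path and there
# is no near-end speech, so the residual should decay towards zero.
rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.5, 0.25]
near = [echo_path[0] * far[i] + (echo_path[1] * far[i - 1] if i > 0 else 0.0)
        for i in range(len(far))]
residual, weights = nlms_echo_cancel(far, near)
```

The per-sample work is two multiply-accumulate passes over the filter taps, which is why echo cancellation competes with the codecs for DSP cycles.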

OEMs can obtain a license for most codecs from many software vendors for various platforms. Such a license grants the OEM the right to use a specific implementation of a codec algorithm. However, these codec algorithms are covered by patents owned by many patent holders. A license that includes patent indemnification with broad coverage of major countries in North America, Asia, and Europe will give OEMs the peace of mind to sell VoIP products worldwide.

Software Considerations
All ATA devices require an operating system (OS). While rolling your own or using a traditional RTOS or simple task switcher is possible, be prepared for a long development and testing cycle and an ongoing commitment to maintain your code base. A standards-compliant OS lets you keep the focus on product differentiation, and code reuse for the next generation of product will be significantly higher.

There are various OS options available, but keep in mind the key requirements for any consumer product: price and price. This usually eliminates any third-party proprietary OSes such as VxWorks and steers designers in the direction of Linux and other open-source embedded OSes. These OSes tend to give you complete, standards compliant networking stacks and broad microcontroller support in a reasonable memory footprint, but configuring and deploying them can sometimes be anything but off-the-shelf.

An excellent example of the benefit provided by an OS is a seemingly small product change: the addition of a second Ethernet port to an ATA. While in hardware this is a relatively minor design change, in software the intended use changes dramatically from an end-point to a router. Thus, functionality such as a DHCP server, NAT, PAT, PPPoE, bridging, and MAC or IP address cloning must be implemented, along with some form of QoS and a firewall stack. Here's where a mature OS really wins over rolling your own or proprietary OSes, where each component represents added cost.

QoS and Latency
Perhaps the most discussed issue in consumer VoIP is quality of service (QoS). While there are more opinions on implementation strategies than there are consumer VoIP subscribers, it's important to understand the limitations and intentions of QoS. Ultimately, VoIP service is susceptible to quality disruptions of various types, largely attributable to delay jitter or packet loss during transmission over public networks. While this is not a concern for data traffic, the ATA must have an effective jitter buffer; otherwise it will not meet customers' toll-quality expectations. QoS can help with this, but it can't cure the problem.
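
To show what a jitter buffer does, here is a minimal fixed-depth sketch. It is an assumption-laden illustration rather than a production design: the class name `JitterBuffer` and its `depth` parameter are hypothetical, sequence-number wrap-around is ignored, and real implementations usually adapt the depth to measured jitter. Packets are held until `depth` of them have arrived, then played out in sequence order at one per frame time; a missing packet yields `None`, which the decoder would conceal.

```python
class JitterBuffer:
    """Fixed-depth jitter buffer keyed by RTP-style sequence number."""

    def __init__(self, depth=3):
        self.depth = depth        # packets to accumulate before playout starts
        self.packets = {}         # sequence number -> payload
        self.next_seq = None      # next sequence number due for playout
        self.playing = False

    def push(self, seq, payload):
        if not self.playing:
            # Before playout starts, track the oldest sequence number seen.
            self.next_seq = seq if self.next_seq is None else min(self.next_seq, seq)
        elif seq < self.next_seq:
            return                # too late: its playout slot has passed
        self.packets[seq] = payload
        if len(self.packets) >= self.depth:
            self.playing = True

    def pop(self):
        """Called once per frame time; returns a payload or None for a gap."""
        if not self.playing:
            return None
        payload = self.packets.pop(self.next_seq, None)
        self.next_seq += 1
        return payload


# Packets 2 and 3 arrive out of order and packet 4 is lost in the network.
jb = JitterBuffer(depth=3)
for seq in (1, 3, 2, 5):
    jb.push(seq, f"f{seq}")
played = [jb.pop() for _ in range(5)]
```

The depth is the trade-off: a deeper buffer absorbs more jitter and reordering but adds the same amount of delay to every call.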

QoS is meant to tag "high priority" packet traffic so that it will not be delayed or dropped due to congestion with lower-priority traffic. Ideally this can be accomplished between the ATA and broadband connection fairly simply by using a bandwidth reservation protocol such as RSVP. By requesting a predetermined amount of bandwidth from the broadband connection the audio stream encounters no bottleneck locally and quality is better guaranteed but only until it reaches the public Internet.

Another QoS-related consideration is latency associated with packet size. On a 128-kbit/s upstream connection of low-quality ADSL, a maximum-length Ethernet frame of 1500 bytes takes about 94 ms to transmit, which is several voice frame times at typical packet intervals of 20 or 30 ms. To address this, ATA devices with integrated NAT and router functions that are directly connected to a broadband modem can reduce the maximum segment size (MSS) for the duration of a VoIP call, or predictably insert periodic breaks in the outgoing packet stream to allow for the insertion of voice packets regardless of the data load.
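
The latency cost of a large data packet on a slow uplink can be checked with simple arithmetic. The sketch below computes the time needed to clock a packet onto the link; the function name is hypothetical, the 200-byte voice packet assumes G.711 at a 20 ms interval with RTP, UDP and IPv4 headers, and the 576-byte size is just one example of a reduced packet size. Link-layer overhead is ignored.

```python
def serialization_delay_ms(frame_bytes, link_kbps):
    """Time to clock frame_bytes onto a link of link_kbps (kbit/s), in ms."""
    return frame_bytes * 8 / link_kbps


full_frame_ms = serialization_delay_ms(1500, 128)     # large data packet
voice_packet_ms = serialization_delay_ms(200, 128)    # assumed G.711 packet
small_segment_ms = serialization_delay_ms(576, 128)   # assumed reduced size
```

At 128 kbit/s these work out to 93.75 ms, 12.5 ms and 36 ms respectively. A voice packet queued behind one full-size data packet therefore waits far longer than its own packet interval, which is why shrinking the data packets or leaving gaps for voice matters on such links.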

In a similar scenario where the ATA is passing data traffic or terminating or initiating packets, the device can employ traffic shaping. This involves bandwidth-limiting particular types of packets and sending time-critical packets ahead of those already waiting to be sent. This allows the dynamic categorization of packets, including minimum and maximum bandwidth and levels of priority, for both incoming and outgoing data streams. If packets are lost, the TCP flow control mechanisms at each end will be triggered to reduce send rates.

Much of the Internet fabric will also respond to the "hints" embedded in the type of service (ToS) field of the IP packet header. The packet classes of differentiated services (DiffServ) are used to indicate how the packet should be handled, and by honoring and generating this field, we gain some QoS. Routers will use DiffServ fields to place voice packets in higher priority queues, ensuring that they receive a higher proportion of the available bandwidth and experience less delay and loss.
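
As a sketch of how an application generates this field, the example below marks a UDP socket with a DiffServ code point using the standard `IP_TOS` socket option. The helper names are hypothetical, and the choice of Expedited Forwarding (DSCP 46) for voice media is a common convention assumed here, not something the operator is obliged to honor. Platform support for `IP_TOS` varies, so treat this as illustrative.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding code point, commonly used for voice media


def dscp_to_tos(dscp):
    """Place a 6-bit DSCP value in the upper bits of the 8-bit ToS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def open_marked_udp_socket(dscp=DSCP_EF):
    """Create a UDP socket whose outgoing IPv4 packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

The DSCP occupies the top six bits of the old ToS byte, so DSCP 46 appears on the wire as the byte value 0xB8.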

Implementing traffic shaping can be accomplished in any number of ways. The hierarchical token bucket (HTB) theory of traffic shaping along with other methods helps classify (and modify) packets into various queues using almost any criteria. HTB uses the concepts of tokens and buckets along with the class-based system and filters to allow complex and granular control over traffic.

HTB allows the IP stack to easily and predictably manage the bandwidth that any queue uses. It allows for minimum and maximum bandwidth allocations. It also allows "lower" priority queues to temporarily borrow currently unused bandwidth from queues with higher priority.

One of the major advantages of HTB is that queues are organized in a "tree" where each classification inherits bandwidth restrictions from its parent node, thus allowing designers to control traffic in a very granular fashion. This is another advantage to using a mature OS, rather than custom or a proprietary RTOS.
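
The token and bucket concepts can be sketched in a few lines. What follows is a deliberately simplified illustration, not the full HTB algorithm: each class gets a token bucket for its assured rate and, when that runs dry, may borrow from a single shared bucket representing the parent's spare bandwidth. All class names, rates and burst sizes are hypothetical, and time is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Byte-based token bucket driven by caller-supplied timestamps."""

    def __init__(self, rate_kbps, burst_bytes):
        self.rate_bytes_per_s = rate_kbps * 1000 / 8
        self.burst = burst_bytes
        self.tokens = float(burst_bytes)
        self.last = 0.0

    def try_consume(self, nbytes, now):
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate_bytes_per_s)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False


class ShapedClass:
    """A leaf class with an assured rate that may borrow from its parent."""

    def __init__(self, name, assured, parent_spare=None):
        self.name = name
        self.assured = assured            # TokenBucket for the guaranteed rate
        self.parent_spare = parent_spare  # shared TokenBucket, or None

    def classify_send(self, nbytes, now):
        if self.assured.try_consume(nbytes, now):
            return "assured"
        if self.parent_spare and self.parent_spare.try_consume(nbytes, now):
            return "borrowed"
        return "wait"


spare = TokenBucket(rate_kbps=32, burst_bytes=1500)
voice = ShapedClass("voice", TokenBucket(rate_kbps=80, burst_bytes=400))
data = ShapedClass("data", TokenBucket(rate_kbps=16, burst_bytes=1500),
                   parent_spare=spare)

# Three back-to-back data packets at t=0, then one voice packet.
results = [data.classify_send(1500, 0.0), data.classify_send(1500, 0.0),
           data.classify_send(1500, 0.0), voice.classify_send(200, 0.0)]
```

In this run the first data packet uses the data class's own tokens, the second borrows from the parent, and the third must wait, while the voice packet is unaffected because its assured bucket is separate.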

Firewall and Traversal Techniques
NAT has enabled the Internet to grow beyond what the 32-bit IPv4 address space would otherwise have allowed. It is also the most challenging issue to address when trying to reach end user devices behind one or more NAT firewalls.

NATs come in four typical flavors: full cone, address restricted cone, port restricted cone and symmetric NAT. There are several methods to traverse these NATs, including application layer gateways (ALGs), media tunnels, third-party proxies, or simple traversal of UDP through NATs (STUN). Since providing an ALG, tunnel or third-party proxy requires the co-operation of the premises NAT device or additional equipment, it's highly impractical for a consumer-level deployment. Therefore, as the ATA vendor, we are on our own to solve the NAT problem.

STUN is the most deployed option and will traverse most NAT firewalls. STUN works by using a lightweight UDP protocol and an external STUN server to identify the type of translation performed by NAT firewall(s). It will then identify specifically the exact translation the NAT has chosen to do on a particular UDP connection used for RTP or SIP. This information is gathered without the specific co-operation of the NAT firewall and is then used to establish the SIP and RTP sessions. While virtually all consumer premises equipment uses a flavor of cone NAT, in a corporate environment it is more likely to encounter symmetric NAT. In this case, an ALG or local proxy is unfortunately needed.
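
The discovery step can be summarized as a small decision tree. The sketch below follows the test sequence described in the classic STUN specification (RFC 3489) as best understood here; the function and parameter names are hypothetical, and the inputs are simply the observed outcomes of each test rather than real network calls.

```python
def classify_nat(test1_response, mapped_equals_local, test2_response,
                 test1_alt_same_mapping=False, test3_response=False):
    """Classify the NAT from classic STUN test outcomes.

    test1: plain binding request to the server's primary address.
    test2: request asking the server to reply from a different IP and port.
    test1_alt: test1 repeated against the server's alternate address, checking
               whether the NAT reused the same public mapping.
    test3: request asking the server to reply from a different port only.
    """
    if not test1_response:
        return "UDP blocked"
    if mapped_equals_local:
        return "open Internet" if test2_response else "symmetric UDP firewall"
    if test2_response:
        return "full cone"
    if not test1_alt_same_mapping:
        return "symmetric"
    return "address restricted cone" if test3_response else "port restricted cone"
```

For the three cone types the mapping learned from the STUN server can be advertised in SIP and used for RTP. For the symmetric case the mapping changes per destination, which is why the learned address is of no use there.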

Provisioning and Management
There are two strategies employed by VoIP providers to address the configuration of VoIP ATAs: pre-configuration prior to shipping and auto provisioning typically using TFTP. It's important to note that much like custom signaling modifications made by carriers to the SIP standard, every service provider has its own unique provisioning model. Pre-configuration is impractical from a scalability stand-point so let's identify what is required to remotely provision a device.

Typically the configuration information would contain the SIP user ID, caller ID, password, subscribed features and any other account information including perhaps location information required by E-911. Server information would also need to be discovered, which includes a SIP proxy server, firmware upgrade server, media and feature servers. Parameters and variables will need to be defined including QoS tolerances, firmware revisions, ring types, timers and counters.

The most sensible implementation is to feed a backend management information base (MIB) database with parsed values from the incoming configuration file. This database can then hook the appropriate resources and apply the changes to the device.
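
One way to picture this architecture is a key/value store with change hooks. The sketch below is an assumption throughout: the `key = value` file format, the setting names, and the `ConfigStore` and `parse_config` names are invented for illustration, since every service provider defines its own provisioning format.

```python
class ConfigStore:
    """Key/value settings store that notifies subscribers when a value changes."""

    def __init__(self):
        self.values = {}
        self.hooks = {}   # key -> list of callables taking (key, new_value)

    def on_change(self, key, callback):
        self.hooks.setdefault(key, []).append(callback)

    def set(self, key, value):
        if self.values.get(key) == value:
            return False              # unchanged: nothing to apply
        self.values[key] = value
        for callback in self.hooks.get(key, []):
            callback(key, value)
        return True


def parse_config(text):
    """Parse 'key = value' lines, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


applied = []
store = ConfigStore()
# Hypothetical hook: a SIP stack would re-register when the proxy changes.
store.on_change("sip.proxy", lambda k, v: applied.append((k, v)))

downloaded = """
# hypothetical provisioning file
sip.user_id = 15555550100
sip.proxy = proxy.example.net
ring.type = 2
"""
for key, value in parse_config(downloaded).items():
    store.set(key, value)
```

Because every change goes through `set`, a downloaded file, a web page and a management agent can all act as entry points without each needing to know which subsystem must react.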

Alternatively, scripting can be used to accomplish much the same result, but likely at a cost in portability and extensibility. Consumer products and carrier products have differing management requirements. While a carrier is going to want a device that integrates with its existing network management systems (NMS), a consumer requires web-based tools and has little use for SNMP. Again, this is where the database architecture excels, allowing multiple points of entry while retaining an organized structure.

Robustness and Upgrades
Since the consumer views the ATA as nothing more than an adapter, robustness is critical. To address this in higher-end embedded devices such as set-top boxes (STBs) designers can afford the luxury of memory resources capable of storing multiple images. In the event of failure these devices arbitrate between images and thereby mitigate failure risk. On a small embedded device data storage accounts for a large portion of the total BOM cost.

To marry these considerations, developers can build in the same robustness by segmenting flash into functional blocks that can each have redundant images, but not all at once, reducing the memory overhead of failover from 2x to some lesser factor. With flash segmented in this fashion, an extremely low-level arbiter, in conjunction with digital signatures using PKI and a watchdog timer, can identify and isolate corrupt or unauthorized, possibly malicious, new firmware and fall back to the previous known good segment instead. The device can be designed to operate normally in all cases of a failed firmware upgrade, meeting the requirements of CableLabs, for instance, while eliminating the cost of completely redundant flash memory. In the event of a catastrophic failure where only initialization code remains functional, the device can be configured to fail over into a second 'disaster recovery' mode, in which its low-level initialization code attempts to fetch a set of external images.
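
The arbitration logic can be sketched as follows. This is a simplified illustration with hypothetical names and data: a set of trusted SHA-256 digests stands in for real PKI signature verification (a real arbiter would verify a signature with the vendor's public key), and the watchdog timer is omitted.

```python
import hashlib


def digest(image):
    """SHA-256 digest of an image, used here in place of a real signature."""
    return hashlib.sha256(image).hexdigest()


def select_image(candidates, trusted_digests):
    """Return the first candidate image whose digest is trusted, else None.

    candidates is ordered newest first, so a corrupt or unknown new image
    falls back to the previous known good one.
    """
    for image in candidates:
        if digest(image) in trusted_digests:
            return image
    return None


def boot_plan(segments, trusted_digests):
    """Pick an image per flash segment; flag disaster recovery if any has none."""
    plan = {}
    for name, candidates in segments.items():
        plan[name] = select_image(candidates, trusted_digests)
    needs_recovery = any(image is None for image in plan.values())
    return plan, needs_recovery


# Hypothetical flash contents: the kernel upgrade was truncated in transit.
kernel_v1 = b"kernel v1"
kernel_v2_corrupt = b"kernel v2 (truncated"
app_v3 = b"voice app v3"
trusted = {digest(kernel_v1), digest(b"kernel v2"), digest(app_v3)}
segments = {"kernel": [kernel_v2_corrupt, kernel_v1], "voice_app": [app_v3]}

plan, needs_recovery = boot_plan(segments, trusted)
```

Here only the kernel segment carries a spare image during the upgrade, which is the point of segmenting: the redundancy cost is one segment, not a second copy of the whole flash.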

Segmenting flash has a second advantage that becomes apparent during in-field upgrades. Because each segment of flash can be upgraded independently, only the required components need to be replaced, so features or localization can be added without replacing the complete firmware. This conserves bandwidth and, more importantly, expedites the push upgrade of potentially hundreds of thousands of in-field devices.

Wrap Up
As we look forward, the next generation of ATAs will need to carefully address currently unresolved provisioning and security issues. Designers need to examine the hardware and software impacts of these requirements and ensure that existing platforms provide the extensibility to accommodate them in the future. Single-function ATAs in the market today are likely not the long-term solution, but they will provide the backbone over which the first wave of consumer VoIP will be rolled out. As the market adopts the technology, we as designers need to continue innovating in the features, services and security that will ensure its long-term viability.

About the Authors
Jeff Dionne is CEO and chief architect of Arcturus Networks. He has over 15 years of experience in electrical engineering, hardware design and software. Jeff can be reached at jdionne@arcturus.com.

Brian Davis is director of the advanced solutions group at Renesas Technology America. Brian has extensive experience working with semiconductor and software solutions for personal computers, PDAs, smart phones, communication gateways, and other embedded systems. Brian can be reached at brian.davis@renesas.com.