Multi-Ports Eliminate Baseband Processing Bottlenecks
Casey Springer, IDT
Sep 02, 2004 (4:00 AM)
URL: http://www.commsdesign.com/showArticle.jhtml?articleID=30000347
The rapid buildout of a new generation of high-bandwidth wireless networks has opened up opportunities for handset designers to introduce a wide variety of exciting new multimedia functions. But many of these new functions bring heavy processing requirements that outpace the capabilities of traditional handset baseband processors. While the widely deployed ARM7, ARM9, and ARM11 processor cores, as well as the digital signal processor (DSP) cores used in many handset designs, have been able to support the processing requirements of basic voice, MPEG-4, and embedded camera applications, the daunting demands of emerging multimedia functions such as continuous speech recognition, next-generation MPEG-4 (H.264), and Mobile 3D are quickly exceeding these processors' capabilities (Figure 1 below).
To meet these rising processing demands, a growing number of handset designers have turned to dual-processor architectures. Working in parallel, these multi-processor systems provide the compute power required for today's growing list of high-performance handset applications. The addition of a separate applications processor offloads much of the processing associated with these new multimedia functions and enables the use of a full-featured operating system.
Moreover, a dual-processor approach offers designers a highly modular architecture. By splitting the baseband and applications functions, this strategy allows designers to use a single design as the basis for multiple product models, each supporting different wireless standards and feature sets. For example, a fixed applications block can be paired with a variety of baseband processors, allowing an OEM to quickly target handsets at different geographic regions and market segments using a limited amount of design resources.
While these dual-processor architectures offer multiple advantages, their ability to meet the rapidly escalating processing requirements of these new applications, and to let users take full advantage of the rising bandwidths of wireless networks, rests largely on how quickly the handset can move data from one processor to the other. The design of the inter-processor communications architecture thus presents a potential performance bottleneck. This article looks at some of the traditional design approaches used by handset developers and examines how a new generation of multi-port memory devices offers an attractive new design option for high-end wireless handsets.
Inter-Processor Communications: What's Needed
Any inter-processor communications architecture that handset designers employ must, first and foremost, meet all their traditional design requirements. To extend the battery life of the handset, an inter-processor communications scheme must support low operating voltages and consume minimal current. It must also enhance design flexibility by supporting a wide variety of I/O voltages and by supplying an interface that is compatible with most major vendors' applications and baseband processors. Finally, to meet the grueling footprint requirements implicit in modern handset design, any device used to facilitate inter-processor communications must conserve precious board real estate by using extremely compact packaging.
Ideally, an inter-processor communications scheme will also meet a few additional design criteria. It will support extremely high-speed inter-processor communications rates to meet the growing bandwidth requirements of 3G wireless technologies. It will supply innovative functionality to help reduce design complexity.
Finally, by supporting highly modular, reusable designs, it will help shorten the development cycle and shrink time to market. For example, designers using the right inter-processor communications scheme can use the same basic handset design to support GSM, wideband CDMA (W-CDMA), and cdma2000 networks simply by swapping out the baseband processor and reusing the same applications suite. This approach allows design teams to target products in a variety of geographic regions and market segments while using minimal design resources.
Traditional Approaches
A variety of options are available to designers crafting inter-processor communications schemes for multi-processor handsets. Many of the early dual-processor architectures have employed embedded interfaces to link the baseband and applications processors. Embedded UARTs, commonly found on virtually all processors, offer a highly compact footprint and consume little system power. Moreover, this approach minimizes cost by requiring no additional logic.
Similarly, I2C interfaces provide a widely available, standard interface for inter-processor communications. And unlike a UART-based approach, the I2C interface can support the interconnection of more than two devices. That additional functionality can turn out to be a key advantage in markets where designers must incorporate processors for both cellular and non-cellular wireless technologies such as wireless LAN (WLAN) and Ethernet.
More recently, many designers have opted to use USB to interconnect the baseband and applications processors in a dual-processor architecture. However, one issue to consider when implementing a USB-based solution for inter-processor communications is software driver support. Many baseband processors do not support software drivers that allow designers to take full advantage of the maximum data rates defined by the USB standard.
When using any USB-based implementation it is also important to ensure that the USB port in one of the two processors supports a host configuration. Because of its heritage as a PC peripheral bus, the USB specification defines a host/peripheral relationship where one device acts as the host and the other as a slave. If neither processor supports the USB host specification, the development team must expend additional time and effort adding another component to the design. Ultimately, that additional component impacts the end product footprint, power consumption and cost.
Performance Limitations
While each of the above design options offers a standardized approach to inter-processor communications that is relatively simple to implement, consumes little power, and helps minimize design footprint, each presents a major liability. Their performance limitations pose a significant bottleneck, especially as designers move to higher bandwidth 3G air interfaces. None of these established design strategies can support the bandwidths required to keep pace with newly emerging wireless technologies and the applications they enable.
With a maximum performance of 230 kbit/s, a UART-based architecture can support simple text messaging applications and basic 2.5G wireless interfaces such as GPRS. But a UART interface cannot deliver the bandwidth needed to support commonly available cdma2000 1x networks and EDGE implementations running in excess of 300 kbit/s (Figure 2).
The fast mode of the I2C interface, commonly found on most baseband and applications processors, supports a maximum data rate of 400 kbit/s. However, once a designer factors in the overhead of control commands, an I2C-based design is already approaching its limit just supporting an EDGE implementation running at up to 384 kbit/s. At 2 Mbit/s and beyond, newer cdma2000 3x, W-CDMA, and cdma2000 1xEV-DO networks run well beyond the capabilities of I2C.
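A rough calculation shows how quickly the nominal 400-kbit/s figure erodes. The C sketch below models only the per-byte acknowledge bit and a single address byte per transaction (start/stop timing and any higher-level control traffic are ignored, and the 64-byte transfer size is an arbitrary assumption), yet the effective throughput already falls to roughly 350 kbit/s.

```c
#include <stdio.h>

/* Rough model of I2C fast-mode throughput after protocol overhead:
 * each byte on the bus costs 9 clock cycles (8 data bits plus ACK),
 * and every transaction spends one extra byte on the slave address.
 * Start/stop timing is ignored, so this is an optimistic estimate. */
int main(void)
{
    const double scl_hz        = 400e3;   /* fast-mode clock       */
    const double payload_bytes = 64.0;    /* assumed transfer size */
    const double bits_on_wire  = (payload_bytes + 1.0) * 9.0;
    const double payload_bits  = payload_bytes * 8.0;

    double effective_kbps = payload_bits / (bits_on_wire / scl_hz) / 1e3;
    printf("Effective I2C throughput: ~%.0f kbit/s\n", effective_kbps);
    return 0;
}
```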
Version 1.1 of the USB interface promises significantly higher performance, in the 1.5-Mbit/s range. But once a designer compensates for overhead and handshaking, the interface peaks at about 1 Mbit/s. Again, any 3G network using an air interface with capabilities in excess of 2 Mbit/s will face a significant performance bottleneck. While all of these embedded processor interfaces are rapidly evolving to higher levels of performance, few baseband or applications processors currently support the newer implementations of those interfaces.
One other option for a development team is to implement a multi-processor communications scheme using a proprietary ASIC-based design. But the extremely high costs associated with designing and fabricating a custom device in today's deep submicron process technologies run well into the millions of dollars. Only manufacturers developing extremely high-volume products can afford to design their own custom solution.
High Performance and Off-The-Shelf
Ideally, development teams designing 3G handsets need a solution that can be built from off-the-shelf parts and is capable of supporting the 2- to 10-Mbit/s data rates found in next-generation 3G wireless networks and other non-cellular applications. One design approach that meets those criteria is the use of multi-port devices to implement an inter-processor communications scheme.
Multi-port ICs integrate memory and control logic in a single die to enable simultaneous access to a common central memory. Each port supplies separate control, address and I/O pins that permit independent asynchronous access for reads and writes to any location in memory (Figure 3).
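To each processor, a port of such a device simply looks like external asynchronous SRAM. The minimal C sketch below illustrates the idea; the base address, depth, and 16-bit organization are assumptions for illustration rather than details of any particular part.

```c
#include <stdint.h>

/* Hypothetical address at which the dual-port SRAM is decoded into
 * this processor's memory map; the other processor maps the same
 * physical array through its own port, at its own local address. */
#define DPRAM_BASE   ((volatile uint16_t *)0x20000000u)
#define DPRAM_WORDS  (8 * 1024)   /* assumed 8K x 16 organization */

/* Either processor reads and writes the shared array exactly as it
 * would any other external asynchronous SRAM. */
static inline void dpram_write(uint32_t offset, uint16_t value)
{
    DPRAM_BASE[offset] = value;
}

static inline uint16_t dpram_read(uint32_t offset)
{
    return DPRAM_BASE[offset];
}
```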
Traditionally, multi-port memory devices have not been available in low-power versions. But recently, IC vendors have brought to market a new generation of low-power 1.8-V devices that meet the stringent power requirements of the handset market. With access times as fast as 55 ns, this new generation of multi-port devices supports data rates of 290 Mbit/s, well in excess of the performance requirements of current 3G wireless standards such as the 2.4-Mbit/s cdma2000 1xEV-DO specification. Just as importantly, a multi-port-based solution provides sufficient headroom to support future transitions to higher performance 4.8-Mbit/s cdma2000 1xEV-DV, 10-Mbit/s high-speed downlink packet access (HSDPA), and the introduction of non-cellular wireless technologies such as 802.11b wireless LANs running in the 11-Mbit/s range.
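As a sanity check on that figure, the quoted rate follows directly from the access time if a 16-bit data bus is assumed; the short calculation below is only a back-of-the-envelope illustration.

```c
#include <stdio.h>

/* Back-of-the-envelope check of the quoted per-port bandwidth,
 * assuming a 16-bit data bus and a 55-ns access/cycle time. */
int main(void)
{
    const double access_time_s = 55e-9;   /* per 16-bit word */
    const double word_bits     = 16.0;

    double mbit_per_s = word_bits / access_time_s / 1e6;
    printf("Peak per-port rate: %.0f Mbit/s\n", mbit_per_s);   /* ~291 */
    return 0;
}
```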
Memory Partitioning
One of the more attractive advantages of using multi-port memories to support inter-processor communications is the flexibility they give designers to optimize a design for different applications. One way multi-ports provide this flexibility is by allowing the designer to partition memory into transmit and receive blocks, eliminating the need for collision detection and arbitration and, in the process, maximizing performance. The size of the memory partitions is defined in software and can be changed dynamically within the system as requirements change.
That capability can come in very useful when, for example, a user is downloading an unusually large file. Rather than endure the latency associated with a typical 50/50 partition between transmit and receive blocks, the system can automatically allocate a larger portion of the memory to buffering as the data is being downloaded. Interrupt functionality integrated into multi-port memories allows each processor in a two-processor handset to send a software flag telling the other processor that it needs more memory. The processor can then re-segment the multi-port memory with a larger receive block, prohibiting the other processor from accessing that address space.
Once the memory requirements are fulfilled and the data is successfully downloaded, the initiating processor can send a new flag indicating those restricted memory segments are again free for use (Figure 4).
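The sketch below shows how such a software-defined partition and a repartition request might look from one processor's side. It reuses the dpram_write() helper from the earlier sketch; the mailbox address, message codes, and boundary values are hypothetical, and the wait for the acknowledgment is omitted.

```c
#include <stdint.h>

/* Illustrative sketch of software-defined transmit/receive partitioning.
 * The dual-port hardware imposes no partition of its own, so both sides
 * must honor the boundary agreed on in software. */
#define DPRAM_WORDS    (8 * 1024)
#define MAILBOX_PEER   0x1FFFu   /* assumed mailbox location for the other port */

#define MSG_GROW_RX        0x0001u   /* "give me a larger receive block" */
#define MSG_PARTITION_ACK  0x0002u   /* "new boundary accepted"          */

extern void dpram_write(uint32_t offset, uint16_t value);

static uint16_t rx_tx_boundary = DPRAM_WORDS / 2;   /* default 50/50 split */

/* Ask the peer processor for a larger receive buffer before a big
 * download, then move the software boundary once it acknowledges. */
void request_larger_rx_block(uint16_t new_boundary)
{
    dpram_write(MAILBOX_PEER, MSG_GROW_RX);   /* raises the peer's interrupt */
    /* ...wait for MSG_PARTITION_ACK in our own mailbox (not shown)... */
    rx_tx_boundary = new_boundary;            /* e.g. move to a 25/75 split  */
}
```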
If the designer selects the interrupt function, a memory mailbox location or message center is automatically assigned to each port in the two highest memory locations (see Figure 3 above). The contents of the mailboxes are user-defined, 16-bit messages that instruct the interrupted device what to do.
When the left port writes to the right-port interrupt location, the right-port interrupt is set; it is cleared when the right port accesses that location. Similarly, when the right port writes to the left-port interrupt location, the left-port interrupt is set, and it is cleared when the left port accesses that location. If the interrupt function is not used, the two address locations serve as part of the random access memory rather than as mailboxes.
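In code, the mailbox mechanism reduces to ordinary reads and writes at the two top memory locations. The C sketch below, again building on the earlier dpram_read()/dpram_write() helpers, shows one plausible shape; the mailbox addresses are assumptions for an 8K x 16 part rather than values from a specific datasheet.

```c
#include <stdint.h>

/* Assumed mailbox locations in the two highest word addresses. */
#define MBOX_LEFT   0x1FFEu   /* right port writes here to interrupt the left port */
#define MBOX_RIGHT  0x1FFFu   /* left port writes here to interrupt the right port */

extern void     dpram_write(uint32_t offset, uint16_t value);
extern uint16_t dpram_read(uint32_t offset);

/* Left-port processor: post a user-defined 16-bit message to the right port.
 * The write asserts the right port's interrupt pin. */
void notify_right_port(uint16_t message)
{
    dpram_write(MBOX_RIGHT, message);
}

/* Right-port processor's handler: reading its own mailbox both retrieves
 * the message and clears the interrupt. */
void right_port_mailbox_isr(void)
{
    uint16_t message = dpram_read(MBOX_RIGHT);
    (void)message;   /* dispatch on the message value here */
}
```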
This same memory segmentation capability is also available in tri-port memory devices. With the same interrupt functionality available in multi-port ICs, designers using a tri-port device can dynamically allocate memory addresses between an applications processor, a baseband processor, and a wide range of processors now employed in handsets to support non-cellular data connections such as the Wi-Fi and DMB standards. Moreover, one of these new devices supplies additional flexibility by supporting 3.3-, 3-, 2.5-, and 1.8-V I/O voltages on the three ports with a 1.8-V core (Figure 5).
It is important to note, however, that these new tri-port devices are essentially two low-power dual-port devices packaged together. So, unlike a true tri-port memory, they supply two independent bi-directional data paths rather than three. With this configuration, the two baseband-side processors cannot share data directly with each other; each shares data only with the applications processor.
Simplifying Modularity
Modularity is another way that multi-port memories used for inter-processor communications increase design flexibility. Since the processors in a multi-port memory-based design connect through an industry-standard SRAM interface, designers can easily swap out components and modules to meet the requirements of different applications or evolving wireless technologies. This not only increases design flexibility, it also helps reduce end-product cost by allowing developers to use the same basic design for multiple markets.
Moreover, since current-generation multi-port memories support different operating voltages in pin-compatible packages, designers can now migrate their product line to lower operating voltages, and in the process extend battery life, simply by swapping in next-generation baseband processors and multi-port memories.
This high degree of modularity becomes extremely important as OEMs migrate to new, higher bandwidth wireless technologies. As an example, many OEMs today use a dual-processor design to build handsets for cdma2000 1x networks. By simply swapping out the baseband processor and multi-port devices (assuming both are available in packages that are pin-compatible with the original sockets and the system software remains the same), the product development team can bring to market a new handset for higher bandwidth cdma2000 1xEV-DO networks. In the process, they can preserve all other aspects of their design, minimize their design effort, and dramatically shorten their design cycle. Alternately, the same approach can be used to employ a single handset design to support varying wireless requirements in different markets.
Extending Handset Functionality
As handset and portable device designers add functionality to differentiate their products from competitive offerings, one challenge they often face is increasing contention for the limited number of general purpose I/O (GPIO) pins on each processor. Many current-generation handsets, for example, must now devote GPIO pins to functions such as backlights, fashion lighting, or LEDs that indicate a battery is charging. As each new cell phone extends its feature set, many processors are running out of GPIO resources.
Some of the latest generation multi-port memories address this problem by integrating input read and output drive registers that allow the designer to monitor and drive external binary input and output devices, such as LEDs or DIP switches, using just the standard memory interface of the dual-port. With this additional capability, designers can extend the capabilities of the handset's processors by allowing their limited number of GPIO pins to be used for other purposes (Figure 6).
With one recently introduced dual-port device, for instance, the input read register (IRR) captures the status of two external binary input devices, such as DIP switches, connected to the input read pins. This permits either processor in a two-processor handset design to monitor the status of two external devices simply by using the pins already used to interface with the dual-port memory. Once access to the IRR is enabled, the contents of the IRR are read from the dual-port memory as a standard memory access to address x0000 from either port, and the data is output via the standard I/Os.
The output drive register (ODR) on the device can be used to drive the state of up to five external binary-state devices by providing a path to Vss for each external circuit. With this capability, a processor can control up to five external devices without using any additional control signals. Those five external devices can operate at different voltages as long as their combined current does not exceed 40 mA (8 mA maximum per device).
Once access to the ODR is enabled, standard write accesses to the dual-port memory from either port to address x0001 are used to set the status of the ODR bits. A 1 indicates on and a 0 indicates off. The status of the ODR bits can also be read without changing the status of the bits via a standard read to address x0001.
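A short C sketch of how the IRR and ODR might be exercised through ordinary memory accesses follows; it builds on the earlier dpram_read()/dpram_write() helpers, the register-enable step is omitted, and the bit assignments are assumptions for illustration.

```c
#include <stdint.h>

/* IRR at word address 0x0000, ODR at word address 0x0001, as described
 * above. The individual bit positions below are illustrative only. */
#define IRR_ADDR      0x0000u
#define ODR_ADDR      0x0001u
#define IRR_SWITCH0   (1u << 0)   /* assumed: first DIP-switch input */
#define ODR_LED0      (1u << 0)   /* assumed: first driven output    */

extern void     dpram_write(uint32_t offset, uint16_t value);
extern uint16_t dpram_read(uint32_t offset);

/* Poll an external DIP switch through a normal memory read of the IRR. */
int dip_switch0_closed(void)
{
    return (dpram_read(IRR_ADDR) & IRR_SWITCH0) != 0;
}

/* Turn an external LED on or off: a standard write to 0x0001 sets the
 * ODR bits (1 = on, 0 = off); a read of 0x0001 returns them unchanged. */
void set_led0(int on)
{
    uint16_t odr = dpram_read(ODR_ADDR);
    dpram_write(ODR_ADDR, on ? (uint16_t)(odr | ODR_LED0)
                             : (uint16_t)(odr & ~ODR_LED0));
}
```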
Wrap Up
As the higher bandwidth 3G infrastructure rolls out, handset designers will face increasing pressure to deliver products capable of taking full advantage of these new wireless technologies. While designers can employ a variety of strategies to build multi-processor handsets, architectures built around multi-port devices offer the only low-power, off-the-shelf solution capable of meeting the aggressive performance requirements of both current and next-generation wireless technologies.
About the Author
Casey Springer is a product manager with IDT's multi-port products group. Prior to joining IDT, Casey studied solid state physics at Santa Clara University. He can be reached at casey.springer@idt.com