The combination of fast-evolving wireless technology and the exploding market for consumer wireless applications has created demands on existing broadband networking equipment that conventional processors are ill-equipped to handle. Many of the processors used today are built around conventional RISC architectures, which evolved to meet the standards and applications of personal computers, workstations and servers but which are suboptimal for networking. In addition, benchmarking for processors used in consumer wireless applications has been based on the large packets typical of e-mail and file transfers, which do not require real-time performance. Processors that perform very well under those benchmarks are apt to perform very differently in consumer wireless applications, where a mix of packet sizes is the norm and real-time performance is a necessity.

To support the widely varying standards, interfaces and applications required by the consumer wireless market, adaptation of existing architectures can only go so far. A whole new approach is required to build high-performance systems that are flexible, adaptable to new standards and applications, and economical to build. Combining several key design strategies in a new chip configuration can accomplish these goals: direct-memory (that is, memory-to-memory) instructions, which enable operations to be completed in fewer clock cycles; deterministic hardware multithreading, whereby instructions from multiple threads are mixed in a single pipeline for instantaneous context switching that hides pipeline hazards; and an OS that is targeted at embedded networking and capable of implementing I/O protocols in software.

Removing hardware I/Os and eliminating on-chip registers and buffers frees room to integrate both program memory and data memory on-chip. No off-chip SDRAM is required in many cases, and for more demanding applications a small, inexpensive memory can be added.

A deterministic multithreaded architecture allows each thread to operate independently. Each thread is also fully programmable, so systems can be upgraded in the field or adapted for new products as applications emerge. At the beginning of every instruction cycle, the processor determines which thread to execute; thus it is possible to interleave tasks on a per-instruction basis. Each task has its own register set, memory and context on-chip so that it can execute immediately. Fast context switching is also crucial for the flexibility needed in wireless networking applications, and this architecture is capable of zero-cycle context switching and interrupt handling.

A memory-to-memory architecture lets the processor operate on packets directly in on-chip memory, eliminating the need for caches and buffers. Thus the processor can move inputs and outputs directly between memory and I/O in real time. By integrating 320 kbytes of SRAM on the chip for storing instruction and packet data, memory can be accessed in two cycles, compared with the 20 to 50 cycles it typically takes to access external RAM. A single instruction can read, modify and write data in one cycle. Packet-processing operations need only access data once, so this configuration saves cycles while reducing overall code size. The memory-to-memory architecture also eliminates the load/store instructions found in general-purpose processors.
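As a rough sketch of the memory-to-memory style just described, consider a packet that lives in on-chip SRAM and is modified in place. The buffer size below matches the 320-kbyte figure above, but the function and field access are illustrative assumptions rather than Ubicom's actual instruction set; the comparison with a load/store sequence is the point.

```c
/* A minimal sketch, not an actual ISA: the packet sits in on-chip SRAM and
 * is modified in place.  On a load/store RISC the decrement below compiles
 * to three instructions (load, subtract, store); a memory-to-memory
 * instruction set can express the same read-modify-write as a single
 * instruction operating directly on the two-cycle on-chip memory. */
#include <stdint.h>

#define ONCHIP_SRAM_BYTES (320 * 1024)           /* 320 kbytes on chip */

static uint8_t onchip_sram[ONCHIP_SRAM_BYTES];   /* instructions + packet data */

/* Decrement the IPv4 time-to-live byte of a packet stored at 'offset'.
 * The TTL position (byte 8 of the IPv4 header) is standard; the helper
 * itself is hypothetical. */
static inline void decrement_ttl(uint32_t offset)
{
    onchip_sram[offset + 8]--;   /* one read-modify-write on packet memory */
}
```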
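The deterministic interleaving described earlier can also be modeled in a few lines of C. This is a minimal sketch, assuming an invented thread count, register-file size and schedule table rather than the real silicon: the point is that the issuing thread is chosen purely by cycle number and every thread carries its own program counter and registers, so no cycles are spent saving or restoring context.

```c
/* A minimal sketch of deterministic hardware multithreading: a fixed table
 * decides which thread issues on each clock, and every thread keeps its own
 * program counter and register file, so switching threads costs zero cycles. */
#include <stdint.h>
#include <stdio.h>

#define NUM_THREADS 4
#define NUM_REGS    8

struct thread_ctx {
    uint32_t pc;               /* per-thread program counter */
    uint32_t regs[NUM_REGS];   /* per-thread register file   */
};

static struct thread_ctx threads[NUM_THREADS];

/* Deterministic schedule: the issuing thread depends only on the cycle
 * number, so each thread's share of the pipeline is fixed and predictable. */
static const int schedule[8] = { 0, 1, 0, 2, 0, 1, 0, 3 };

static void issue_one_instruction(struct thread_ctx *t)
{
    t->regs[0]++;              /* stand-in for real instruction execution */
    t->pc += 4;
}

int main(void)
{
    for (unsigned cycle = 0; cycle < 16; cycle++) {
        int tid = schedule[cycle % 8];          /* zero-cycle context switch */
        issue_one_instruction(&threads[tid]);
        printf("cycle %2u: thread %d pc=0x%08x\n",
               cycle, tid, (unsigned)threads[tid].pc);
    }
    return 0;
}
```

Because the schedule is fixed in advance, each thread sees a guaranteed, predictable issue rate, which is what makes it practical to hand timing-sensitive work such as I/O protocols to ordinary software threads.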
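To illustrate what implementing an I/O protocol in software rather than hardwired logic can look like, the sketch below bit-bangs an SPI-style byte transmit from ordinary code. The GPIO register and pin assignments are placeholders invented for the example, not an actual Ubicom interface.

```c
/* A minimal sketch of a software-implemented I/O protocol: a bit-banged
 * SPI-style transmit.  The GPIO port and pin numbers are hypothetical. */
#include <stdint.h>

#define PIN_SCLK  0u
#define PIN_MOSI  1u

static volatile uint32_t gpio_out;   /* stand-in for a memory-mapped GPIO port */

static inline void gpio_set(unsigned pin) { gpio_out |=  (1u << pin); }
static inline void gpio_clr(unsigned pin) { gpio_out &= ~(1u << pin); }

/* Shift one byte out MSB-first, toggling the clock entirely in software.
 * Because the protocol is only code, the same pins can be reprogrammed in
 * the field to speak a different interface without a silicon change. */
static void spi_send_byte(uint8_t byte)
{
    for (int bit = 7; bit >= 0; bit--) {
        if (byte & (1u << bit))
            gpio_set(PIN_MOSI);
        else
            gpio_clr(PIN_MOSI);
        gpio_set(PIN_SCLK);          /* receiver samples on the rising edge */
        gpio_clr(PIN_SCLK);
    }
}
```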
An instruction set that is designed specifically for wireless networking can use just 41 instructions to eliminate much of the overhead of traditional approaches, reducing memory requirements by up to 95 percent.

To achieve flexibility, hardware must not be tied to a specific set of interfaces and features that may change by the time the system gets to market. This can be accomplished by moving many functions, including I/O, out of silicon and into software. As a result, software can be easily upgraded to support evolving standards and add features. Implementing I/O and communications protocols in software instead of hardwired logic frees die area for performance-boosting on-chip memory and yields a processing platform applicable to many market segments. More important, it assures that the system will not become obsolete.

Most operating systems for embedded systems have been designed to be general-purpose, with the expectation that they will run on processors and systems-on-chip that have the requirements of a particular application built into the chip. But because they are designed as one-size-fits-all, they include many functions that might not be needed for a particular application. The code for these unused functions resides in memory even when it is never called, requiring larger flash memories and DRAMs. This is wasteful in terms of system cost; it is also inefficient, because bloated code places a burden on the processor that slows performance.

An extremely important requirement for assuring the highest possible performance in wireless applications is an operating system that is mated to the processor and can take advantage of all the processor's features. Such an OS carries nothing that isn't needed, and its code size is kept to a minimum.

Keith Morris (keith.morris@ubicom.com) is the director of marketing for Ubicom Inc. (Mountain View, Calif.).