Embedded MPU vendors step out of the box
By Anthony Cataldo, EE Times
June 14, 2001 (4:09 p.m. EST)
URL: http://www.eetimes.com/story/OEG20010614S0041
SAN JOSE, Calif. - Embedded-processor vendors are looking to buttress their designs with a wide range of features outside the core MPU itself, a trend that may soon make processor-only designs passé. Many of the presenters who described their next-generation processors at the Embedded Processor Forum here this week weren't just talking about the latest spin on superscalar design and pumped-up operating frequencies, but also about things like chip-to-chip buses, crossbar switches, coprocessors and configurable caches.

The message was clear: The value of embedded processors increasingly depends on larger system-on-chip development platforms. Moreover, these platforms must be malleable, so that they are useful for system designs ranging from automotive engine control to set-top boxes to networking equipment.

"The name of the game is whether this helps the end application," said Kees Vissers of Trimedia Technologies Inc., which disclosed details of its latest 205-MHz, five-issue VLIW processor at the forum.

Joseph Chang, PowerPC engineering manager for Motorola Inc., admitted that MPU designers like himself are having to make a mental leap. "When I look at a core, I think of the MPU design," he said. "However, when you look at system-on-chip, that doesn't make sense anymore. You have to look beyond the core in order to get the maximum benefit."

When the system-on-chip craze took the chip industry by storm several years ago, embedded-processor vendors responded by coming out with RTL-level synthesizable cores to ease integration. But the new line of thinking suggests that creating a piece of intellectual property doesn't go far enough. MPU vendors say they need to give customers the system-level elements needed to craft highly integrated designs if they want their MPUs to be accepted.

One way this is being done is by opening up new I/O channels, both processor-to-processor and processor-to-peripherals. Multilevel bus structures, crossbar switches and cache-coherent buses are some of the ideas being floated.

Sun Microsystems Inc., purveyor of the Sparc RISC processor, for one, proposed introducing a CPU-to-CPU bus that would provide high bandwidth with low latency and ease chip integration. The JBus was described as a synchronous, cache-coherent bus designed to transfer data in 128-bit packets at speeds ranging from 16 to 64 Gbytes/second. Sun's aim is to pave the way for smaller, two-way and four-way servers, which today are based on symmetrical multiprocessing, to move from standalone CPUs to devices where two CPUs would reside on one chip, and as many as four with the help of a repeater.

"When you go on-chip, you have much higher bandwidth," said Renu Raman, director of engineering for Sun's processor products group. "Having a shared-bus topology simplifies the design so that the bus is no longer the limiter; it's the logic. There may be much more latency in each CPU element because of the respective queuing structures that they have."

Raman demurred when later asked about Sun's specific plans for the JBus, but he hinted the company may try to offer it as a standard. "We want to expose it and enable others to plug into it," he said.

PMC-Sierra Inc., for its part, raised eyebrows here with a processor that combines two 1-GHz RM9000 CPUs, connected via a switch that enables 8 Gbytes/s of bandwidth between them. Key to the design is a cache-coherency feature that allows the sharing of modified data between CPUs, minimizing the number of slower accesses to main memory.
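To illustrate the idea in rough C (the states, line sizes and names here are a sketch of a generic MESI-style scheme, not PMC-Sierra's actual implementation): a CPU that misses in its own cache first snoops its neighbor, and if the neighbor holds the line in a modified state, the data moves cache-to-cache over the on-chip switch instead of coming from main memory.

    /* Illustrative sketch of cache-to-cache transfer under a MESI-style
     * coherency protocol; not any vendor's actual design. */
    #include <stdio.h>
    #include <string.h>

    typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } line_state_t;

    typedef struct {
        unsigned long tag;
        line_state_t  state;
        unsigned char data[32];      /* one 32-byte cache line */
    } cache_line_t;

    /* On a read miss in one CPU, snoop the peer before going to main memory. */
    static const unsigned char *read_miss(cache_line_t *own, cache_line_t *peer,
                                          const unsigned char *main_mem,
                                          unsigned long addr)
    {
        if (peer->tag == addr && peer->state == MODIFIED) {
            /* Cache-to-cache transfer over the coherent switch: the peer
             * supplies its dirty copy and both lines drop to SHARED. */
            memcpy(own->data, peer->data, sizeof own->data);
            peer->state = SHARED;
        } else {
            /* Fall back to the slower main-memory access. */
            memcpy(own->data, main_mem + addr, sizeof own->data);
        }
        own->tag   = addr;
        own->state = SHARED;
        return own->data;
    }

    int main(void)
    {
        unsigned char dram[64] = { 0 };
        cache_line_t cpu_a = { 0, INVALID,  { 0 } };
        cache_line_t cpu_b = { 0, MODIFIED, { 42 } };   /* dirty line at addr 0 */

        const unsigned char *p = read_miss(&cpu_a, &cpu_b, dram, 0);
        printf("CPU A read %d from its peer's cache\n", p[0]);
        return 0;
    }

The payoff is latency: a cache-to-cache hop over an 8-Gbyte/s on-chip switch is far cheaper than a round trip to external DRAM.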
Still, some questioned the cost of implementing a coherent bus. "Sometimes you can't afford to have a shared bus and can't even afford to have coherency because of the power," said Sophie Wilson, chief architect at Broadcom Corp.'s DSL business unit (Cambridge, England).

Others, too, are making moves to support multiple processor cores on one chip. Motorola said it will offer symmetric multiprocessing using the MESI protocol for its 32-bit e500 PowerPC processor. At the system-on-chip level, the company is proposing an on-chip crossbar switch to link other elements, such as memory controllers, DMA engines and external I/O.

Also looking to address multi-MPU processing, Tensilica Inc. announced a 128-bit, 200-MHz processor-to-processor I/O for its Xtensa reconfigurable processor.

Infineon Technologies Inc. said it is weighing chip-to-chip connection options for its latest Tricore 2 processor, which was redesigned to run at a faster 600 MHz. Right now the company is using 64-bit crossbar switches to link the processor to things like external memory, the system bus and coprocessors. But crossbar switches may not suffice for CPU-to-CPU connections. "It's very true that it won't scale. We don't see it handling more than five or six items on a crossbar," said Robert Ober, director of architecture for Infineon.

In a nod to the new breed of reconfigurable-processor vendors, some of the more established processor makers tipped plans to give designers more latitude for customization. Motorola said it will offer variable Level 1 caches, and options for a Level 2 cache and memory-management unit (MMU). MIPS Technologies Inc. said it is offering an optional cache controller, MMU, clock gating and design-for-test options for the 5Kf, along with a graphical user interface for designers to make their modifications.

Several companies are also trying to make it simpler to support coprocessors. Motorola's e500, Infineon's Tricore 2, Trimedia's TM32, NEC's 64-bit Vr5500 and MIPS' latest 5Kf and 4Ke 64- and 32-bit processors include interfaces to user-defined coprocessors.

Motorola is going a step further by developing some of these coprocessors itself. The company, which calls the coprocessors "application-processing units," is targeting areas like networking and automotive. Some of the first include a signal-processing unit for complex numeric calculations and a context manager for more deterministic interrupt responses. "We see that the trend is to create particular products aimed at narrow market segments," said Motorola's Chang.

One coprocessor that is making its way into more embedded-processor designs is the floating-point unit. A fixture of desktop processors since the '80s, floating point has been considered overkill and too expensive to implement in embedded processors, which are more sensitive to cost and power consumption, observers said. That appears to be changing as the cost of implementing FPUs comes down with new process technologies.

"Why use floating point in the embedded space? The answer is you can," said Chris Hinds, consulting member of the technical staff for ARM Ltd., which described its newest VFP10 floating-point unit for its ARM9 series of processors here. The FPU takes up 2.4 mm² of die area in 0.18-micron process technology and uses short vector operations to increase speed and reduce code size, the company said.

At the same time, MIPS said it has included an FPU in its latest 64-bit 5Kf synthesizable core.
The move will likely be welcomed by several of MIPS' licensees, which have expressed their desire for more floating-point strength in the MIPS architecture. The IEEE-754-compliant FPU is capable of two floating-point operations per cycle in single-precision mode.

There are compelling reasons to include floating point, not only in areas like graphics but also in automotive controls. Designers are hoping to take advantage of the higher degree of precision FPUs offer for tasks like engine monitoring, said Motorola's Chang. The same is becoming true for industrial control.

"We're hoping to get people to take advantage of the tremendous dynamic range in floating point," said ARM's Hinds. "It's about getting the answer you want as accurately as you need without having to play the fixed-point game."
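The "fixed-point game" Hinds refers to is the manual scaling embedded programmers do to squeeze fractional values into integer registers. A rough C sketch of the trade-off (the Q16.16 format and the example values are illustrative, not drawn from any of the vendors above):

    /* The "fixed-point game" vs. hardware floating point: a Q16.16 multiply
     * needs manual widening and shifting and silently loses precision and
     * range; the float version does not. Format and values are illustrative. */
    #include <stdio.h>
    #include <stdint.h>

    typedef int32_t q16_16;                  /* 16 integer bits, 16 fraction bits */
    #define TO_Q(x)   ((q16_16)((x) * 65536.0))
    #define TO_F(q)   ((double)(q) / 65536.0)

    static q16_16 q_mul(q16_16 a, q16_16 b)
    {
        /* Widen to 64 bits, multiply, then shift back to keep the radix
         * point in place -- bookkeeping an FPU does for free. */
        return (q16_16)(((int64_t)a * b) >> 16);
    }

    int main(void)
    {
        double a = 123.456, b = 0.00078;     /* e.g., sensor reading times gain */

        q16_16 qa = TO_Q(a), qb = TO_Q(b);
        printf("fixed-point: %f\n", TO_F(q_mul(qa, qb)));   /* precision lost  */
        printf("floating:    %f\n", a * b);                 /* full dynamic range */
        return 0;
    }

In fixed point, that scaling and range management falls on the programmer; with an on-chip FPU, the hardware absorbs it, which is the trade-off Hinds is pointing to.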