Processor forum examines embedded cache, architectures
SAN MATEO, Calif. - Bolting cache memory directly onto a processor core is a time-tested way to boost performance, but some microprocessor vendors will drop hints at this week's Embedded Processor Forum suggesting that the approach may be too unwieldy for system-on-chip design.

Rather than closely wed cache memory to their processors, companies like MIPS Technologies Inc. and ARM Ltd. are tweaking their architectures to hook up to other forms of memory more distant from the processor. The idea is to save cost in terms of processor die size and to give ASIC designers more flexibility. Critics, however, contend the technique will hamper performance.

One and sometimes two levels of cache memory can be merged with a processor core to make more data available locally and to reduce the time the core spends accessing slower external DRAM. The faster the processor runs, the higher the memory latency penalty when it has to fetch data off-chip.

However, some say that this tightly coupled cache is starting to get in the way. Designers of complex systems-on-chip (SoCs), for example, already have their own favored memory compilers and see the processor cache as unnecessary. Some of the most highly integrated designs may use half their die just for memory, and some designers may want to link their processor to these customized memory arrays instead.

"If I'm stuck with your cache implementation and it doesn't ideally suit me, then it causes an incredible amount of pain to redesign," said Cary Snyder, a principal analyst with MicroDesign Resources, which runs the forum, to be held in San Jose, Calif.

Processor vendors are starting to take their cue from ASIC designers. MIPS Technologies will announce at the forum a multiprocessing core that does without a cache and includes interfaces for external SRAMs. ARM declined to discuss details of forthcoming architectures, but Snyder said the British company has been reworking its ARM10 design to be less cache-dependent and has designed the upcoming ARM11 for this purpose from the start.

"It can be risky unless you add the hooks that allow it to be a less risky situation, and it looks like ARM and MIPS have done that," Snyder said.

Not everybody is ready to move to cacheless processor cores, though. NEC Corp., for one, has just added 256 kbytes of Level 2 cache as well as an SDRAM controller to its latest 64-bit Vr7701 processor core in order to lessen memory access delays. Distancing the cache from the processor may make sense for 32-bit cores used in cost-sensitive SoC devices, but it will mean a performance hit, said Arnold Estep, senior marketing manager for Vr products at NEC. "When the cache is tightly coupled to the core, you can guarantee that you'll get the data from the cache in one clock," he said. "If it's an SRAM in an SoC, you don't have a lot of control over the access time."

IBM Corp. is taking yet another approach with its latest 440GX PowerPC processor, designed with the company's 0.13-micron ASIC methodology. It includes an L1 instruction and data cache and 256 kbytes of software-controlled SRAM that transfers data over the processor's local bus at 5.3 Gbytes per second. In this way, the L2 cache can ostensibly be shared more readily by other elements.
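The latency argument made above, and Estep's one-clock point, can be seen in a back-of-the-envelope average-memory-access-time model. The sketch below is purely illustrative: the hit rate, DRAM latency, SRAM hit time and clock speeds are assumed numbers, not figures from any of the vendors quoted. It shows that because off-chip DRAM latency is roughly fixed in nanoseconds, the miss penalty measured in core clocks grows as the core speeds up, and a one-cycle tightly coupled cache hides more of it than a slower on-chip SRAM.

```c
/*
 * Illustrative model only: all latencies and hit rates are assumptions
 * chosen to show the trend, not measurements of any vendor's core.
 */
#include <stdio.h>

int main(void)
{
    const double hit_rate     = 0.95;   /* assumed cache/SRAM hit rate        */
    const double dram_ns      = 60.0;   /* assumed off-chip DRAM latency (ns) */
    const double clocks_mhz[] = { 100.0, 200.0, 400.0 };

    for (int i = 0; i < 3; i++) {
        double cycle_ns    = 1000.0 / clocks_mhz[i];
        double miss_cycles = dram_ns / cycle_ns;   /* DRAM latency in core clocks */

        /* Average memory access time in cycles:
         * tightly coupled cache hits in 1 clock; assume a looser on-chip
         * SRAM needs 3 clocks per hit. Misses go out to DRAM either way. */
        double amat_cache = hit_rate * 1.0 + (1.0 - hit_rate) * miss_cycles;
        double amat_sram  = hit_rate * 3.0 + (1.0 - hit_rate) * miss_cycles;

        printf("%4.0f MHz: miss = %5.1f clocks, AMAT cache = %4.2f, AMAT SRAM = %4.2f\n",
               clocks_mhz[i], miss_cycles, amat_cache, amat_sram);
    }
    return 0;
}
```

Under these assumed numbers the miss penalty grows from 6 to 24 clocks as the core moves from 100 MHz to 400 MHz, which is the performance hit NEC warns about when the first-level memory is no longer guaranteed to respond in one clock.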
Memory maker Micron Technology Inc., meanwhile, will disclose details at the forum of its SC-1 processor, based on a synthesizable 32-bit MIPS 4Kc core surrounded by 8 Mbytes of embedded DRAM fabricated on Micron's 0.18-micron process technology. Rather than having the processor go out over I/O buffers and pc-board traces to fetch data from DRAM, the SC-1 brings the DRAM onto the same die to reduce read and write latencies. The chip includes a link to external flash memory and, as an option, to external DRAM, Micron said.

Reconfigurable twists

While memories will be one point of debate at the Embedded Processor Forum, the use of reconfigurable processor cores will also be a topic of discussion as more players enter this space and others extend the capabilities of their offerings.

Toshiba Corp. is set to throw its weight into the configurable-core market this week, when it describes a media processor architecture billed as configurable and extensible.

Another large chip vendor taking an interest in configurable processors is Europe's STMicroelectronics, which will describe a face-recognition processor based on the Xtensa core licensed from Tensilica Inc. ST created a customized version of Xtensa because it needed a way to process an enormous number of clock cycles in real time to match a facial image against those stored in a database, something the company said can't be done with a digital signal processor.

Both Tensilica and rival ARC Cores will also describe how they are tailoring their cores for specific applications. ARC is set to discuss telephony DSP extensions for voice-over-Internet Protocol (VoIP) and voice-over-DSL applications. Tensilica, meanwhile, will disclose details of new instruction extensions for MPEG-4 video decoding.

Though configurable processor cores may be all the rage, processor vendors continue to churn out new very long instruction word (VLIW) designs for the embedded space. The latest VLIW entrant is NEC, which will describe this week a low-power design for mobile multimedia terminals that can be tuned for video processing. Meanwhile, Netergy Microelectronics, a subsidiary of 8x8 Inc., said it will use the ST200 VLIW processor it licensed from STMicroelectronics for its VoIP processors. The ST200 will be used to process video while Netergy's Audacity-T2 core and software handle the audio and network processing.
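The instruction extensions described above for VoIP and MPEG-4 work typically target hot inner-loop operations. The sketch below illustrates the general idea with a saturating multiply-accumulate of the kind common in telephony DSP code; the mac_sat routine is a plain-C stand-in for a hypothetical single custom instruction and does not reflect Tensilica's TIE language, ARC's extension flow, or any vendor's actual syntax.

```c
/*
 * Generic illustration of a configurable-core custom instruction.
 * mac_sat() is a hypothetical stand-in: on a configurable core the whole
 * body would be one added instruction; on a stock core it is several.
 */
#include <stdint.h>
#include <stdio.h>

/* Saturating multiply-accumulate: accumulate a*b into acc, clamping
 * to 32-bit range instead of wrapping on overflow. */
static int32_t mac_sat(int32_t acc, int16_t a, int16_t b)
{
    int64_t r = (int64_t)acc + (int32_t)a * (int32_t)b;
    if (r > INT32_MAX) return INT32_MAX;
    if (r < INT32_MIN) return INT32_MIN;
    return (int32_t)r;
}

/* Inner-product kernel typical of voice filters: with the custom
 * instruction, each tap collapses to a single issue slot. */
static int32_t dot_q15(const int16_t *x, const int16_t *h, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc = mac_sat(acc, x[i], h[i]);
    return acc;
}

int main(void)
{
    int16_t x[4] = { 1000, -2000, 3000, 4000 };
    int16_t h[4] = {    4,     3,    2,    1 };
    printf("dot = %ld\n", (long)dot_q15(x, h, 4));
    return 0;
}
```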