Interfacing High Performance 32-bit Cores To MCU-based Memory Architectures

By Bob Martin, MIPS Technologies, Inc.
Embedded.com -- (04/10/08, 06:17:00 PM EDT)

As higher-performance 32-bit processor cores begin to make significant inroads into the microcontroller (MCU) space currently dominated by 8- and 16-bit devices, chip architects face system-design challenges similar to those PC designers faced about a decade ago.

While the speed and performance of the new cores have increased, some key supporting technologies have not kept up, resulting in severe performance bottlenecks.

Most microcontrollers rely entirely on two types of on-chip memory: moderate amounts of SRAM provide the required data storage, and NOR FLASH provides the instruction and constant-data space.

Embedded SRAM technology is keeping pace with the new 32-bit cores in both capacity and operating speed. Mature SRAM technology is readily available in the 10 ns (100 MHz) range and is cost-effective at this speed grade for the RAM sizes typical of microcontrollers.

Standard NOR FLASH, however, lags the clock speed of a basic 32-bit core by almost an order of magnitude. Current embedded NOR FLASH technology offers access times of around 50 ns (20 MHz). This creates a real bottleneck between the FLASH device and the core, because the core can waste several clock cycles waiting for each instruction to be fetched from FLASH.

This gap between core speed and FLASH access time is compounded by the standard microcontroller execution model, XIP (eXecute In Place), in which the core fetches instructions directly from FLASH rather than from a copy in RAM.

Application fault tolerance and the cost of large SRAM arrays are two major reasons why executing directly from FLASH is preferable. Programs stored in FLASH are far less likely to be corrupted by random system errors such as power-rail glitches. Executing directly from FLASH also removes the need to give the MCU enough SRAM to hold a copy of the application, loaded from a ROM or FLASH device, as its RAM execution space.

While improving FLASH technology so it matches the performance of 32-bit cores would be ideal, current technology prevents this. There are, however, some efficient methodologies the architect can employ to unclog the performance bottleneck.

Simple instruction pre-fetch buffers and i-cache systems built into 32-bit MCU designs can improve MCU performance dramatically. The following describes how system architects can apply these techniques when upgrading their MCU architecture from a 16-bit to a 32-bit CPU core.
