Providing memory system and compiler support for MPSoC designs: Memory Architectures (Part 1)
By Mahmut Kandemir and Nikil Dutt
Embedded.com (01/05/09, 06:50:00 PM EST)

System-on-chip (SoC) architectures are increasingly employed to solve a diverse spectrum of problems in the embedded and mobile systems domain. The resulting increase in the complexity of applications ported to SoC architectures places a tremendous burden on the computational resources required to deliver the needed functionality. An emerging architectural solution places multiple processor cores on a single chip to meet these computational requirements. Such a multiprocessor system-on-chip (MPSoC) architecture has several advantages over a conventional strategy that employs a single, more powerful (but more complex) processor on the chip.

First, the design of an on-chip multiprocessor composed of multiple simple processor cores is simpler than that of a complex single-processor system. This simplicity also helps reduce the time spent in verification and validation.

Second, an on-chip multiprocessor is expected to make better use of the silicon space. The logic that a complex single processor spends on register renaming, instruction wake-up, speculation/predication, and register bypass can instead be spent on providing higher bandwidth in an on-chip multiprocessor.

Third, an MPSoC architecture can exploit loop-level parallelism at the software level in array-intensive embedded applications. In contrast, a complex single-processor architecture must convert loop-level parallelism to instruction-level parallelism at run time (that is, dynamically) using sophisticated (and power-hungry) strategies, and some loss of parallelism is inevitable in the process.

Finally, a multiprocessor configuration provides an opportunity for energy savings through careful, selective management of individual processors. Overall, an on-chip multiprocessor is a suitable platform for executing the array-intensive computations commonly found in embedded image and video processing applications.
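The loop-level parallelism mentioned above can be sketched in a few lines of C. This is a minimal illustration, not code from the chapter: the 4-core chip, the chunk sizes, and the use of POSIX threads in place of a real MPSoC runtime are all assumptions made for the example. Each "core" is handed a contiguous slice of the iteration space of an array loop, so no dynamic extraction of instruction-level parallelism is needed.

```c
/* Sketch: software-level loop parallelism on a hypothetical 4-core MPSoC.
   pthreads stands in for the on-chip cores; CORES and N are illustrative. */
#include <pthread.h>

#define N     1024
#define CORES 4

static int a[N], b[N];

/* Each worker owns one contiguous chunk of the loop's iteration space. */
static void *scale_chunk(void *arg) {
    long core = (long)arg;
    int lo = (int)core * (N / CORES);
    int hi = lo + N / CORES;
    for (int i = lo; i < hi; i++)
        b[i] = 2 * a[i];          /* the parallelized loop body */
    return 0;
}

/* Partition the loop statically, run the chunks concurrently, join. */
int run_parallel_scale(void) {
    pthread_t t[CORES];
    for (int i = 0; i < N; i++)
        a[i] = i;
    for (long c = 0; c < CORES; c++)
        pthread_create(&t[c], 0, scale_chunk, (void *)c);
    for (int c = 0; c < CORES; c++)
        pthread_join(t[c], 0);
    return b[N - 1];
}
```

Because the chunks touch disjoint regions of `b`, no synchronization is needed inside the loop, which is exactly the property that makes array-intensive loops attractive for MPSoC-style parallelization.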
One of the most critical components determining the success of an MPSoC-based architecture is its memory system, because many applications spend a significant portion of their cycles in the memory hierarchy. One can expect this to become even more pronounced in the future, given ever-increasing dataset sizes coupled with the widening processor-memory gap. From an energy consumption angle, the memory system can contribute up to 90% of the overall system power, and a significant portion of the transistors in an MPSoC-based architecture can be expected to be devoted to the memory hierarchy.

There are at least two major (and complementary) ways of optimizing the memory performance of an MPSoC-based system: (1) constructing a suitable memory organization/hierarchy and (2) optimizing the software (application) for it. This chapter focuses on these two issues and discusses different potential solutions for them. On the architecture side, one can employ a traditional cache-based hierarchy or build a customized memory hierarchy, which can consist of caches, scratch-pad memories, stream buffers, LIFOs, or a combination of these. It is also possible to make some architectural features reconfigurable and tune their parameters at run time according to the needs of the application being executed.

On the software side, traditional compilation techniques for multiprocessor architectures focus only on performance (execution cycles). In an MPSoC-based environment, however, one may want to include other metrics of interest as well, such as energy/power consumption and memory space usage. The compiler's job is therefore much more difficult in this context than for traditional high-end multiprocessors.
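The difference between a cache-based hierarchy and a scratch-pad memory (SPM) can be made concrete with a small sketch. In a cache, data movement is implicit; with an SPM, the compiler (or programmer) stages data explicitly. The code below is a hedged illustration of that staging pattern, not the chapter's technique: the tile size, the `scratchpad` array standing in for on-chip SPM, and the use of `memcpy` in place of a DMA engine are all assumptions made for the example.

```c
/* Sketch of compiler-managed scratch-pad use: tiles are copied into a
   small fast buffer, processed there, and copied back. The scratchpad[]
   array and TILE size are illustrative, not a real MPSoC memory map. */
#include <string.h>

#define TILE 64

static int scratchpad[TILE];   /* stands in for on-chip SPM */

void spm_increment(int *data, int len) {
    for (int base = 0; base < len; base += TILE) {
        int n = (len - base < TILE) ? len - base : TILE;

        /* Copy-in: a DMA transfer in a real SoC. */
        memcpy(scratchpad, data + base, (size_t)n * sizeof(int));

        /* Compute entirely out of the fast on-chip buffer. */
        for (int i = 0; i < n; i++)
            scratchpad[i] += 1;

        /* Copy-out: write the finished tile back to main memory. */
        memcpy(data + base, scratchpad, (size_t)n * sizeof(int));
    }
}
```

The appeal of this organization is predictability: unlike a cache, the SPM has no misses or evictions, so both latency and energy per access are known at compile time, which is what makes the software-optimization half of the problem tractable.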