Providing memory system and compiler support for MPSoC designs: Compiler Support (Part 3)
Embedded.com (01/07/09, 01:41:00 AM EST)
An optimizing compiler that targets MPSoC environments should tackle a number of critical issues. Building on what was learned in Part 1 and Part 2, we first explain these issues and then study potential solutions. From the performance viewpoint, perhaps the two most important memory-related tasks to be performed in an MPSoC environment are optimizing parallelism and locality. Other important issues relate to power/energy consumption and memory space.
The problem with parallelism
Optimizing parallelism is obviously important, since parallelism is the main reason to employ multiple processors in a single unit. In fact, the parallelization strategy determines how memory is utilized by multiple on-chip processors and can be an important factor in achieving acceptable performance. However, maximum parallelism may not always be easy to achieve, for several reasons. For example, intrinsic data dependences in the code may not allow full utilization of all on-chip processors. Similarly, in some cases, interprocessor communication costs can become overwhelming as the number of processors increases.
Finally, the performance benefit of increased interprocessor parallelism may not justify the accompanying increase in power consumption. For these reasons, it may be preferable not to increase the number of processors arbitrarily. In addition, the possibility that different parts of the same application demand different numbers of processors can make the problem much harder.
Instruction and Data Locality
An equally important problem is ensuring locality of data and instruction accesses. Although achieving acceptable instruction cache performance is not very difficult (since instructions are read-only and exhibit strong spatial locality), the same cannot be said for data locality.
This is because straightforward coding of many applications can lead to poor data cache utilization. In addition, in an MPSoC environment, interprocessor communication can lead to frequent cache line invalidations/updates (due to interprocessor data sharing), which in turn increases overall latency.
This last issue becomes particularly problematic when false sharing occurs (i.e., multiple processors share a cache line but not the same data within it). Therefore, an important task for the compiler is to minimize false sharing as much as possible.