Providing memory system and compiler support for MPSoc designs: Compiler Support (Part 3)
Embedded.com (01/07/09, 01:41:00 AM EST)
An optimizing compiler that targets MPSoC environments should tackle a number of critical issues. Building on what was learned in Part 1 and Part 2, we first explain these issues and then study potential solutions. From the performance viewpoint, perhaps the two most important memory-related tasks to be performed in an MPSoC environment are optimizing parallelism and locality. Other important issues relate to power/energy consumption and memory space.
The problem with parallelism
Optimizing parallelism is obviously important, since parallelism is the main reason to employ multiple processors in a single unit. In fact, the parallelization strategy determines how memory is utilized by the on-chip processors and can be a decisive factor in achieving acceptable performance. However, maximum parallelism is not always easy to achieve, for several reasons. For example, intrinsic data dependences in the code may not allow full utilization of all on-chip processors. Similarly, in some cases, interprocessor communication costs can become overwhelming as one increases the number of processors used.
Finally, the performance benefit of increased interprocessor parallelism may not justify the accompanying increase in power consumption. For all these reasons, it may be preferable not to increase the number of processors arbitrarily. In addition, the possibility that different parts of the same application demand different numbers of processors can make the problem much harder.
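The role of data dependences can be illustrated with a minimal C sketch (the function names and sizes here are illustrative, not from the article): the first loop has independent iterations that a parallelizing compiler could distribute across on-chip processors, while the second carries a dependence from one iteration to the next that blocks straightforward parallelization.

```c
#include <assert.h>

/* Independent iterations: each a[i] depends only on b[i], so the
   iterations can be split across processors without communication. */
static void scale(int *a, const int *b, int n) {
    for (int i = 0; i < n; i++)
        a[i] = 2 * b[i];
}

/* Loop-carried dependence: a[i] needs the a[i-1] produced by the
   previous iteration, so the iterations cannot simply run in
   parallel; the compiler must either serialize or restructure. */
static void prefix_sum(int *a, int n) {
    for (int i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```

A dependence analyzer in the compiler makes exactly this distinction before deciding how many processors a loop can profitably use.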
Instruction and Data Locality
An equally important problem is ensuring locality of data/instruction accesses. Although achieving acceptable instruction cache performance is not very difficult (since instructions are read-only and exhibit perfect spatial locality), the same cannot be said for data locality.
This is because straightforward coding of many applications can lead to poor data cache utilization. In addition, in an MPSoC environment, interprocessor communication can lead to frequent cache line invalidations/updates (due to interprocessor data sharing), which in turn increases overall latency.
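A classic example of "straightforward coding" hurting data cache utilization is traversing a two-dimensional C array in column order, which strides across memory and touches a new cache line on nearly every access. The sketch below (array sizes and function names are illustrative assumptions) contrasts this with row-order traversal, the transformation a locality-optimizing compiler applies via loop interchange; both functions compute the same sum.

```c
#include <assert.h>

#define ROWS 4
#define COLS 4

/* Column-order traversal: consecutive accesses are COLS ints apart
   in memory, so spatial locality in the data cache is poor. */
static long sum_column_order(int m[ROWS][COLS]) {
    long s = 0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            s += m[i][j];
    return s;
}

/* Row-order traversal: matches the row-major layout C uses, so
   consecutive accesses fall in the same cache line. */
static long sum_row_order(int m[ROWS][COLS]) {
    long s = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            s += m[i][j];
    return s;
}
```

Because the two loop nests are semantically equivalent, the compiler is free to pick the interchange that matches the memory layout.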
This last issue becomes particularly problematic when false sharing occurs (i.e., multiple processors share a cache line but not the same data within it). Therefore, an important task for the compiler is to minimize false sharing as much as possible.
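One way a compiler (or programmer) eliminates false sharing is by padding per-processor data so that each item occupies its own cache line. A minimal sketch, assuming a 64-byte cache line (real MPSoC targets vary, commonly 32 to 128 bytes):

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache-line size for illustration. */
#define CACHE_LINE 64

/* Unpadded: the two per-processor counters sit in the same cache
   line, so writes by different processors invalidate each other's
   copies even though they never touch the same data (false sharing). */
struct counters_shared {
    long p0;
    long p1;
};

/* Padded: each counter is pushed onto its own cache line, so the
   processors update disjoint lines and no invalidation traffic
   arises between them. */
struct counters_padded {
    long p0;
    char pad0[CACHE_LINE - sizeof(long)];
    long p1;
    char pad1[CACHE_LINE - sizeof(long)];
};
```

The padding trades a small amount of memory for the elimination of coherence traffic, a trade-off the compiler can weigh when laying out shared data structures.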
Related Articles
- Providing memory system and compiler support for MPSoc designs: Memory Architectures (Part 1)
- Providing memory system and compiler support for MPSoc designs: Customization of memory architectures (Part 2)
- Optimizing embedded software for power efficiency: Part 3 - Optimizing data flow and memory
- Using non-volatile memory IP in system on chip designs
- Dealing with clock jitter in embedded DDR2/DDR3 DRAM designs: Part 3