Multiprocessor design for SoCs
EE Times: Multiprocessor design for SoCs
Ashish Dixit (09/19/2005 10:00 AM EDT) URL: http://www.eetimes.com/showArticle.jhtml?articleID=170703815
For many applications, allocating performance among all of the tasks in a system-on-chip (SoC) design is much easier, and provides greater design flexibility, with multiple CPUs than with just one control processor and multiple blocks of logic. Multiple-processor design changes the role of processors, making it possible to design programmability into many functions while keeping power budgets under control.
The biggest advantage of using multiple processors as SoC task blocks is that they're programmable, so changes can be made in software after the chip design is finished. This means that complex state machines can be implemented in firmware running on the processor, significantly reducing verification time. And one SoC can often be used for multiple products, turning features on and off as necessary.
Multiple-processor design promotes much more efficient use of memory blocks. A multiple-processor-based approach makes most of the memories processor-visible, processor-controlled, processor-managed, processor-tested and processor-initialized. Additionally, this reduces overall memory requirements while promoting the flexible sharing and reuse of on-chip memories.
But how do you pick the right embedded processors for multiple-CPU designs? How do you partition your design to take maximum advantage of multiple processors? How do you manage the software among all the processors? How do you connect them and manage communications in the hardware?
Four techniques
At the conceptual level, the entire system can be treated as a constellation of concurrent, interacting subsystems or tasks. Each task communicates with other subsystems and shares common resources (memory, data structures, network points). Developers start from a set of tasks for the system and exploit the parallelism by applying a spectrum of techniques, including four basic actions:
These methods interact with one another, so iterative refinement is often essential, particularly as the design evolves. When a system's functions are partitioned into multiple interacting function blocks, there are several possible organizational forms or structures, including:
Assigning tasks to processors
The process of determining the right number of processors cannot be separated from the process of determining the right processor type and configuration. Traditionally, a real-time computation task is characterized by a "MIPS requirement": how many millions of execution cycles per second are required. A control task needs substantially more cycles if it's running on a simple DSP rather than a RISC processor. A numerical task usually needs more cycles running on a RISC CPU than on a DSP. However, most designs contain no more than two types of processors, because mixing RISC processors and DSPs requires working with multiple software development tools.
Configurable processors can be modified to provide 10 to 50 times higher performance than general-purpose RISC processors. This often allows configurable processors to be used for tasks that previously were implemented in hardware using Verilog or VHDL. Staying with a single configurable processor family allows the same software development tools to be shared for all the processors.
Once the rough number and types of processors are known and tasks are tentatively assigned to the processors, basic communications structure design starts. The goal is to discover the least expensive communications structure that satisfies the bandwidth and latency requirements of the tasks.
When low cost and flexibility are most important, a shared-bus architecture, in which all resources are connected to one bus, may be most appropriate. The glaring liability of the shared bus is long and unpredictable latency, particularly when a number of bus masters contend for access to different shared resources.
A parallel communications network provides high throughput with flexibility. The most common example is a crossbar connection with a two-level hierarchy of buses. Also, direct connections can be made when the communications among the processors are well-understood and will not change.
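The search for the least expensive communications structure that still meets the tasks' bandwidth and latency requirements can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cost units, bandwidth figures, latency figures and the round-robin latency model for the shared bus are all hypothetical and do not come from the article or from any real interconnect.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Interconnect:
    name: str
    relative_cost: float       # hypothetical cost units (area, wiring, effort)
    bandwidth_mbytes_s: float  # aggregate sustainable bandwidth (assumed)
    worst_latency_cycles: int  # worst-case access latency under contention


@dataclass(frozen=True)
class Requirement:
    bandwidth_mbytes_s: float  # total traffic generated by the tasks
    max_latency_cycles: int    # tightest latency bound among the tasks


def shared_bus_worst_latency(num_masters: int, cycles_per_transfer: int) -> int:
    """Assumed round-robin model: a master may wait for one transfer from
    every other master, then performs its own transfer."""
    return num_masters * cycles_per_transfer


def cheapest_interconnect(
    candidates: Sequence[Interconnect], req: Requirement
) -> Optional[Interconnect]:
    """Return the lowest-cost candidate meeting both bounds, or None."""
    feasible = [
        c for c in candidates
        if c.bandwidth_mbytes_s >= req.bandwidth_mbytes_s
        and c.worst_latency_cycles <= req.max_latency_cycles
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c.relative_cost)


# Hypothetical candidates; every number here is illustrative only.
CANDIDATES = [
    Interconnect("shared bus", 1.0, 400.0, shared_bus_worst_latency(4, 8)),
    Interconnect("direct connections", 2.0, 800.0, 4),
    Interconnect("crossbar", 3.0, 1600.0, 8),
]

if __name__ == "__main__":
    for req in (
        Requirement(300.0, 40),   # relaxed latency bound
        Requirement(300.0, 16),   # tight latency bound
        Requirement(1200.0, 16),  # high bandwidth demand
    ):
        choice = cheapest_interconnect(CANDIDATES, req)
        print(req, "->", choice.name if choice else "no candidate fits")
```

With these made-up numbers, the shared bus is chosen only while its contention latency stays within the bound. Adding bus masters raises that latency in this model, which is the liability described above.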
Intertask communications are built on two foundations: the software communications mode and the corresponding hardware mechanism. The three basic styles of software communications among tasks are message passing, shared memory and device drivers.
Message passing makes all communications among tasks overt. All data is private to a task except when operands are sent by one task and received by another. Message passing is generally easier to code than shared memory when the tasks are largely independent but often harder to code efficiently with tightly coupled tasks.
With shared-memory communications, only one task reads from or writes to the data buffer in memory at a time, requiring explicit access synchronization. Embedded-software languages, such as C, typically include features that ease shared-memory programming.
The hardware-device-plus-software-device-driver model is most commonly used with complex I/O interfaces, such as networks or storage devices. The device driver mode combines elements of message passing and shared-memory access.
Processors must interface with memories, I/O interfaces and RTL blocks. These guidelines may help designers take better advantage of RAMs:
Watch for contention latency in memory access. Increase memory width or increase the number of memories that can be active to overcome contention bottlenecks. Pay particular attention to tasks that must move data from off-chip memory through the processor and back to off-chip memory; these tasks can quickly consume all available bandwidth.
The move toward multiple-processor SoC designs is very real. Multiple processors are used in consumer devices ranging from low-cost inkjet printers to cell phones. As designers get comfortable with a processor-based approach, processors have the potential to become the next major building block for SoC designs, and SoC designers will turn to a processor-centric design methodology that has the potential to solve the ever-increasing hardware/software integration dilemma.
Ashish Dixit (adixit@tensilica.com) is vice president of hardware engineering at Tensilica Inc. (Santa Clara, Calif.).
All material on this site Copyright © 2005 CMP Media LLC. All rights reserved.