Simultaneous Exploration of Power, Physical Design and Architectural Performance Dimensions of the SoC Design Space using SEAS
1 IBM Electronic Design Automation, 2070 Route 52, MS 2A1, Hopewell Jct, NY 12533, USA
2 IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
3 Korea Advanced Institute of Science and Technology, Daejon, Korea
Abstract :
SEAS (SoC Early Analysis and Design System) was introduced in [1]. Its main goal is to give the system architect early design feedback on aspects such as power, architectural performance, and floorplan/die size of core-based SoCs, while maintaining the links to implementation. In this paper we discuss extensions to SEAS in three areas: physically aware power optimization through voltage island physical planning, a transaction level functional simulation platform for embedded software development, and a transaction level power analysis methodology for early power estimation. We also present how simultaneous exploration of the power, physical design and performance aspects of an SoC can be performed within SEAS.
SEAS Overview
SEAS allows users to easily specify a design in a block-diagram-like description and run types of analyses which would normally be impossible to do early in the design process with acceptable accuracy. These analyses include performance, floorplanning, timing and power. SEAS can handle core-based SoC designs, where the cores are available in a library, together with characterization data and models (e.g., for performance analysis simulation). The types of models needed will be described in the following sections.
The main advantage of SEAS from a user's point of view is the ability to describe, measure and change the specification at a very high level of abstraction and quickly evaluate the effects on performance, area, timing and power. If the results are not satisfactory, the designer can quickly change the architecture, the floorplan, or the cores being used and run the analyses again. Figure 1 illustrates the overall organization of SEAS. The individual analysis algorithms are not necessarily novel; however, they had to be adapted and tuned to the design representation being used (block diagram). This tuning is critical to the accuracy of the results.
As illustrated in Figure 1, SEAS comprises an input description similar to a block diagram and multiple analysis engines. Each engine has its own set of algorithms and an internal model derived from the initial block diagram, and uses characterization data and models for the cores available from a core library. In addition to the analysis engines, the netlist generation portion of SEAS translates the block diagram description and the system configuration information into an RTL description consisting of the set of cores and the necessary glue logic implementing the SoC. This RTL description, along with the set of constraints from the analysis engines, can be taken through an RTL-GDSII flow to complete the hardware implementation of the SoC.
Figure 1 : SoC Early Analysis System
Use scenario for power-performance-physical design tradeoff analysis in SEAS
In this section we present a use scenario showing how power-related architectural optimizations can be performed within SEAS. The rest of the paper discusses the individual components of SEAS that enable these optimizations. An SoC designer using SEAS could perform the following steps:
- Build block diagram (virtual design) for the basic SoC.
- Map to a functional Transaction Level Model (TLM) / performance-based TLM view to generate a SystemC description of the SoC.
- Define a workload to model the software application, or use the actual software application, then execute the SystemC simulation and gather architecture performance information (latency, throughput, resource utilization, etc.) for the SoC.
- Map to a power view of the SoC and gather power analysis results. This step is done by enabling the power analysis mode when executing the SystemC performance/functional TLM simulation.
- Use the results from power and architectural performance analysis to explore different power-related optimizations in the SEAS environment. These can be application optimizations as well as modifications to the chip physical architecture that execute different portions of the SoC at different voltages, thus exploring power-performance tradeoffs for the SoC being designed.
- Explore and validate architecture power optimizations in a physical design context with the voltage island physical planning portion of SEAS, using the set of possible compatibility relationships determined in the architectural performance and power simulation steps.
- Feed latency information from the voltage island partitioning solutions back to the architectural performance models, so that the designer can see the performance tradeoff being made to achieve a particular low-power design solution for the SoC.
- If performance is satisfactory, use the netlist generation portion of SEAS to create the top-level detailed RTL for the SoC, and use the floorplan and voltage island solution generated in the voltage island physical planning step as initial constraints going into an RTL-GDSII design flow.
Transaction level simulation platform for PPC/CoreConnect architecture analysis
Transaction level models of IP cores, and simulation platforms composed of such models, are increasingly being used for SoC architecture analysis and early embedded software development. They are gaining relevance with emerging standard architecture modeling languages such as SystemC. Using the IBM CoreConnect SystemC Modeling Environment that forms a part of SEAS, designers can put together SystemC models for complete systems including PowerPC processors, CoreConnect bus structures, and peripherals. These models may be simulated using the standard OSCI (Open SystemC Initiative) SystemC runtime libraries [4]. Our models and environment provide designers with a system simulation/verification capability with the following characteristics:
- Simulate real application software interacting with models for IP cores and the environment for full system functional and timing verification, possibly under real-time constraints
- Verify core interconnections and communications through buses and other channels
- Inter-core communication must be cycle-approximate, which implies cycle-approximate protocol modeling
- Verify that system supports enough bandwidth and concurrency for target applications
- Simulation performance is sufficient to run a significant software application with an operating system booted on the system
- Transactions are modeled as occurring over communication channels
- Computation (inside a core) may not be modeled on a cycle-by-cycle basis, as long as the input-output delays are cycle-approximate
- The processor model does not have to be a true architectural model; a software-based Instruction Set Simulator (ISS) with adequate performance and timing accuracy is used
- For the scenario where application software does not exist, the environment provides facilities to model the application behavior on a processor through a generic processor model within which application behavior can be specified through a scripting interface
- In order to simulate real software, including the initialization and internal register programming, the models must be "bit-true" and register accurate from an API point of view. The models must provide APIs to allow programming of registers as if the user were programming the real hardware device
- Models need not be a precise architectural representation of the hardware. They may be behavioral models as long as they are cycle-approximate representations of the hardware for the transactions of interest (i.e., the actual transactions being modeled).
- All models must be "macro-synchronized" with one or more clocks. This means that for the atomic transactions being modeled, the transaction boundaries (begin and end) are synchronized with the appropriate clock.
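The macro-synchronization and cycle-approximate timing requirements can be illustrated with a minimal sketch. It is written in Python for brevity rather than SystemC, and the `Clock` and `BusChannel` classes, their parameters, and the cycle counts are hypothetical names and values chosen for illustration; they are not part of the CoreConnect modeling environment. The point shown is only that each atomic transaction begins and ends on a clock edge, while the beats inside it are not simulated cycle by cycle.

```python
class Clock:
    """Clock with an integer period in nanoseconds (hypothetical helper)."""

    def __init__(self, period_ns):
        self.period_ns = period_ns

    def next_edge(self, t_ns):
        # First clock edge at or after t_ns, using integer ceiling division.
        return -(-t_ns // self.period_ns) * self.period_ns


class BusChannel:
    """Cycle-approximate channel that serves one transaction at a time."""

    def __init__(self, clock, setup_cycles, cycles_per_beat):
        self.clock = clock
        self.setup_cycles = setup_cycles        # assumed fixed protocol overhead
        self.cycles_per_beat = cycles_per_beat  # assumed cost of one data beat
        self.busy_until_ns = 0

    def transfer(self, request_ns, beats):
        # The transaction begins on a clock edge, once the channel is free.
        start = self.clock.next_edge(max(request_ns, self.busy_until_ns))
        # Its duration is a whole number of cycles, so it also ends on an edge.
        cycles = self.setup_cycles + beats * self.cycles_per_beat
        end = start + cycles * self.clock.period_ns
        self.busy_until_ns = end
        return start, end


bus = BusChannel(Clock(period_ns=10), setup_cycles=2, cycles_per_beat=1)
first = bus.transfer(request_ns=13, beats=4)   # request arrives between edges
second = bus.transfer(request_ns=25, beats=2)  # must wait for the first one
print(first, second)
```

The second request is issued while the channel is still busy, so its start is pushed to the edge on which the first transaction completes. This is the kind of contention effect that a bandwidth and concurrency check relies on.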
Transaction level power modeling
Power is becoming a major issue in SoC design, and it is imperative to tackle it early in the design cycle. Central to transaction level power analysis is a power modeling methodology for the IP cores constituting the system. To go along with the Transaction Level Models for the IP cores, we are developing a transaction level power analysis methodology in SEAS to enable early power estimation, which is briefly described in this section. The overall methodology is as follows:
- Identify tasks or instructions from the core description.
- Characterize the power consumption of each task or instruction from the low-level implementation:
  - Generate vectors corresponding to these instructions or tasks executing on the particular IP core.
  - Place, route, and extract parasitics for the cores.
  - Use power simulation tools with the parasitics to generate power characterization information for these instructions or tasks.
- Create macromodels based on various IP core parameters:
  - Parameters can be bit-width, switching activity of data, or buffer size.
- Augment the TLM to extract the parameters for the macromodel:
  - This can be done dynamically at run time to derive information during simulation (the tradeoff between simulation accuracy and speed should be taken into account).
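The last two steps of the methodology can be sketched as follows. This is a minimal illustration only: the linear form of the macromodel, the coefficient values, and the names `Macromodel`, `PowerMonitor` and `switching_activity` are assumptions made for the example and do not describe the macromodels actually used in SEAS. Switching activity is extracted at run time as the fraction of data bits that toggle between consecutive words.

```python
from dataclasses import dataclass


def switching_activity(prev_word, word, bit_width):
    """Fraction of bits that toggle between two consecutive data words."""
    mask = (1 << bit_width) - 1
    return bin((prev_word ^ word) & mask).count("1") / bit_width


@dataclass
class Macromodel:
    """Assumed linear energy model for one task; coefficients are placeholders."""
    base_nj: float      # fixed energy per task invocation
    per_bit_nj: float   # energy per toggled data bit

    def energy_nj(self, bit_width, activity):
        return self.base_nj + self.per_bit_nj * bit_width * activity


class PowerMonitor:
    """Accumulates energy as the TLM reports each executed task."""

    def __init__(self, models):
        self.models = models
        self.total_nj = 0.0

    def record(self, task, bit_width, activity):
        energy = self.models[task].energy_nj(bit_width, activity)
        self.total_nj += energy
        return energy

    def average_power_mw(self, elapsed_ns):
        # nJ per ns is W, so multiply by 1000 to report mW.
        return self.total_nj / elapsed_ns * 1000.0


monitor = PowerMonitor({"read": Macromodel(base_nj=2.0, per_bit_nj=0.5)})
activity = switching_activity(0xFF, 0x0F, bit_width=8)
monitor.record("read", bit_width=8, activity=activity)
print(activity, monitor.total_nj, monitor.average_power_mw(elapsed_ns=100))
```

Computing the activity on every transaction, as done here, is the accurate but slower end of the accuracy/speed tradeoff; a faster variant could sample it or use a fixed average per task.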
Typical TLMs capture the functional tasks associated with the behavior of an IP core, but do not necessarily contain many of the non-functional tasks related to the core. These non-functional tasks can be quite important from a power consumption point of view. Since TLMs are not typically developed with the aim of capturing all the power-related tasks, a mapping mechanism is needed from the set of tasks (functional and non-functional) identified during transaction level power characterization to the set of tasks present in the current transaction level model. This mapping is one of the unique features of the transaction level power modeling employed in SEAS. An example is shown here for a memory controller core:
- functional tasks – read, write, initialize
- non-functional tasks – single bank refresh, multi-bank refresh, power control
- parameters for functional tasks: address sequence, data switching activity, burst-related parameters
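One way to picture the mapping mechanism for this memory controller example is sketched below. The task names on the TLM side, the energy values, and the idea of approximating an unmodeled refresh as a periodic event are all assumptions made for illustration; the source does not specify how SEAS implements the mapping.

```python
# Characterized energy per task in nJ. Values are made-up placeholders.
CHARACTERIZED_NJ = {
    "read": 3.0,
    "write": 3.5,
    "initialize": 10.0,
    "single_bank_refresh": 1.0,
    "multi_bank_refresh": 4.0,
    "power_control": 0.5,
}

# Tasks the TLM actually models (hypothetical names) mapped to characterized tasks.
TLM_TO_CHARACTERIZED = {
    "mem_read": "read",
    "mem_write": "write",
    "mem_init": "initialize",
}

# Non-functional tasks absent from the TLM, approximated here as periodic events.
PERIODIC_NS = {"single_bank_refresh": 1000}


def estimate_energy_nj(tlm_trace, elapsed_ns):
    """Energy of the traced TLM tasks plus the assumed periodic background tasks."""
    energy = sum(CHARACTERIZED_NJ[TLM_TO_CHARACTERIZED[t]] for t in tlm_trace)
    for task, period_ns in PERIODIC_NS.items():
        energy += (elapsed_ns // period_ns) * CHARACTERIZED_NJ[task]
    return energy


def unmapped_tasks():
    """Characterized tasks that no TLM task or periodic rule accounts for."""
    covered = set(TLM_TO_CHARACTERIZED.values()) | set(PERIODIC_NS)
    return sorted(set(CHARACTERIZED_NJ) - covered)


print(estimate_energy_nj(["mem_read", "mem_write", "mem_read"], elapsed_ns=2500))
print(unmapped_tasks())
```

Listing the unmapped tasks makes the gap between the characterization and the TLM explicit, so the modeler can decide whether to extend the TLM or accept the resulting underestimate.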
Linking Power and Physical design: Physical Planning of Voltage Islands for power optimization
Voltage islands [3] are an effective technique for reducing both the switching and standby components of power consumption in a design. A voltage island is a group of on-chip circuit elements powered by the same voltage source, independent of the chip-level voltage, which permits different portions of the design to execute at different voltages to optimize power. In an SoC context, this enables core-level power optimization by using a power supply that is distinct from the rest of the design. This is an additional dimension that can be explored early in the SoC design process. When contemplating architectural power optimizations in the SEAS environment, the designer can evaluate the physical realization of such power-related decisions by using the voltage island planning portion of SEAS.
An SoC designer trying to build a low-power SoC using voltage islands faces decisions such as:
- What is a good partition of the design into multiple regions, and which voltages should be assigned to these regions?
- How can an early physical design implementation/floorplan be generated for such a voltage island solution, in order to estimate the effort involved in realizing its physical design?
Creating voltage islands in a chip design in order to optimize the overall power consumption involves voltage island partition generation, voltage level assignment and floorplanning. The main technique in SEAS for voltage island planning consists of physically aware voltage island partitioning and a method for performing voltage island partitioning and level assignment simultaneously. The technique groups different cores into voltage island partitions while determining a floorplan for the chip and the individual islands. The overall approach for physical planning present in SEAS consists of: a) characterizing cores in terms of voltages and power consumption values; b) providing a set of IP cores that belong to a single voltage island RLM (Random Logic Macro); and c) assigning voltages for the voltage island RLMs, all within the context of generating a physically realizable floorplan for the design. This algorithm [2] is based on a sequence-pair-based simulated annealing technique that employs a compatibility graph structure for maintaining the voltage and physical design compatibility relationships between the cores of the SoC. The resulting voltage island partitioning and floorplan solution can be used to feed latency information back into the architectural TLMs, and can also be used as an initial solution for the chip implementation process.
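For readers unfamiliar with the sequence-pair representation, the sketch below shows the textbook way a sequence pair is decoded into block coordinates. It is not the algorithm of [2]: the compatibility graph, the voltage assignment and the annealing loop are omitted, and the block names and sizes are placeholders. In a sequence pair, block a lies to the left of block b if a precedes b in both sequences, and a lies below b if a follows b in the first sequence but precedes it in the second.

```python
def pack(seq_pos, seq_neg, sizes):
    """Decode a sequence pair into lower-left coordinates and a bounding box.

    sizes maps each block name to a (width, height) tuple.
    """
    pos = {b: i for i, b in enumerate(seq_pos)}
    neg = {b: i for i, b in enumerate(seq_neg)}
    x, y = {}, {}
    # Visiting blocks in seq_pos order, every placed block that also precedes b
    # in seq_neg is to the left of b.
    for b in seq_pos:
        x[b] = max((x[a] + sizes[a][0] for a in x if neg[a] < neg[b]), default=0)
    # Visiting blocks in seq_neg order, every placed block that follows b
    # in seq_pos is below b.
    for b in seq_neg:
        y[b] = max((y[a] + sizes[a][1] for a in y if pos[a] > pos[b]), default=0)
    width = max(x[b] + sizes[b][0] for b in sizes)
    height = max(y[b] + sizes[b][1] for b in sizes)
    return x, y, width, height


# Placeholder blocks; the sizes are illustrative, not taken from the design example.
sizes = {"CPU": (4, 4), "EMAC": (2, 2), "MAL": (2, 2)}
x, y, width, height = pack(["CPU", "MAL", "EMAC"], ["CPU", "EMAC", "MAL"], sizes)
dead_space = width * height - sum(w * h for w, h in sizes.values())
print(x, y, width, height, dead_space)
```

A simulated annealing planner would repeatedly perturb the two sequences (for example by swapping two blocks), decode the result as above, and accept or reject the move based on a cost that could combine area, wirelength and, for voltage islands, power.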
Figure 2: Performance and Power analysis results for 1-EMAC and 2-EMAC designs (Figure originally published by the authors in [1], reused with permission from © ACM)
Design Example
In this section we discuss a PowerPC 405/CoreConnect based packet processor design to illustrate some of the components of SEAS. The design contains an Ethernet subsystem represented by the Ethernet controller (EMAC), a Memory Access Layer (MAL) core, and receive and transmit FIFOs. It also contains a high-speed memory controller (HSMC), an external bus controller (EBC), a DMA controller, an interrupt controller and various peripherals including 2 UARTs, 1 IIC and 1 timer. The cores are all connected to either the high-speed Processor Local Bus (PLB) or the On-Chip Peripheral Bus (OPB).
The design was created at the virtual design level of abstraction, and the experiment consists of evaluating this design for Ethernet packet processing purposes. Performance analysis is used to measure the system throughput and CPU utilization, after which the architecture is changed by adding a second Ethernet controller and the performance analysis is repeated. The floorplan for both designs is generated and die sizes are estimated, along with wire length and power information.
The cores involved in packet processing are the EMAC, MAL, PLB Arbiter, CPU, and HSMC. The packets arrive from the network at the EMAC input and are received into its receive buffer. The MAL works as a dedicated DMA and transfers the packet through the PLB bus to the memory controller and finally into an external memory. The time it takes to receive a packet into memory depends on the data rate, the size of the packet, the capacity of the MAL (size of burst transfer, number of bursts needed per packet) and some constant delays associated with the EMAC and HSMC. After the packet is received in memory, the CPU processes it by reading the header, computing a new address and writing back a new header. In this example, it is assumed that this CPU header processing time is constant and does not depend on the size of the packet. This CPU time is measured off-line by profiling techniques. The packet is then read by the MAL and transmitted, through the EMAC, back to the network.
System throughput and CPU utilization are shown in Figure 2 for different packet sizes. Throughput depends on the number of packets, their size, and the processing capacity, and is limited by the maximum channel capacity (the maximum number of bits that can be transmitted by the EMAC in a second). The fraction of time the CPU is busy is referred to as the CPU utilization. With small packets and 1 EMAC, the CPU is 100% busy and throughput increases with packet size, up to the maximum allowed by the channel. In this example the EMAC is limited to 100 Mbits/sec. Above a certain packet size, throughput is limited to 100 Mbits/sec, which causes the CPU to become idle as packet receive times become greater than the CPU processing time. To increase throughput beyond 100 Mbits/sec, the main option is to add extra EMACs to the design. Adding one extra EMAC doubles the maximum throughput to 200 Mbits/sec, permitting higher rates and larger packets.
To account for other potential tasks that could be performed by the system, we could target a utilization of around 80% instead of 100%. This would give the CPU some headroom to respond to other system requests. This example demonstrates the ability in SEAS to change the architecture to meet the requirements and quickly validate architecture performance using performance analysis.
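The qualitative behaviour described above can be reproduced with a simple steady-state model. This is a back-of-the-envelope sketch, not the SEAS performance simulation: it assumes that packet reception and CPU processing overlap, ignores the constant EMAC, MAL and HSMC delays, and uses a made-up CPU time of 40 microseconds per packet. The function name and parameters are hypothetical.

```python
def steady_state(packet_bytes, cpu_us_per_packet, n_emacs=1, emac_mbps=100.0):
    """Return (throughput in Mbits/sec, CPU utilization as a fraction)."""
    bits = packet_bytes * 8
    # 1 Mbit/sec equals 1 bit per microsecond, so this is in microseconds.
    arrival_us = bits / (n_emacs * emac_mbps)
    # The slower of the two stages sets the time between completed packets.
    period_us = max(arrival_us, cpu_us_per_packet)
    throughput_mbps = bits / period_us
    cpu_utilization = cpu_us_per_packet / period_us
    return throughput_mbps, cpu_utilization


CPU_US = 40.0  # assumed constant header-processing time, for illustration only
small_1 = steady_state(250, CPU_US, n_emacs=1)    # CPU-bound
large_1 = steady_state(1000, CPU_US, n_emacs=1)   # channel-bound
large_2 = steady_state(1000, CPU_US, n_emacs=2)   # second EMAC lifts the cap
print(small_1, large_1, large_2)
```

Under these assumptions, small packets keep the CPU fully busy while throughput stays below the channel limit, large packets saturate a single EMAC and leave the CPU partly idle, and adding a second EMAC raises the ceiling. A target such as 80% utilization can be explored by sweeping `packet_bytes` and `n_emacs`.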
Power analysis is run with the performance simulation. Figure 2 shows the power consumed by the system during packet processing for the two virtual designs. It can be seen that power does not increase significantly in the 2-EMAC case, which is expected since most of the power is dissipated by the CPU when active. It also shows that when the CPU becomes partly idle the power decreases accordingly. This simulation assumes that the EMAC, MAL and CPU are active when in use and idle otherwise, and that all other cores are in sleep mode.
These two architectural design points now need to be evaluated for size and timing. We generated the floorplan for both virtual designs and estimated their required die sizes. Based on floorplan area alone, the 1-EMAC version fit into a 5.57x5.57mm image, and the 2-EMAC version needed a 6.05x6.05mm image. Because of pin limitations on the 5.57mm image, the 6.05mm image was used for both the 1-EMAC and 2-EMAC versions. Both of these floorplans provide a starting point for power optimization using the voltage island physical planner. In this case, the benefit of performing floorplanning and physical design analysis in SEAS was to show that the higher performance design (2 EMACs) could fit in the same die size, with the same silicon cost.
If aggressive power management is needed for the SoC, then portions of the design can be executed at different voltages. An early view of the impact and power savings attainable with voltage islands can be obtained using the voltage island planning engine of SEAS. In this experiment we used the 1-EMAC and 2-EMAC versions of the SoC design example and the initial floorplan for the virtual design as the starting point for the voltage island physical planner. The aim here is to get an idea of the overhead incurred (area, performance) and the power savings achievable with voltage island based power optimization strategies.
An initial floorplan for the 2-EMAC version is shown in Figure 3. This was generated considering pre-placement and chip IO constraints, with wirelength and overlap as the primary objectives for the purposes of die-size estimation. The boxes where the cell names are given (CPU, EMC1, etc.) indicate pre-placed cores which are not to be moved during planning; in this experiment, all cores are assumed to operate at a single 1.3V supply in the initial design. For voltage island planning, we assign legal voltages within a range of 1.0V to 1.3V for each core.
In Figure 4(a) of the example, a solution was generated with the constraint on the total number of voltage islands set to 3. Both EMC1 and HSMC can be between [1.1-1.3]V, and the CPU is at 1.3V.
Figure 3 : Initial floorplan for 2-EMAC Design with single Vdd=1.3V
For this case, three voltage islands are created by the planner: two are shown with enclosing rectangles, both with a 1.1V supply, and the third one, consisting of a single core (EMC1), is powered by 1.0V. Note that HSMC is still at 1.3V although its minimum legal voltage is 1.0V. It could be operated at 1.1V instead of 1.3V if it were included in the voltage island on the left-hand side of the image, but that would lead to significant dead space in the voltage island since EMC0 has a fixed location. EMC0 is powered with 1.1V, which is the supply of the enclosing voltage island, although its minimum legal voltage is 1.0V, while RX0 is at its minimum supply. For this solution instance, the power savings achieved by voltage island planning is 16.9% while the area overhead is only 8.3%.
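The observation that a core may run above its minimum legal voltage because of the island that encloses it can be made concrete with a small sketch. The core names, minimum voltages and power numbers below are placeholders, not data from this experiment, and the quadratic scaling of power with supply voltage is a common first-order approximation for dynamic power that is assumed here; SEAS itself uses characterized power values per voltage.

```python
def island_supply(cores, min_voltage):
    """An island must run at the highest minimum legal voltage of its members."""
    return max(min_voltage[c] for c in cores)


def power_saving(islands, min_voltage, nominal_power_mw, nominal_v=1.3):
    """Fractional power saving versus running every core at nominal_v.

    Assumes power scales with the square of the supply voltage.
    """
    before = sum(nominal_power_mw.values())
    after = 0.0
    for cores in islands:
        scale = (island_supply(cores, min_voltage) / nominal_v) ** 2
        after += sum(nominal_power_mw[c] * scale for c in cores)
    return 1.0 - after / before


# Placeholder data for three hypothetical cores.
min_voltage = {"core_a": 1.0, "core_b": 1.1, "core_c": 1.3}
nominal_power_mw = {"core_a": 100.0, "core_b": 100.0, "core_c": 100.0}

shared = island_supply(["core_a", "core_b"], min_voltage)
saving = power_saving([["core_a", "core_b"], ["core_c"]], min_voltage, nominal_power_mw)
print(shared, round(saving, 3))
```

Here `core_a` could legally run at 1.0V but is held at the 1.1V required by `core_b`, which mirrors the situation of EMC0 in the solution described above. Placing it in its own island would save more power at the cost of an extra island and possible dead space.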
Figure 4 : Voltage Island Planning for 2-EMAC
Figure 4(b) shows the result of voltage island planning for the same design with a constraint of 4 voltage islands. The generated islands are shown as shaded regions with the corresponding voltage levels. This solution has an area overhead of 7.7% and a power savings of 17.4% when compared to the initial solution. The latency increase due to the islands can be factored back into the architecture performance analysis step in order to get feedback on the performance impact as well. Using these kinds of analysis (performance, power and area) and the exploration engines in SEAS, an SoC architect can tune the system architecture. The results of the analyses can be carried forward into the rest of the design process by using the netlist generation component of SEAS, which generates a top-level netlist from the virtual design that can be taken through RTL-GDSII design flows.
Conclusion
This paper presented use scenarios of power-performance-physical design tradeoff analysis within an SoC early analysis system, SEAS, and discussed the components that enable such scenarios. The presence of different analysis capabilities within an integrated environment helps designers make these early architectural decisions while considering the physical realization of the actual SoC. The advantages of the approach include: (1) a simple block-diagram-like notation for design specification, which allows the designer to enter and modify the design quickly, and (2) integrated analysis algorithms for performance, floorplan, timing and power, which allow the designer to change the architecture, the core selection or the floorplan of the design and quickly evaluate the effect on other domains. The concepts presented have been tried on real designs, and results have shown that estimations based on our approach can be accurate enough to guide early design decisions and to be used by lower-level tools. The ability to explore different aspects of an SoC architecture in the context of realizing its physical implementation in an integrated environment provides a powerful system-on-a-chip analysis and design capability.
References
- [1] “SEAS: A System for Early Analysis of SoCs”, R. A. Bergamaschi, Y. Shin, N. Dhanwada, S. Bhattacharya, W. E. Dougherty, I. Nair, J. Darringer, S. Paliwal, Proceedings of CODES/ISSS 2003.
- [2] “Architecting Voltage Islands in Core-based System-on-Chip Designs”, J. Hu, Y. Shin, N. Dhanwada, R. Marculescu, Proceedings of International Symposium on Low Power Electronics and Design 2004.
- [3] “Managing power and performance for System-on-Chip designs using voltage islands”, D. E. Lackey, P. S. Zuchowski, T. R. Bednar, D. W. Stout, S. W. Gould, and J. M. Cohn, Proc. Int’l Conf. on Computer Aided Design, Nov. 2002, pp. 195–202.
- [4] http://www.systemc.org