Simultaneous Exploration of Power, Physical Design and Architectural Performance Dimensions of the SoC Design Space using SEAS
Nagu Dhanwada1, Reinaldo Bergamaschi2, William Dungan1, Indira Nair2, William Dougherty1, Youngsoo Shin3, Subhrajit Bhattacharya2, Ing-Chao Lin, John Darringer2, Sarala Paliwal1
1 IBM Electronic Design Automation, 2070 Route 52, MS 2A1, Hopewell Jct, NY 12533, USA
2 IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
3 Korea Advanced Institute of Science and Technology, Daejon, Korea
Contact: Nagu Dhanwada, Tel: 1-845-892-4359

Abstract: SEAS (SoC Early Analysis and Design System) was introduced in [1]. The main goal behind SEAS was to give the system architect early design feedback on aspects such as power, architectural performance, and floorplan / die size of core-based SoCs, while maintaining the links to implementation. In this paper we discuss extensions to SEAS in the areas of physically-aware power optimization through voltage island physical planning, a transaction level functional simulation platform for embedded software development, and a transaction level power analysis methodology for early power estimation. We also present how simultaneous exploration of the power, physical design and performance aspects of an SoC can be performed within SEAS.

SEAS Overview

SEAS allows users to easily specify a design in a block-diagram-like description and run types of analyses which would normally be impossible to do early in the design process with acceptable accuracy. These analyses include performance, floorplanning, timing and power. SEAS can handle core-based SoC designs, where the cores are available in a library, together with characterization data and models (e.g., for performance analysis simulation). The types of models needed are described in the following sections. The main advantage of SEAS from a user's point of view is the ability to describe, measure and change the specification at a very high level of abstraction and quickly evaluate the effects on performance, area, timing and power. If the results are not satisfactory, the designer can quickly change the architecture, the floorplan, or the cores being used and run the analyses again. Figure 1 illustrates the overall organization of SEAS.
The individual analysis algorithms are not necessarily novel; however, they had to be adapted and tuned to the design representation being used (block diagram). This tuning is critical to the accuracy of the results. As illustrated in Figure 1, SEAS comprises an input description similar to a block diagram and multiple analysis engines. Each engine has its own set of algorithms and an internal model derived from the initial block diagram, and uses characterization data and models for the cores available from a core library. In addition to the analysis engines, the netlist generation portion of SEAS translates the block diagram description and the system configuration information into an RTL description consisting of the set of cores and the necessary glue logic implementing the SoC. This RTL description, along with the set of constraints from the analysis engines, can be taken through an RTL-GDSII flow to complete the hardware implementation of the SoC.

Figure 1: SoC Early Analysis System

Use scenario for power-performance-physical design tradeoff analysis in SEAS

In this section we present a use scenario showing how power-related architectural optimizations can be performed within SEAS. The rest of the paper discusses the individual components of SEAS that enable these optimizations. An SoC designer using SEAS could perform the following set of steps:
Transaction level simulation platform for PPC/CoreConnect architecture analysis

Transaction level models, and simulation platforms composed of such models for IP cores, are increasingly being used for SoC architecture analysis and early embedded software development. They are gaining relevance with the emergence of standard architecture modeling languages like SystemC. Using the IBM CoreConnect SystemC Modeling Environment that forms a part of SEAS, designers can put together SystemC models for complete systems including PowerPC processors, CoreConnect bus structures, and peripherals. These models may be simulated using the standard OSCI (Open SystemC Initiative) SystemC runtime libraries [4]. Our models and environment provide designers with a system simulation/verification capability with the following characteristics:
Transaction level power modeling

Power is becoming a major issue in SoC design, and it is imperative to tackle it early in the design cycle. Central to transaction level power analysis is a power modeling methodology for the IP cores constituting the system. To go along with the Transaction Level Models (TLMs) for the IP cores, we are developing a transaction level power analysis methodology in SEAS to enable early power estimation, which is briefly described in this section. The overall methodology is as follows:
Typical TLMs capture the functional tasks associated with the behavior of an IP core, but do not necessarily contain many of the non-functional tasks related to the core. These non-functional tasks can be quite important from a power consumption point of view. Since TLMs are not typically developed with a view to capturing all the power-related tasks, a mapping mechanism is needed from the set of tasks (functional and non-functional) identified during transaction level power characterization to the set of tasks present in the current transaction level model. This is one of the unique features of the transaction level power modeling employed in SEAS. An example is shown here for a memory controller core:
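As a hedged illustration of such a mapping (a minimal sketch, not the SEAS implementation), the Python fragment below statically maps tasks that appear in a TLM trace onto a set of characterized tasks, including a non-functional one that the TLM does not model. All task names, energy values and weights are hypothetical assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical energy per characterized task, in nanojoules (illustrative only).
CHARACTERIZED_ENERGY_NJ = {
    "read_burst": 4.0,
    "write_burst": 5.0,
    "refresh": 2.0,  # non-functional task, assumed absent from the TLM
}

# Hypothetical static mapping: each TLM task expands into a weighted set of
# characterized tasks. The 0.5 weight amortizes one refresh over two accesses.
TLM_TO_CHARACTERIZED = {
    "mem_read": {"read_burst": 1.0, "refresh": 0.5},
    "mem_write": {"write_burst": 1.0, "refresh": 0.5},
}


def estimate_energy_nj(tlm_trace,
                       task_map=TLM_TO_CHARACTERIZED,
                       energy=CHARACTERIZED_ENERGY_NJ):
    """Sum the energy of all characterized tasks implied by a TLM task trace."""
    total = 0.0
    for tlm_task, count in Counter(tlm_trace).items():
        # An unmapped TLM task raises KeyError, which flags a gap in the mapping.
        for char_task, weight in task_map[tlm_task].items():
            total += count * weight * energy[char_task]
    return total


if __name__ == "__main__":
    trace = ["mem_read", "mem_read", "mem_write"]
    print(estimate_energy_nj(trace))  # prints 16.0
```

A dynamic variant would presumably update the weights during simulation instead of fixing them in a table; that choice is left open here.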
The mapping can be done statically or dynamically, and forms the transaction level power estimation component of SEAS, which is currently being developed. Using the transaction level power and performance simulation, the user can get early architecture performance and power estimates for a given application. The next section discusses how the physical realization of power-related architecture modifications can be explored within the same SEAS environment through the use of voltage island physical planning capabilities.

Linking Power and Physical design: Physical Planning of Voltage Islands for power optimization

Voltage islands [3] are a technique that is effective in reducing both the switching and standby components of power consumption in a design. A voltage island is a group of on-chip circuit elements powered by the same voltage source, independent of the chip-level voltage, which permits different portions of the design to execute at different voltages to optimize power. In an SoC context, this enables core-level power optimization by using a power supply that is distinct from that of the rest of the design. This is an additional dimension that can be explored early in the SoC design process. When contemplating architectural power optimizations in the SEAS environment, the designer can evaluate the physical realization of such power-related decisions by using the voltage island planning portion of SEAS. An SoC designer trying to build a low power SoC using voltage islands will be faced with decisions like,
Creating voltage islands in a chip design in order to optimize the overall power consumption involves voltage island partition generation, voltage level assignment and floorplanning. The main technique in SEAS for voltage island planning consists of physically aware voltage island partitioning and a method for performing voltage island partitioning and level assignment simultaneously. The technique groups different cores into voltage island partitions while determining a floorplan for the chip and the individual islands. The overall approach for physical planning in SEAS consists of: a) characterizing cores in terms of voltages and power consumption values; b) providing a set of IP cores that belong to a single voltage island RLM (Random Logic Macro); and c) assigning voltages for the voltage island RLMs, all within the context of generating a physically realizable floorplan for the design. This algorithm [2] is based on a sequence-pair simulated annealing technique that employs a compatibility graph structure for maintaining the voltage and physical design compatibility relationships between the cores of the SoC. The resulting voltage island partitioning and floorplan solution can be used to back-annotate latency information into the architectural TLMs, and can also be used as an initial solution for the chip implementation process.

Design Example

In this section we discuss a PowerPC 405/CoreConnect based packet processor design to illustrate some of the components of SEAS. The design consists of an Ethernet sub-system represented by the Ethernet controller (EMAC), a Media Access Layer (MAL) core, and receive and transmit FIFOs. It also contains a high-speed memory controller (HSMC), an external bus controller (EBC), a DMA controller, an interrupt controller and various peripherals including 2 UARTs, 1 IIC and 1 timer. The cores are all connected to either the high-speed Processor Local Bus (PLB) or the On-Chip Peripheral Bus (OPB).
The design was created at the virtual design level of abstraction, and the experiment consists of evaluating this design for Ethernet packet processing purposes. Performance analysis will be used for measuring the system throughput and CPU utilization, after which the architecture will be changed by adding a second Ethernet controller and the performance analysis repeated. The floorplan for both designs will be generated and die sizes estimated, along with wire length and power information.

The cores involved in packet processing are the EMAC, MAL, PLB Arbiter, CPU, and HSMC. The packets arrive from the network at the EMAC input and are received into its receive buffer. The MAL works as a dedicated DMA and transfers the packet through the PLB bus to the memory controller and finally into an external memory. The time it takes to receive a packet into memory depends on the data rate, the size of the packet, the capacity of the MAL (size of burst transfer, number of bursts needed per packet) and some constant delays associated with the EMAC and HSMC. After the packet is received in memory, the CPU processes it by reading the header, computing a new address and writing back a new header. In this example, it is assumed that this CPU header processing time is constant and does not depend on the size of the packet. This CPU time is measured off-line by profiling techniques. The packet is then read by the MAL and transmitted, through the EMAC, back to the network.

System throughput and CPU utilization are shown in Figure 4 for different packet sizes. Throughput depends on the number of packets, their size and the processing capacity, and is limited by the maximum channel capacity (the maximum number of bits that can be transmitted by the EMAC in a second). The ratio of busy to idle times of the CPU is referred to as the CPU utilization. With small packets and 1 EMAC, the CPU is 100% busy and throughput increases with packet size, up to the maximum allowed by the channel.
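The qualitative behavior described above can be sketched with a simple steady-state model, shown here in Python. This is a rough illustration under stated assumptions, not the SEAS performance engine: it ignores MAL burst sizes and the constant EMAC/HSMC delays, treats reception and CPU processing as fully overlapped, and computes utilization as the busy fraction of each packet period. The function names, the 40 microsecond CPU time and the default channel rate are hypothetical parameters.

```python
def packet_period_s(packet_bytes, cpu_time_s, n_emacs=1, emac_rate_bps=100e6):
    """Steady-state time per packet: the slower of the channel and the CPU."""
    channel_time_s = packet_bytes * 8 / (n_emacs * emac_rate_bps)
    return max(channel_time_s, cpu_time_s)


def throughput_bps(packet_bytes, cpu_time_s, n_emacs=1, emac_rate_bps=100e6):
    """Bits delivered per second at the steady-state packet period."""
    period = packet_period_s(packet_bytes, cpu_time_s, n_emacs, emac_rate_bps)
    return packet_bytes * 8 / period


def cpu_utilization(packet_bytes, cpu_time_s, n_emacs=1, emac_rate_bps=100e6):
    """Fraction of each packet period that the CPU spends on header processing."""
    period = packet_period_s(packet_bytes, cpu_time_s, n_emacs, emac_rate_bps)
    return cpu_time_s / period


if __name__ == "__main__":
    CPU_TIME_S = 40e-6  # hypothetical constant header-processing time
    for size in (250, 500, 1000, 1500):
        for emacs in (1, 2):
            print(size, emacs,
                  round(throughput_bps(size, CPU_TIME_S, emacs) / 1e6, 1),
                  round(cpu_utilization(size, CPU_TIME_S, emacs), 2))
```

In this model, small packets keep the CPU fully busy and throughput grows with packet size; once the channel time exceeds the CPU time, throughput saturates at the aggregate channel rate and utilization falls.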
In this example the EMAC is limited to 100 Mbits/sec. Above a certain packet size, throughput is limited to 100 Mbits/sec, which causes the CPU to become idle as packet receive times become greater than the CPU processing time. To increase throughput beyond 100 Mbits/sec, the main option is to add extra EMACs to the design. Adding one extra EMAC doubles the maximum throughput to 200 Mbits/sec, permitting higher rates and larger packets. To account for other tasks that the system might have to perform, we could target a utilization of around 80% instead of 100%. This would give the CPU some headroom to respond to other system requests. This example demonstrates the ability in SEAS to change the architecture to meet the requirements and quickly validate architecture performance using performance analysis.

Power analysis is run together with the performance simulation. Figure 4 shows the power consumed by the system during packet processing for the two virtual designs. It can be seen that power does not increase significantly in the 2-EMAC case, which is expected since most of the power is dissipated by the CPU when active. It also shows that when the CPU becomes partly idle the power decreases accordingly. This simulation assumes that the EMAC, MAL and CPU are active when in use and idle otherwise, and that all other cores are in sleep mode.

Given these two architectural design points, they now need to be evaluated for size and timing. We generated the floorplan for both virtual designs and estimated their required die sizes. Based on floorplan area alone, the 1-EMAC version fit into a 5.57x5.57mm image, and the 2-EMAC version needed a 6.05x6.05mm image. Because of pin limitations on the 5.57mm image, the 6.05mm image was used for both the 1-EMAC and 2-EMAC versions. Both of these floorplans provide a starting point for power optimization using the voltage island physical planner.
The benefit of performing floorplanning and physical design analysis in SEAS in this case was to show that the higher performance design (2 EMACs) could fit in the same die size, with the same silicon cost.

If aggressive power management is needed for the SoC, then portions of the design can be executed at different voltages. An early view of the impact and power savings attainable by the use of voltage islands can be obtained using the voltage island planning engine of SEAS. In this experiment we used the 1-EMAC and 2-EMAC versions of the SoC design example and the initial floorplan for the virtual design as the starting point for the voltage island physical planner. The aim here is to get an idea of the overhead incurred (area, performance) and the achievable power savings from voltage island based power optimization strategies. An initial floorplan for the 2-EMAC version is shown in Figure 3. This was generated considering pre-placement and chip IO constraints, with wirelength and overlaps as the primary objectives for the purposes of die-size estimation. The boxes where the cell names are given (CPU, EMC1, etc.) indicate pre-placed cores which are not to be moved during planning; in this experiment, all cores are assumed to operate at a single 1.3V supply in the initial design. For voltage island planning, we assign legal voltages within a range of 1.0V to 1.3V for each core.

Figure 3: Initial floorplan for 2-EMAC design with single Vdd=1.3V

In Figure 4(a) of the example, a solution was generated with the constraint on the total number of voltage islands set to 3. Both EMC1 and HSMC can be between [1.1-1.3], and the CPU is at 1.3V. For this case, three voltage islands are created by the planner: two are shown with enclosing rectangles, both with a 1.1V supply, and the third one, consisting of a single core (EMC1), is powered by 1.0V. Note that HSMC is still at 1.3V although its minimum legal voltage is 1.0V.
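To make the trade-off between island grouping and supply voltage concrete, the following Python sketch evaluates a candidate partition under two labeled assumptions that are not taken from the paper: each island runs at the highest minimum legal voltage among its member cores, and core power scales with the square of the supply (the usual first-order model for dynamic power). Level shifter cost, leakage and area overhead are ignored, and the core names and power figures in the usage example are hypothetical.

```python
V_NOMINAL = 1.3  # single-supply baseline, as in the initial floorplan


def island_supply(island, cores):
    """Lowest supply that is legal for every core in the island."""
    return max(cores[name][0] for name in island)


def plan_power_mw(islands, cores, v_nominal=V_NOMINAL):
    """Total power with each island's cores scaled by (V / V_nominal)^2."""
    total = 0.0
    for island in islands:
        scale = (island_supply(island, cores) / v_nominal) ** 2
        total += sum(cores[name][1] * scale for name in island)
    return total


def power_savings(islands, cores, v_nominal=V_NOMINAL):
    """Fractional saving relative to running every core at v_nominal."""
    baseline = sum(power for _, power in cores.values())
    return 1.0 - plan_power_mw(islands, cores, v_nominal) / baseline


if __name__ == "__main__":
    # Hypothetical cores: name -> (minimum legal voltage in V, power in mW at 1.3V).
    cores = {
        "CPU": (1.3, 400.0),
        "HSMC": (1.0, 80.0),
        "EMC0": (1.0, 60.0),
        "RX0": (1.1, 20.0),
    }
    islands = [["CPU"], ["HSMC"], ["EMC0", "RX0"]]
    print(round(power_savings(islands, cores), 3))
```

Grouping a core into an island whose supply is above its own minimum legal voltage, as happens when placement constraints force the grouping, shows up in this model as a smaller saving for that core.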
The HSMC could be operated at 1.1V instead of 1.3V if it were included in the voltage island on the left-hand side of the image, but that would lead to significant dead space in the voltage island since EMC0 has a fixed location. EMC0 is powered with 1.1V, which is the supply of the enclosing voltage island, although its minimum legal voltage is 1.0V, while RX0 is at its minimum supply. For this solution instance, the power savings achieved by the voltage island planning is 16.9% while the area overhead is only 8.3%.

Figure 4: Voltage Island Planning for 2-EMAC

Figure 4(b) shows the result of voltage island planning for the same design with a constraint of 4 voltage islands. The generated islands are shown as shaded regions with the corresponding voltage levels. This solution has an area overhead of 7.7% and a power savings of 17.4% when compared to the initial solution. The latency increase due to islands can be factored back into the architecture performance analysis step in order to get feedback on the performance impact as well. Using these kinds of analysis (performance, power and area) and the exploration engines in SEAS, an SoC architect can tune the system architecture. The results of the analyses can be carried forward into the rest of the design process by using the netlist generation component of SEAS, which generates a top level netlist from the virtual design that can be taken through RTL-to-GDSII design flows.

Conclusion

This paper presented use scenarios of power-performance-physical design tradeoff analysis within an SoC early analysis system, SEAS, and discussed the components that enable such scenarios. The presence of different analysis capabilities within an integrated environment helps designers make these early architectural decisions while considering the physical realization of the actual SoC.
The advantages of the approach include: (1) a simple block-diagram-like notation for design specification, which allows the designer to enter and modify the design quickly, and (2) integrated analysis algorithms for performance, floorplan, timing and power, which allow the designer to change the architecture, the core selection or the floorplan of the design and quickly evaluate the effect on other domains. The concepts presented have been tried on real designs, and results have shown that estimations based on our approach can be accurate enough to guide early design decisions as well as to be used by lower-level tools. The ability to explore different aspects of an SoC architecture in the context of realizing its physical implementation in an integrated environment provides a powerful system-on-a-chip analysis and design capability.

References

1. “SEAS: A System for Early Analysis of SoCs”, R. A. Bergamaschi, Y. Shin, N. Dhanwada, S. Bhattacharya, W. E. Dougherty, I. Nair, J. Darringer, S. Paliwal, Proceedings of CODES/ISSS 2003.
2. “Architecting Voltage Islands in Core-based System-on-Chip Designs”, J. Hu, Y. Shin, N. Dhanwada, R. Marculescu, Proceedings of International Symposium on Low Power Electronics and Design 2004.
3. “Managing power and performance for System-on-Chip designs using voltage islands”, D. E. Lackey, P. S. Zuchowski, T. R. Bednar, D. W. Stout, S. W. Gould, J. M. Cohn, Proc. Int’l Conf. on Computer Aided Design, Nov. 2002, pp. 195-202.
4. http://www.systemc.org