A Chip IP Integrator for System Level Design
Mikhail Baklashov, ARM, Inc.
Sunnyvale, CA USA

Abstract

This paper describes a new approach to chip design and system-level integration. A hierarchical, context-preserving RTL insertion and connectivity methodology has been implemented in an EDA tool, the chip IP integrator. The paper presents the approach, the methodology and the results on a real-life system comprising several Verilog RTL design blocks, each with around a quarter of a million instances, as well as on an ARM 4-CPU test chip design. The tool maintains two dynamic views, a design tree view and a parse tree view, which allow hierarchical insertion that accounts for design reuse and preserves shared module connectivity as well as the input design context.

1. Introduction

Rapidly rising SoC complexity and shrinking time-to-market for modern designs stimulate the development of new, intelligent EDA tools and methodologies for chip-level design. To deal with this complexity, a system is broken into many large blocks called tiles, managed by a group of designers. A real-life system may contain a few dozen tiles with memories and other IP design components. Memories today come in many different types and sizes and may take up to 80% of the tile size.

Any system is modified many times during its design cycle by designers who often change memory types and configurations, system IP components, and the interfaces between memories and components. Modern memory compilers can generate configurable IPs customized for particular chip requirements, which calls for highly parameterized chip insertion and connectivity solutions. Memory changes also happen when semiconductor companies switch from one memory vendor to another. With every modification, designers must properly integrate and connect new memories and IP components with the rest of the system.
For instance, memory built-in self-test (BIST) has become a mandatory design practice for high-volume chip production, and it involves inserting new test components and connecting them to memories. It is estimated that inserting four dozen components with an average of 25 ports each into a 12-level hierarchical design adds a few thousand wires to the design (48 components with 25 ports each already give 1,200 port connections, and a connection may need a wire at several hierarchy levels).

Today, there is little to no automation to help designers in this time-consuming system design and integration process. Designers currently insert and connect components manually or write scripts to facilitate the process. Such manual stitching is extremely inefficient and may take several weeks for large designs.

Section 2 of this paper introduces the notion of intelligent connectivity and the requirements for the chip integrator tool. Section 3 gives an overview of the chip integrator environment. Section 4 describes the formal XML views that define the input specification and the configuration information about design IP components and their constraints. Shared module connectivity is described in Section 5. Experimental results are shown in Section 6, and Section 7 concludes the paper.

2. Connectivity Issues

The complexity of the stitching process comes from the hierarchical nature of today's designs, along with shared memory modules whose reuse must be taken into account when connecting them. Intelligent connectivity implies the ability to find a memory or other component in the design, instantiate a new IP component, and connect the two while accounting for design hierarchy and reuse. All modifications have to be made on the input RTL design specification, preserving the input text format and comments for further use by designers. Unfortunately, popular EDA tools have not been developed to satisfy these context-preserving insertion and connectivity requirements.

3. Proposed Methodology

An advanced chip integrator tool and a connectivity methodology were developed internally by ARM. Fig. 1 depicts the main elements of the chip integrator and its environment. The tool keeps two dynamic design views: a design tree view and a parse tree view. A lexical analyzer was implemented to maintain all input Verilog formatting and to preserve input comments and compiler directives. The Verilog parser reads the input RTL design and supports Verilog-2001 (V2K) syntax.

Fig. 1 Chip Integrator Environment

The parse tree is built as a snapshot of the input design. The tree is annotated with text formatting information, comments and compiler directives, and it is dynamically updated with newly inserted components and connections.

The design tree is built during elaboration of the input design. It helps in searching for design components and maintaining hierarchical design information, and it is annotated with newly inserted components.

A connectivity language, or configuration, statically specifies the components to be instantiated in the design and their connectivity specifications. Each component specification is defined in its own class derived from a generic device class. The specification contains port names, port directions and sizes, and connectivity information on "how" and "to what" device to connect each port. An example of a formal configuration for port objects is shown in Fig. 2.

Fig. 2 Port Configuration

An input specification introduces another dynamic connectivity view with component types, their design names, hierarchical instance names and other relevant design and connectivity parameters. The chip integrator block builds connectivity configurations for each component to be instantiated in the design, then inserts and connects the components specified in the input specification file.
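The component specification described above (a class derived from a generic device class, holding port names, directions, sizes, and "how" and "to what" each port connects) might be sketched as follows. This is only an illustration, not the integrator's actual data model: the class and field names (Port, Device, MemoryWrapper, Connect), the port directions and the 1-bit sizes are assumptions, and the port names PAUSE, DONE, SI and SO are borrowed from the example in Section 5.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    INPUT = "input"
    OUTPUT = "output"


class Connect(Enum):
    # Connection types named in the paper; the enum itself is hypothetical.
    BROADCAST = "broadcast"
    BUS = "bus"
    SERIAL = "serial"
    DEVICE_TO_DEVICE = "device-to-device"


@dataclass(frozen=True)
class Port:
    name: str
    direction: Direction
    size: int          # width in bits (assumed 1 below)
    connect: Connect   # "how" to connect
    to_device: str     # "to what" device type
    to_port: str       # peer port on that device


class Device:
    """Generic device class; concrete components override PORTS."""
    PORTS = ()

    @classmethod
    def port(cls, name):
        # Look up a port specification by name.
        for p in cls.PORTS:
            if p.name == name:
                return p
        raise KeyError(name)

    @classmethod
    def ports_by_kind(cls, kind):
        # Names of all ports that use the given connection type.
        return [p.name for p in cls.PORTS if p.connect is kind]


class MemoryWrapper(Device):
    # Hypothetical wrapper specification; directions and peers are assumed.
    PORTS = (
        Port("PAUSE", Direction.INPUT, 1, Connect.BROADCAST, "controller", "PAUSE"),
        Port("DONE", Direction.OUTPUT, 1, Connect.BUS, "controller", "DONE"),
        Port("SI", Direction.INPUT, 1, Connect.SERIAL, "controller", "SO"),
        Port("SO", Direction.OUTPUT, 1, Connect.SERIAL, "controller", "SI"),
    )
```

With such a table, a connectivity engine could ask for MemoryWrapper.ports_by_kind(Connect.SERIAL) and handle each connection type with its own wiring rule.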
The chip integrator processes the hierarchical instance names obtained from the input specification to get a list of all shared modules inside the design. Its insertion engine places components according to the input specification, and its connectivity engine traverses the design hierarchy, placing wires and ports where they are needed. The chip integrator supports broadcast, bus, serial, device-to-device and other types of connections.

4. XML Schema Design and Component

The integrator uses XML Schema to formalize both the input specification and the connectivity configuration requirements. XML Schema, adopted by the SPIRIT Consortium for IP reuse and design automation [1], is a meta-language that makes it possible to formally describe a design, its components and its connectivity specifications. To separate the connectivity configuration from hard-coded configuration inside the integrator, an XML Schema Component was developed. Fig. 3 shows an example of complex types of the XML Schema Component.
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="portGroupType">
    <xs:sequence>
      <xs:element name="portName" type="nameClass"/>
      <xs:element name="portDirection" type="directionClass"/>
      <xs:element name="portSize" type="portSizeClass"/>
      <xs:element name="portConnect" type="portConnectClass"/>
      <xs:element name="portCondition" type="portConditionClass"/>
      <xs:element name="portConnectToDevice" type="deviceEnumClass"/>
      <xs:element name="portConnectToPort" type="nameClass"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:integer"/>
  </xs:complexType>
  <xs:complexType name="basePortClass">
    <xs:sequence>
      <xs:element name="basePort" type="portGroupType" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="basePortGroup">
    <xs:sequence>
      <xs:element name="baseGroup" type="basePortClass" maxOccurs="1"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="derivedPortGroup">
    <xs:sequence>
      <xs:element name="derivedPort" type="portGroupType" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="device" type="deviceEnumClass"/>
  </xs:complexType>
  <xs:complexType name="derivedPortClass">
    <xs:complexContent>
      <xs:extension base="basePortGroup">
        <xs:sequence>
          <xs:element name="derivedGroup" type="derivedPortGroup" maxOccurs="2" minOccurs="0"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:element name="devicePortGroup" type="derivedPortClass"/>
</xs:schema>

Fig. 3 XML Schema Component

The Schema is a formal structural skeleton for IP connectivity configurations and their constraints. A concrete device connectivity description, or device connectivity content, has to conform to the Schema Component. There are several advantages of using such formal XML Schema representations.
The XML Schema Component is an abstract structural representation of any device-to-device connection, and it does not depend on specific device connectivity requirements that exist now or may arise in the future. The Schema can easily be extended for new connectivity requirements. The validity check between the Schema and its content is fully automated: once the Schema is defined, any error in the user's content specification is reported immediately. Last but not least, documentation is easy to generate, since there are well-defined style sheet transformations that turn XML files into HTML or PDF for viewing.

XML support in the integrator has been implemented by linking it with the open source Xerces C++ XML parser API libraries [2].

An XML Schema Design was developed to formalize input specifications. Fig. 4 shows an instance of the Schema: an input specification in XML format that describes the reused block connectivity discussed in Section 5.

<?xml version="1.0" encoding="UTF-8"?>
<inputDesign xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <ctrlGroup>
    <ctrlDevice id="0">
      <ctrlInst>chip.u_ctrl0</ctrlInst>
      <ctrlDesign>CTRL0</ctrlDesign>
    </ctrlDevice>
    <ctrlDevice id="1">
      <ctrlInst>chip.u_ctrl1</ctrlInst>
      <ctrlDesign>CTRL1</ctrlDesign>
    </ctrlDevice>
  </ctrlGroup>
  <componentGroup>
    <!-- GROUP 0 -->
    <groupDevice id="0">
      <groupId>0</groupId>
      <memInst>chip.u_a.u_bank0.PDP</memInst>
      <wrapInst>chip.u_a.u_bank0.wPDP</wrapInst>
      <memoryDesign>sp256x36</memoryDesign>
    </groupDevice>
    <groupDevice id="1">
      <groupId>0</groupId>
      <memInst>chip.u_a.u_bank1.PDP</memInst>
      <wrapInst>chip.u_a.u_bank1.wPDP</wrapInst>
      <memoryDesign>sp64x30</memoryDesign>
    </groupDevice>
    <!-- GROUP 1 -->
    <groupDevice id="2">
      <groupId>1</groupId>
      <memInst>chip.u_b.u_bank0.PDP</memInst>
      <wrapInst>chip.u_b.u_bank0.wPDP</wrapInst>
      <memoryDesign>sp256x36</memoryDesign>
    </groupDevice>
    <groupDevice id="3">
      <groupId>1</groupId>
      <memInst>chip.u_b.u_bank1.PDP</memInst>
      <wrapInst>chip.u_b.u_bank1.wPDP</wrapInst>
      <memoryDesign>sp64x30</memoryDesign>
    </groupDevice>
  </componentGroup>
</inputDesign>

Fig. 4 XML Design

5. Reused Module Connectivity

Shared module support introduces another challenge in IP component placement and connectivity: the process has to account for reused blocks and reuse previously created wires. Fig. 5 depicts two shared test cache memory modules, or wrappers, reused in two controller-memory blocks. Each such block employs internal memory reuse. Connectivity between wrappers and memories is a straightforward port-to-port connection and is of no interest from the reuse point of view.

Fig. 5 Reused Block Connectivity

Controller hierarchical names are defined in the input specification file. Two controller devices are placed into the design by the integrator, and the connectivity between test memory wrapper ports and controller ports is defined by the connectivity configuration. The task of the integrator is to place the controllers and wire them down to the memory wrappers, taking into account the design's hierarchical information and block reuse. Logically, there is only one copy of the memory wrapper module, WRAM, instantiated two times inside the design, CACHES_RAM, which is shared by both blocks. Physically, there are 4 memory wrapper instances that have to be properly connected according to the memory-controller connectivity specifications.

For simplicity, only a few ports/signals are shown in the figure. They represent three types of connections: broadcast (PAUSE), bus (DONE) and serial (SI/SO). Connectivity for those ports differs considerably and depends on how memories or memory blocks are reused. When dealing with broadcast signals, care must be taken to correctly reuse wires in the design, CACHES_RAM, and route them on to the controllers. Wire numbers are introduced to make reused connections easier to trace. The integrator also appends port names to newly created wire names.
For example, two different PAUSE wires (18 and 40) are created at controller level to broadcast the signals to the shared memory blocks. Bus connections are handled at both levels: inside the module, CACHES_RAM, by creating a set of two wires (16 and 26), and at controller level by creating another four wires (16, 26, 28 and 30). Serial connections (SI/SO) are handled a bit differently. There is local connectivity reuse inside the module, CACHES_RAM, where three wires (2, 4 and 6) are created, and global connectivity reuse at the first controller level, where wires 2 and 6 are created, and at the second controller level, where wires 8 and 10 are created.

6. Experimental Results

The chip integrator tool has been implemented in C++ and was tested on different Verilog RTL designs with extensive V2K syntax. The design blocks ranged from 250,000 to 500,000 instances and had up to half a million lines of Verilog. The tool writes out only the modified design files and copies the rest of the Verilog files unchanged. Its processing time depends on the design hierarchy, with a slight increase for designs that contain large gate-level netlist blocks. For large hierarchical gate-level designs, caching the results of the most expensive calls gave up to a 500x performance improvement. On average, design blocks were completed in about 1 to 10 minutes on a 32-bit Linux machine, with 50% of the time spent parsing design files and the other 50% on IP placement and connectivity. Another 40% performance increase can be achieved by compiling and running the integrator on a 64-bit Linux machine.

The chip integrator has also been tested on an ARM processor design. For this purpose, a test RTL design with four Cortex-R4 CPUs [3] was assembled and translated into a hierarchical gate-level design. The input specification file defined 100 memories to be processed by the integrator. The memories were partitioned into 4 groups, each having its own controller.
Each group comprised memories of 4 different types/sizes. The integrator had to place test wrappers around the memories, connect the wrappers to the memories and wire all wrappers up to the controllers. The overall design, with 5 hierarchical levels and 550,000 instances, was integrated in 10 minutes on a 32-bit Linux machine.

7. Conclusion

The tool has been tested on a few real-life industry designs. It accelerated chip-level integration and connectivity, cutting system integration time from weeks to a few hours.

References

[1] The SPIRIT Consortium
[2] Xerces C++ API Documentation
[3] ARM Products & Solutions, Cortex Family
All material on this site Copyright © 2017 Design And Reuse S.A. All rights reserved.