Physical Design for Reuse Strategies and Implementation
Mountain View, CA USA
Abstract
Techniques for IP reuse have become commonplace in the RTL design world. By contrast, physical design for reuse remains stuck at delivering restrictive “hard IP.”
What holds reuse back for physical designers is that physical design intent is captured at a very low level of abstraction that has not changed in 20 years. This makes reuse from a physical design point of view nearly impossible.
This paper describes a methodology for capturing both the structure (floorplan, macro placement, and pre-routes) and construction recipe of a physical design in a reusable form. The methodology can be applied to macro blocks as well as platform SoCs. The reuse of a DDR block is given as an example, showing that the effort to redo the 32-bit DDR controller as a 16-bit relational design was about 75 percent of the effort of creating the original DDR design.
Platform SoCs require a new physical design approach.
A growing trend in chip design is to use a “platform chip,” which leverages a majority of a previous design (i.e. the platform) while making some alterations. The chip construction recipe, codified in tool scripts, can for the most part only be run by the original author. This creates a bottleneck, since there is no such thing as “designer interoperability.” Every SoC team has key people who solve tricky physical design problems. Their solution to any particularly nasty issue is usually captured in tool-specific scripts that can only be used by the engineer who wrote them. It is therefore no surprise that there is little reuse in physical design.
Bottleneck: Low-level design intent
The current state of the art for capturing physical design intent is much like trying to specify the control logic for a complex SoC with a schematic capture station. Today, physical design intent is captured in the form of a floorplan and tool scripts (whose execution is managed by make). This low level of design abstraction has remained unchanged for 20 years. Physical design teams face growing design complexity that is driven not only by the addition of nanometer verification steps (SI, IR, EM, etc.), but also by the multiple place and route regions in hierarchical chips. Intel reports that the number of script commands needed to implement a Pentium® class design far exceeds the lines of C code in the proprietary place & route tools that build these chips. The key point is that while place and route tool capacity has grown, the design intent specification that drives those tools is running out of steam.
Solution: Two new design automation technologies
Two new design automation technologies are beginning to appear in the market: relational physical design and second-generation design flow automation.
Relational physical design.
In traditional physical design, absolute coordinates specify an object’s location. Modern floorplanners are making use of relational physical design, in which physical structures are created and placed at positions defined in relation to other structures or user-defined points in the design. All structures (pads, power grid, macros, standard cells, and the place and route blocks) are specified relative to one another. Most importantly, objects in different physical hierarchies (place and route regions) may be placed relationally to each other (“cross-hierarchical” relational physical design).
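The idea of placing objects relative to one another can be sketched in a few lines of Python. This is only an illustrative model, not the data model of any floorplanning tool: the `RelativePlacement` class, the `resolve` function, the object names, and all coordinates are assumptions made up for the example. Each object stores an offset from a named anchor, and absolute coordinates are computed only when needed by walking the anchor chain.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class RelativePlacement:
    """Offset of an object from a named anchor (hypothetical model)."""
    anchor: Optional[str]  # reference object; None means the chip origin
    dx: float              # x offset from the anchor, in microns
    dy: float              # y offset from the anchor, in microns


def resolve(name: str, placements: Dict[str, RelativePlacement]) -> Tuple[float, float]:
    """Walk the anchor chain and return the absolute (x, y) of `name`."""
    x = y = 0.0
    seen = set()
    current: Optional[str] = name
    while current is not None:
        if current in seen:
            raise ValueError(f"circular placement reference at {current!r}")
        seen.add(current)
        placement = placements[current]
        x += placement.dx
        y += placement.dy
        current = placement.anchor
    return (x, y)


# Illustrative hierarchy: chip origin -> core area -> soft block -> macro.
placements = {
    "core": RelativePlacement(None, 100.0, 100.0),
    "soft_block": RelativePlacement("core", 0.0, 0.0),
    "macro_a": RelativePlacement("soft_block", 50.0, 20.0),
}
before = resolve("macro_a", placements)

# Move only the soft block; the macro inside it is not edited at all.
placements["soft_block"] = RelativePlacement("core", 0.0, 800.0)
after = resolve("macro_a", placements)

print(before, after)
```

In this sketch the macro follows its parent block across the hierarchy boundary because only its offset is stored, which is the property that keeps a relational floorplan valid when the surrounding context changes.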
Floorplan resiliency is what makes relational physical design excellent for reuse. Figure 1 illustrates the power of the approach. Here the same soft block is being used in two different SoC designs. The left side of the figure shows the place and route block as used in the original SoC. In this initial SoC design, the soft block was floorplanned into the lower left corner of the chip. The actual location of the soft block is determined relative to the lower left corner of the core chip area.
Figure 1. A soft block conforms to a new full-chip context in an optimal way when a relational physical design methodology is used.
The right side of Figure 1 shows how the original soft block could be reused in a new design. In this example, due to a new pad ring specification, the original block needs to be floorplanned in the upper left corner of the new chip. The pads that require special pre-routes to the DAC have also moved and therefore the DAC’s position in the block must change. The addition of new IP to the SoC means that the optimal die size may be obtained only if this block gets a more vertical aspect ratio. Since the original design was created relationally, the block size can be easily changed, and the original floorplanning, pre-routes, blockages, and power design are still valid.
Second-generation flow automation.
First-generation flow automation is tool scripting, managed by the ubiquitous UNIX make utility. Second-generation flow automation raises the flow specification from individual tool commands (those that perform all the steps that prepare the data, run placement, etc.) to a series of high-level physical design steps called stages (e.g. “place block A,” which executes all the steps to prepare the data, run the placement command, do error checking, save the data, etc.). Stages are elaborated with the design’s floorplan, netlist, technology files, and tool settings of interest to generate the hundreds of thousands of tool commands necessary to build a full-chip SoC (see Figure 2).
Figure 2: Second-generation flow automation generates hundreds of thousands of tool commands at runtime from high-level flow specification, and the design’s floorplan, netlist, constraints and tool settings.
Stages are composed of actions that generate the command files and run the tools. Each action consists of several sections, the lowest level of control for the underlying tools. Sections represent individual commands or groups of commands in a P&R tool. See Figure 3.
Figure 3: Example of a high-level flow design stage, Place. A stage is composed of a series of actions. Each action in turn is composed of a series of sections, which are tool command templates that get elaborated at runtime.
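The stage, action, and section hierarchy can be modeled as nested data whose leaves are command templates. The following Python sketch is an assumption-laden illustration, not the format of any actual flow automation product: the class names, the `elaborate` method, the context keys, and every tool command string are invented for the example. It shows how one high-level Place stage might expand into concrete commands once design-specific values are supplied.

```python
from dataclasses import dataclass, field
from string import Template
from typing import Dict, List


@dataclass
class Section:
    """Lowest level of control: one tool command template (hypothetical)."""
    name: str
    template: str


@dataclass
class Action:
    """A group of sections that is run together."""
    name: str
    sections: List[Section] = field(default_factory=list)


@dataclass
class Stage:
    """A high-level physical design step such as Place."""
    name: str
    actions: List[Action] = field(default_factory=list)

    def elaborate(self, context: Dict[str, str]) -> List[str]:
        """Fill every section template with design-specific values."""
        commands = []
        for action in self.actions:
            for section in action.sections:
                commands.append(Template(section.template).substitute(context))
        return commands


# The command names below are made up; real P&R tools use their own syntax.
place = Stage("place", [
    Action("prepare_data", [
        Section("load_netlist", "load_netlist $netlist"),
        Section("load_floorplan", "load_floorplan $floorplan"),
    ]),
    Action("run_placement", [
        Section("place", "place_cells -effort $effort"),
        Section("save", "save_design ${block}_placed"),
    ]),
])

commands = place.elaborate(
    {"block": "A", "netlist": "A.v", "floorplan": "A.fp", "effort": "high"}
)
for command in commands:
    print(command)
```

Because the templates carry no block-specific values, the same stage definition can be elaborated again with a different context for another block or another chip.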
Using second-generation flow automation, designers specify each block with its own block floorplan (specified relationally) and construction recipe or “sub-flow.” A sub-flow is specified quickly by connecting stages, much like putting Lego® blocks together. The designer then optimizes each block sub-flow by tuning tool variables for each block in the context of the full chip. All hand edits or ECOs are codified by the designer using “ECO stages.” To construct the full chip, each sub-flow is elaborated at run time, and tool commands are executed automatically to build the chip of interest. The design team optimizes the chip and block floorplans, and each sub-flow, to arrive at the best design. The floorplan and construction recipe are therefore captured in a reusable way.
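A per-block sub-flow with tuned variables and a codified ECO might look like the following Python sketch. Everything here is hypothetical: the stage functions, the `build_subflow` helper, the setting names, the default values, and the command strings are assumptions chosen only to show the shape of the idea, in which stages are connected in order, each block can override tool variables, and a hand edit becomes just another stage.

```python
from typing import Callable, Dict, List, Optional

Settings = Dict[str, str]
StageFn = Callable[[str, Settings], List[str]]

# Chip-wide default tool variables (illustrative values).
DEFAULTS: Settings = {"density": "0.70", "max_layer": "M6"}


def place(block: str, settings: Settings) -> List[str]:
    return [f"place_block {block} -density {settings['density']}"]


def route(block: str, settings: Settings) -> List[str]:
    return [f"route_block {block} -max_layer {settings['max_layer']}"]


def eco_swap_buffer(block: str, settings: Settings) -> List[str]:
    # A hand edit captured as an "ECO stage" so it is replayed on every build.
    return [f"swap_cell {block}/u_buf1 BUFX8"]


def build_subflow(block: str, stages: List[StageFn],
                  overrides: Optional[Settings] = None) -> List[str]:
    """Connect stages in order and elaborate them with per-block settings."""
    settings = {**DEFAULTS, **(overrides or {})}
    commands: List[str] = []
    for stage in stages:
        commands.extend(stage(block, settings))
    return commands


# Full chip = every block's sub-flow, each tuned in the full-chip context.
full_chip = (
    build_subflow("cpu", [place, route])
    + build_subflow("ddr", [place, eco_swap_buffer, route], {"density": "0.60"})
)
for command in full_chip:
    print(command)
```

The point of the sketch is that the recipe for each block lives in data (the stage list and the overrides) rather than in one engineer's private scripts, so another team can rerun or retune it.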
DDR Reuse example
Interface circuitry (e.g., SerDes, DDR, and LVDS) blocks usually require manual intervention to place critical standard cells and create routes to these cells. Some design teams may attempt to reuse such handcrafted designs as hard blocks; others have to spend as much time creating a new block as they did creating the original block.
A design team for a top-ten IDM recently developed a DDR memory controller as a soft IP block. The goal was to use the block in many different designs with various configurations of the DDR controller. The original DDR implementation was a 32-bit configuration with four DLL macros. The difficulty lay in meeting the very tight timing requirements between the I/O cells, the I/O-related standard cell logic, and the DLLs. The first SoC project to use this controller did not achieve timing closure on the DDR controller due to schedule constraints.
The second version of the SoC achieved timing closure, but only with a full-time engineer assigned to the DDR controller. Extensive netlist modifications were required on the incoming DDR netlist in order to generate a version of the design that was physically correct in the context of the SoC. A simple example would be the grouping and buffering of I/O related logic based on the SoC’s pad ordering. This netlist modification also renamed components in a standard way so that the downstream design activities would get consistent names, further enabling reuse of the soft IP in new designs.
For this implementation, the edge logic (logic associated with an I/O pad) placement was specified relationally. In this way, timing-critical standard cells were automatically placed in regions adjacent to their associated I/O cells. Since these are relational placements, these edge logic regions would always exist next to the target I/O cell, even if the I/O cell moved due to changes in pad ordering. In both the first and second use of the DDR controller, however, traditional handcrafted place and route techniques were applied to most of the difficult I/O-related timing path constructs.
The third SoC to use the DDR controller required a 16-bit version of the design that utilized only two of the DLLs. In this case, relational physical design techniques were applied to the controller. Each DLL was placed relative to its dedicated power pads in the I/O ring. Edge logic (standard cells that needed to be near certain I/O cells) was placed relative to the respective I/O cells so that if the cell ordering changed, the edge logic moved accordingly. Custom buffer trees between the DLLs and the critical flops were also placed relationally. Even the custom power rings required around the DLLs were specified using relational techniques. Leveraging the layout strategy of the second version, the cost to design and implement this reusable DDR design was about 75% of the cost of the second version.
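The way edge logic regions track their I/O cells can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: pads sit in a single row along the bottom edge at a fixed pitch, and the pad names, dimensions, and the `edge_logic_regions` function are all invented for the example. Each region is derived from its pad's position, so a change in pad ordering moves the region without any manual edit.

```python
from typing import Dict, List, Tuple

# Assumed geometry, in microns (illustrative values only).
PAD_PITCH = 80.0      # spacing of I/O cells along the bottom edge
PAD_HEIGHT = 120.0    # height of an I/O cell
REGION_HEIGHT = 30.0  # height of the edge logic region above each pad

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def pad_positions(order: List[str]) -> Dict[str, float]:
    """Left x coordinate of each pad, derived purely from pad ordering."""
    return {name: index * PAD_PITCH for index, name in enumerate(order)}


def edge_logic_regions(order: List[str]) -> Dict[str, Rect]:
    """Place one edge logic region directly above its associated pad."""
    regions: Dict[str, Rect] = {}
    for name, x in pad_positions(order).items():
        regions[name] = (x, PAD_HEIGHT, x + PAD_PITCH, PAD_HEIGHT + REGION_HEIGHT)
    return regions


original = edge_logic_regions(["dq0", "dq1", "dqs0"])
reordered = edge_logic_regions(["dqs0", "dq0", "dq1"])

print(original["dqs0"])
print(reordered["dqs0"])
```

With absolute coordinates, the reordering would have required re-entering every region by hand; here the regions are recomputed from the new pad order.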
Figure 4 shows the evolution of reusability starting with the second use of the DDR controller (DDR A). The amount of effort required to redo the 32-bit DDR controller as a 16-bit relational design (DDR B) was about 75 percent of the effort spent on creating the original DDR design. The investment is starting to pay off, as the relational DDR design is being used in new SoCs, with an estimated 25% of the effort of the high performance design.
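The payoff can be made concrete with a little arithmetic. The Python sketch below uses only the two percentages quoted above (75 percent for the relational redesign and an estimated 25 percent for each later reuse) and adds two assumptions that are not stated in the source: that both percentages are measured against the same baseline effort of DDR A, and that a handcrafted controller would cost the full baseline on every SoC. The function names are hypothetical.

```python
BASELINE = 100        # effort of the handcrafted design (DDR A), as a percentage
RELATIONAL_FIRST = 75 # effort of the relational redesign (DDR B), from the text
RELATIONAL_REUSE = 25 # estimated effort of each later reuse, from the text


def relational_effort(later_socs: int) -> int:
    """Cumulative effort: relational redesign plus each later reuse."""
    return RELATIONAL_FIRST + RELATIONAL_REUSE * later_socs


def handcrafted_effort(later_socs: int) -> int:
    """Assumed alternative: a full handcrafted effort on every SoC."""
    return BASELINE * (1 + later_socs)


for later_socs in range(4):
    print(later_socs, relational_effort(later_socs), handcrafted_effort(later_socs))
```

Under these assumptions, three later SoCs would cost 150 percent of the baseline effort in total with the relational design, against 400 percent if each controller were handcrafted again.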
Figure 4: This figure shows the progression of the DDR block from its first use, as a 32-bit DDR, to second use, 16-bit DDR. Now this IDM has a DDR that is in a reusable form for next-generation platform chips.
Conclusion
Physical design for reuse is now coming of age, enabling design teams to save both time and development dollars. Physically hardening soft blocks to reuse them in new SoCs is one way to achieve physical design reuse, but this approach has many limitations and will not produce an optimal chip design.
A new generation of chip-level design automation tools, which support full relational physical design and second-generation flow automation, enables soft block optimization in the context of the full chip. Designers can now create not only reusable physical soft blocks, but also optimal SoCs that use those blocks. Based on the DDR example, design teams that take this approach can expect to spend about 25% of the original effort to reuse a block.