Of the more than 1,000 IC projects reviewed in a March study by Numetrics Inc. (Cupertino, Calif.), 85 percent missed their target delivery date. Just as alarming, the average project overran its schedule by 53 percent. Numbers like these expose an ugly truth: System-on-chip integration is nowhere close to being predictable for most teams. The drive to build ever-bigger chips, with their attendant submicron effects, has left physical designers battling too much complexity and juggling too many unknowns. "Back-end" design has become the wild card in the SoC deck. Even the most heroic efforts to manage the problem can still end in suboptimal performance and manufacturing delays.
Need more evidence? PRTM (Mountain View, Calif.) has studied project slips at a number of semiconductor companies and found that the single largest factor driving them is "unanticipated technical difficulty." Such setbacks are implicated in 46.8 percent of the cases reviewed.
One of the most striking aspects of the SoC world is that when it comes to full-chip verification, the practices of register-transfer-level designers and physical designers are diametrically opposed. What is gospel for one is heresy to the other. The reality of RTL design is that front-end teams verify full-chip function at least three to four times as often as any individual RTL component. The more complex the chip, the higher the ratio becomes. Yet in physical design the reverse holds true: successful integration of the chip, its entire reason for being, is verified as rarely as possible. And it is the most complex designs that are tested the latest and the least. Inevitably, it is only at full-chip integration and verification that the dreaded "unanticipated technical issues" start cropping up and project slips occur.
This inverted perspective on verification practice has largely been forced upon physical design teams for a simple and highly compelling reason: full-chip builds take far too long and are far too difficult. It may take two or more weeks to rebuild an SoC from a moderately changed full-chip gate-level netlist to a physical implementation that is ready for verification. That is hundreds of times longer than it takes to compile Verilog for simulation or to compile and relink C code.
Facing down this 100x compilation wall is a fact of life for physical design managers, and it carries a double whammy. Driven by the belief that full-chip iterations have to be avoided because of their time cost, physical teams design the SoC as a collection of independent blocks. This "design by construction" approach demands that they accept defeat before the project starts: suboptimal floor plans to mitigate cross-block interactions and minimize interblock timing issues. Yet in so doing, they also ensure that the blocks are not optimized in the context of the full chip and may be wired with routing channels, resulting in die sizes that are 10 percent to 20 percent larger and in lower performance due to global-timing issues.
Worse, while the "design by construction" workaround may appear to let physical design engineers continue to live in the world of blocks and confine their build iterations to individual blocks, even that is quickly becoming a mirage. With today's highly complex SoCs, the final reckoning of full-chip verification can only be postponed, not avoided. At best, the first tapeout is delayed as the team copes with "unanticipated technical issues." At worst, they end up paying the piper when their designs come back for a re-spin, if for no other reason than to fix timing problems or reduce the cost of the chip by squeezing the die size. The obvious way, if not the only way, to escape this trap is to speed up the design-decision-to-GDSII verification loop.
The present state of the art in physical design automation is the "tool-script-make-e-mail-alert" methodology. It's an approach that any engineer packing a Blackberry knows is highly labor-intensive. Designers spend much of their time determining the next steps in the process, finding and preparing the right data, monitoring script progress, correcting errors, rerunning jobs, storing the resulting data in the right place, telling another person on their team that the data is available, and so on. Few would argue with the idea that this labor could be more productively applied to design optimization than to design task management.
Next-gen automation
The key to solving all these problems lies in truly automating a time-proven methodology: hierarchical physical design. Most complex SoC projects today start by breaking the chip into blocks. It then becomes possible for many engineers to work on a design simultaneously by designing constituent blocks on multiple computers running in parallel. Originally developed as a technique for extending the capacity of design automation tools, this approach is used today to achieve shorter tool run-times, to implement multiple power islands that lower power consumption, and to confine design changes to a single area so that late-breaking engineering change orders can be accommodated.
As conventionally applied, hierarchical physical design requires the execution of hundreds of tools, each of which has to be launched by an engineer. That is a significant factor behind the 14-day-or-more build times. By introducing new automation technology, these labor-intensive (not engineering-intensive) tasks can be performed by a program, producing dramatic improvements in speed. The methodology can then be used to support fast chip-level integration cycles, allowing teams to start full-chip verification early and continue to test their design decisions throughout the physical design process. Chip-level design automation, or CLDA, has the capability to fill this need.
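To make the idea of a program launching the tools concrete, here is a minimal Python sketch, not a description of any actual CLDA product. It lists the commands needed for each block, builds the blocks in parallel, retries a failed command before asking for human attention, and assembles the chip only when every block has passed. The step names, the command strings and the `fake_run` launcher are all hypothetical stand-ins; a real flow would invoke vendor tools instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical step names; a real flow would launch vendor tools here.
BLOCK_STEPS = ["place", "route", "extract", "timing"]


def block_commands(block):
    """List the (hypothetical) tool commands needed to build one block."""
    return [f"{step} --block {block}" for step in BLOCK_STEPS]


def build_block(block, run, max_retries=2):
    """Run one block's commands in order, retrying a command that fails."""
    for cmd in block_commands(block):
        for _attempt in range(1 + max_retries):
            if run(cmd):
                break  # command succeeded; move on to the next step
        else:
            return block, False  # command kept failing; flag it for a human
    return block, True


def build_chip(blocks, run, workers=4):
    """Build all blocks in parallel, then assemble only if every block passed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = dict(pool.map(lambda b: build_block(b, run), blocks))
    failed = sorted(b for b, ok in results.items() if not ok)
    assembled = not failed and run("assemble --top chip")
    return {"assembled": bool(assembled), "failed_blocks": failed}


def fake_run(cmd):
    """Stand-in for launching a tool: routing of block 'dsp' always fails."""
    return cmd != "route --block dsp"


if __name__ == "__main__":
    print(build_chip(["cpu", "dsp", "io"], fake_run))
    print(build_chip(["cpu", "io"], fake_run))
```

With the stand-in launcher, the first call reports block `dsp` as failed and skips assembly, while the second assembles the chip. The point of the sketch is only that sequencing, monitoring and rerunning jobs are mechanical tasks a program can take over, leaving the engineer to deal with the blocks that genuinely need attention.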
The core technologies behind CLDA are flow elaboration and a persistent flow-execution server. Flow elaboration generates all the tool commands needed to construct an SoC based on a specific floor plan, netlist, technology library, tool flow and tool settings. The persistent server manages the execution of these commands to ensure full-chip build completion with a minimum of human intervention. The combination of these two technologies enables the GDSII construction of even the largest SoC in less than 24 hours (thanks in part to low-cost, high-performance Linux servers that construct multiple place-and-route blocks in parallel).
Old assumptions
This new automation smashes the assumption that hierarchical SoCs take four to six weeks to construct in the first place, and 10 or more days to rebuild. In addition to restoring the integration predictability required to prevent manufacturing failures and automating away the drudgery involved in tool-launching tasks, CLDA gives physical design teams the power to explore design options early in the design cycle and to detect full-chip problems throughout the entire design project, early and late, so schedule predictability becomes achievable.
One example of an improved outcome is a 10-million-gate SoC that was derived from a "platform SoC" and reimplemented in the form of 10 place-and-route blocks with new functionality. The first full-chip design iteration through to P&R was completed in the second week of the project (most teams don't see full-chip assembly until week 20). The power grid, a full-chip resource, was under-designed on the prior version, resulting in a serious IR (current x resistance) drop issue that prevented the chip from hitting full performance. On this derivative project, the new power grid was verified early with AstroRail. It proved necessary to add more power pins to the pad ring. This addition would have been impossible to implement had the IR analysis been performed in the last week before tapeout.
Instead, full-chip GDSII verification occurred weekly throughout the entire design process, giving management excellent visibility into the issues facing the team. This chip taped out on the original schedule.
Full-chip integration and verification remains the only reliable method for detecting and defusing full-chip design issues, such as full-chip timing, signal integrity and power grid integrity. Newly arriving chip-level design automation abolishes the notion that full-chip verification to GDSII can only be performed late in the design cycle, when the inevitable "unknown technical difficulties" appear. It places physical design and RTL teams on the same playing field, putting to rest the single biggest reason that so many SoC designs routinely slip and fail.
Lane Albanese (lane@reshape.com) is director of design consulting at ReShape Inc. (Mountain View, Calif.).
See related chart