How to organize a complex SoC project

By Jason Flood, EEdesign
December 6, 2002 (5:04 p.m. EST)
URL: http://www.eetimes.com/story/OEG20021206S0039

It's 9 p.m. after a busy day. Across the room, a business manager and a project manager are working with a customer, via every medium at their disposal, trying to close the next big project. Everything has been prepared. I know this, because I've seen people running in and out with all sorts of documents for the last couple of weeks. I sit comfortably at my desk, safe in the knowledge that this project has nothing to do with me, whatsoever. I even know that this project has already been assigned a full-time technical leader. I can just sit back and worry about my own project.

Three weeks down the line a string of discoveries about incomplete deliverables, time-to-market windows, scope changes, and tapeout dates conspire, and I find myself joining the enlarged team selected for the "new and more challenging" project. I arrive at work the next day convinced that this new project would provide more interesting challenges than my own, and sit down with the technical leader to find out what the story is.

It's looking more challenging already. This chip is a 0.18um to 130nm, netlist to GDSII, cost-reduction port of an existing design with a few modifications. The catch? The timing data we need from the 0.18um device in order to sign-off on the 130nm variant is missing, along with the key members of the customer team that designed and implemented the original version of this highly complex SoC.

Our basic mission is clear, and we set up two teams of engineers. Team One will take what data is available for the 0.18um device and reverse-engineer the timing sign-off criteria. This is necessary because we have been told that the 0.18um silicon works, even though the design did not meet all the formally defined timing constraints at timing sign-off. By running static timing analysis (STA) on the original, we can measure and set the numerous performance targets for various parts of the 130nm device. Team Two, on the other hand, gets the more conventional task of taking the 0.18um netlists and performing the 130nm port.

Team One has its work cut out. It is experiencing lots of technical hurdles with the re-timing work. There are all sorts of trials going on with Primetime, Pearl, and PKS. Translators are being written and correlation exercises are being performed. Engineers from Team Two are already pushing for block-level timing constraints to drive the place-and-route flow. Conscious of the very aggressive tapeout date ahead, everyone is pulling together.

It is tough. We've just been informed that some of the most critical pieces of IP will be delivered very late. So late that anything unexpected with these deliveries could slip the tapeout date by weeks. Ordinarily, sufficient verification of all aspects of third-party IP would be performed off the critical path. In this case, serious risk was piling up directly upon it.

This type of scenario is familiar to many of us. In this case, changes to the project continued right up until tapeout. So how can your team organize and execute an SoC project like this to cope with the inevitable change? You'll need to consider:

How to plan the project.

How to deal with technical issues.

How to deal with productivity requirements for time-to-market windows.

Planning the Project
It is often said that two of the most important factors in the success of an electronic design outsourcing company are its ability to deliver a product on schedule and its ability to achieve first-silicon success. Good planning is viewed by most to be a key element for these aims. In my experience, good planning is a valuable way of agreeing on the basic rules of the game for all involved. However, I have yet to see a project executed to the letter of the original plan. The real skill is in adapting and managing the plan to cope with inevitable changes.

At Cadence Design Foundry, primary responsibility for initial scoping and planning a silicon implementation project typically lies with a technical leader rather than a project manager or business manager. The rationale for this is simple. The technical leader responsible for the execution of the project is more likely to buy in to the plan if it has been put together by one of his peers. The planning process starts with an initial evaluation of the customer's expectations, as well as ours.

Customer expectations are captured via a Request For Quotation form (RFQ). This is essentially a list of questions on technology, geometry, foundry, tapeout date, library vendors, expected die size, and so forth. Often, some of the RFQ questions cannot be fully answered before a project is started, so additional working assumptions may need to be added to the plan and statement of work.

Generally, there is no guarantee of meeting any fixed customer requirements, such as die size or speed of operation. Experience tells us that there are far too many variables that can come into play during execution, making targets unrealistic in the time available. At first sight, this can be viewed as non-committal, but if a customer is highly dependent on any key requirements, it means that the issue is brought to the fore and can be given more careful attention by both parties.

We start with a basic structure for the project plan that includes a series of customer responsibilities. The intent is to make it clear right up front what is expected of the customer and what they can expect from us. If there are any aspects of this plan the customer is unhappy with, the issues are raised so they can be given extra attention.

The basic structure for the project plan is a four-phase approach (figure 1): ramp-up, preliminary, stable, and final. The activities carried out in each phase are summarized in the typical customer engagement model diagram (figure 2).

Figure 1 - Project planning phases

Figure 2 - Customer engagement model

Ramp-up phase
During ramp-up, there are several objectives that we strive to complete before starting the main execution. Typically, ramp-up lasts around three weeks depending on the complexity of the project. The primary objective at this stage is to install and verify the design environment to reduce project risks before the entire project team is assigned. Third-party IP such as standard cells, I/Os, memories, analog IP, and foundry technology, often from a number of sources, must be carefully linked together and verified against the baseline flow and toolset.

The next most important objective during ramp-up is to work with the customer and IP providers to produce documents that will aid the main execution phases. Clock, design for test (DFT) and power strategies, floorplanning information, block diagrams, timing specification, and design rule manuals are the kind of information we must study or put in place. The biggest assumption made in the ramp-up phase is that all library data is available at the start of the project; certainly before the end of the ramp-up. In reality, this is not always the case, and the consequence is additional risk to the project. To complete the ramp-up phase we hold an initial design review (IDR) so that any outstanding issues are highlighted to all parties involved before we enter the preliminary phase.

Preliminary phase
The primary objective of the preliminary phase is to perform a first-pass implementation and analysis of the customer's design. At this stage, we would expect to receive a representative netlist for the whole chip containing at least 90 percent of the final expected logic in the design.

Equally important is a complete set of sign-off quality timing constraints. After some basic floorplanning and partitioning, the blocks and top level are pushed through the flow to generate information that can be fed back to the customer. This may include details of blocks with serious routing or timing issues.

The most common problems at this stage are issues with timing constraints. The aim is to provide enough feedback to enable any necessary changes to be made before the customer becomes too tied up with final functional verification. Typically, the preliminary phase lasts around four to six weeks and requires four engineers. It completes with a release to stable (RTS) design review with the customer that highlights progress and any outstanding issues.

Stable phase
Once the stable phase is entered, we would expect to receive a netlist that is very close to final. This would include more than 95 percent of final logic, and we would anticipate that the customer has completed a high percentage of its functional verification plan. It is also essential that serious issues with the netlist or timing constraints (with reference to physical implementation) highlighted during the preliminary phase have been addressed.

Over the eight or so weeks this phase lasts, we would perform thorough implementation and verification activities from detailed floorplanning and power grid design, through to STA, signal integrity (SI) analysis, IR drop analysis, and full-chip physical verification. Although we would not necessarily aim for complete timing closure, the tools would be pushed to the point where hand-fixing would be a realistic option in the time allocated for the final phase.

A key planning point of the stable phase is that the project team will expand significantly to around eight to 12 engineers to cope with the detailed implementation and analysis activities of the blocks, top level, and full chip. The release to final design review completes the stable phase in a similar fashion to earlier reviews.

Final phase
The final phase is essentially the same as the stable phase, but progress is much faster. The main difference, of course, is that all of the verification tasks must be closed. As a result, extra time must be allocated for manual timing fixes, SI fixes, design rule checking (DRC), and antenna fixes. For designs at 130nm and below, the extra time can be as much as six to eight weeks, bringing the typical duration of the final phase to twelve weeks.

Naturally, the netlists and constraints processed during this phase are expected to be final, but it is common for netlist changes to be incorporated. Anything but the most minor of changes may require a complete re-spin of a block, impacting the project's critical path. A release to manufacturing review (RTM) is held just before tapeout. Outstanding issues should be discouraged at this stage!

The project phases just described are typical of what the technical leader will put in place and are scaled appropriately depending on project complexity. Once all parties have agreed to the plan, the project manager takes ownership and is responsible for ensuring the execution takes place accordingly.
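As a rough illustration of how the four phases add up, the sketch below models the plan as data and sums the durations quoted above. The Phase class, its field names, and the treatment of "around three weeks" and "eight or so weeks" as single numbers are assumptions made for illustration; they do not describe any actual planning tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One phase of the plan (hypothetical structure, for illustration)."""
    name: str
    min_weeks: int
    max_weeks: int
    exit_review: str


# Durations are the approximate figures given in the text. The twelve-week
# final phase applies to designs at 130nm and below.
PHASES = [
    Phase("ramp-up", 3, 3, "initial design review (IDR)"),
    Phase("preliminary", 4, 6, "release to stable (RTS)"),
    Phase("stable", 8, 8, "release to final"),
    Phase("final", 12, 12, "release to manufacturing (RTM)"),
]


def schedule_range(phases):
    """Return (shortest, longest) total duration in weeks."""
    shortest = sum(p.min_weeks for p in phases)
    longest = sum(p.max_weeks for p in phases)
    return shortest, longest


if __name__ == "__main__":
    low, high = schedule_range(PHASES)
    print(f"Nominal plan: {low} to {high} weeks")
    for p in PHASES:
        print(f"  {p.name}: ends with {p.exit_review}")
```

On these figures a nominal plan spans roughly 27 to 29 weeks before any scaling for project complexity.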

So what happens when things start diverging from the plan? A very thorough approach to project management is required. In addition to a full-time technical leader, a project manager will spend 50 percent of his time on day-to-day, week-to-week, and month-to-month tracking and management. Apart from status reporting and action logging, the risk management plan is used to detect and monitor significant issues.

The project manager will record the probability and impact (low, medium, or high) of issues that are not being resolved quickly. Causes, mitigations, and possible schedule hits are assessed and all parties can review the items to determine the course of action that fits best with customer priorities. The risk plan forces recognition and gives quantifiable evidence to aid the decision process. The aim is always to minimize project risk for first silicon success, but often this will add to the cost, schedule, or both.
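A risk plan of this kind can be pictured as a small register of entries, each rated low, medium, or high. The sketch below is one possible shape for it, not the format used on the project. The Risk class, the numeric weights, and the probability-times-impact ranking are assumptions (a common convention, not something the plan is stated to use), and the example ratings are invented for illustration.

```python
from dataclasses import dataclass

# Assumed numeric weights for the three levels named in the text.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass
class Risk:
    """One entry in a hypothetical risk register."""
    title: str
    probability: str  # "low", "medium", or "high"
    impact: str       # "low", "medium", or "high"
    mitigation: str = ""

    def score(self) -> int:
        # Assumed ranking rule: probability weight times impact weight.
        return LEVELS[self.probability] * LEVELS[self.impact]


def rank(risks):
    """Return risks ordered from most to least severe."""
    return sorted(risks, key=lambda r: r.score(), reverse=True)


# Illustrative entries only; the ratings are made up for this example.
REGISTER = [
    Risk("Library file needs editing for one tool", "low", "medium"),
    Risk("Key IP block delivered late", "high", "high",
         mitigation="Reuse older timing model until delivery"),
    Risk("Original timing data missing", "medium", "high",
         mitigation="Reverse-engineer targets with STA"),
]

if __name__ == "__main__":
    for r in rank(REGISTER):
        print(f"{r.score()}  {r.title}  [{r.probability}/{r.impact}]")
```

Sorting by a single score is only a starting point for the review; the text makes clear that the final course of action is chosen against customer priorities.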

An alternative is to pass more responsibility on to the customer. For instance, in cases where third-party IP is involved, the customer is paying the bills and can demand better support. If we take on the extra responsibility, the project manager has a well-equipped armory at his disposal: lots of engineers with a broad skill spectrum; an industry-leading compute infrastructure; truly infinite tools and licenses; direct access to Cadence customer support and product engineers; and close links with other tool and library vendors and foundries.

Technical Issues
By far the most common technical issues encountered on nanometer SoC projects are with libraries and other IP. Time-to-market pressures dictate that early adopters work with library and foundry data that may not be mature. With foundry design rules updating bi-monthly, IP providers can struggle to keep up. To make things worse, the highly competitive EDA industry can add uncertainty with regard to library support and compatibility.

Most engineers are aware of the five basic library views required for each cell in a library (Verilog, LEF, TLF/.LIB, GDSII, and CDL/SPICE). However, advanced toolsets dictated by 130nm-and-below designs mean set-up and verification of about 20 technology sub-directories for each project. One reason is that although many tools use the same basic library views (e.g., LEF) different tool vendors will have varying levels of support for a specific version of LEF. Often library files will need to be edited to make them compatible with a specific tool.
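A completeness check on the five basic views is the kind of task that lends itself to a short script during ramp-up. The sketch below is a minimal example, assuming the available formats for a cell have already been collected as a set of lowercase names. The category labels (functional, abstract, timing, layout, netlist) and the function name are hypothetical; only the view formats themselves come from the list above.

```python
# Each required view is satisfied by any one of the listed formats.
# The category names on the left are labels invented for this sketch.
REQUIRED_VIEWS = {
    "functional": {"verilog"},
    "abstract": {"lef"},
    "timing": {"tlf", "lib"},
    "layout": {"gdsii"},
    "netlist": {"cdl", "spice"},
}


def missing_views(available):
    """Return the sorted view categories with no format present.

    `available` is an iterable of lowercase format names found for a cell.
    """
    found = set(available)
    return sorted(
        category
        for category, formats in REQUIRED_VIEWS.items()
        if not (formats & found)
    )


if __name__ == "__main__":
    # Example: a cell delivered with Verilog, LEF, and .LIB only.
    print(missing_views({"verilog", "lef", "lib"}))
```

A check like this only confirms that a view exists. It says nothing about whether a given tool supports the particular version of the format, which is where much of the verification effort described here goes.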

In addition, each tool will require some kind of configuration or other technology file that will require some verification. The aim is to complete this work during the ramp-up phase of a project, but often, some library elements will not be available, which adds extra risk to a project. But, while it is possible to create dummy views from other data, history tells us that the project is always compromised in some way -- usually by extra cost and occasionally by schedule slip. This is a burning issue, and we are actively improving processes to cope with the risky just-in-time approach when this is very important to the customer.

For the project discussed in this article, late delivery of a key IP block meant a large part of the chip timing could not be verified. The IP block was intended to run off dedicated power supplies, so there was good reason to think that a major part of the full-chip LVS debug was being overlooked in the final phase of the project.

For the timing problem, we were able to reuse an old 0.18um timing model of the block while waiting for the 130nm netlist. For the LVS debug problem, the project manager assigned extra resources to reduce the workload of the main physical verification engineer so nothing detracted from LVS once the required data arrived.

Neither of these solutions was based on rocket science, but early identification of the issues allowed us to take the necessary action to reduce the impact on the project timeline.

Two other major technical issues facing implementation teams are timing closure and SI. Every vendor producing physically knowledgeable implementation and optimization tools claims to have a solution that can close designs with a minimum of fuss. Each solution has unique benefits, but no matter how advanced the tools become, the latest foundry processes, libraries, and cutting-edge functional designs will always present scenarios that these tools alone cannot handle.

We find that the complex projects we deal with mostly fall into this category. Every block or top level will have a predictable level of manual work associated with timing and SI. For these portions of a design we have created many techniques to aid this work. Some involve scripted editing of DEF files in conjunction with STA or SI results, but the final push to closure invariably relies on hand-fixing by engineers. For more difficult portions of a design, we will dedicate a full-time engineer per block. To complement this, the engineer responsible for full-chip STA and SI analysis will be available to support the extra effort.

Occasionally a show-stopping situation on a design dictates extraordinary measures. First, we must assess whether the objective presented is actually possible. Sometimes this is a contentious issue, but the stakes are high for everyone concerned. The project team will attempt to find a creative solution by itself, but each week, project managers across the design center meet to discuss each other's main issues. In addition, we use a company-wide mail alias that is a very good source of informal help for designers. If the problem persists, escalation is rapid and management very quickly adds additional resources. That was the case for me on the project described above.

Productivity
Besides its engineers, one of the biggest assets Cadence possesses is its silicon implementation design flow. This highly automated environment allows designers to quickly push blocks through the complete suite of implementation, analysis, and verification tools. The scripts that make up the flow have been adapted over many years to incorporate advances made by different teams and tools for the benefit of new projects. For engineers the benefit of the automated environment is that it frees up valuable time for them to focus on the real challenges of the design.

Although the advantages of the existing design flow are clear, one of the main criticisms made by our own designers is how difficult it has been to customize the flow for those who have not worked on its development. Small subsections are easily modified through individual scripts, but major changes are more of a headache. Customization is becoming increasingly important on today's most challenging designs. New tools and new features of existing tools must be quickly integrated into the design flow and work seamlessly if you are to stay ahead of the game.

Looking back over the last few years, it is clear that, without sustained focus on the topics discussed in this article, Cadence would not be competing in the upper end of the SoC implementation market. Sure, we would still be turning out devices, but there would not be as many. We would probably be focused on market-following designs. And we certainly would not have the 97 percent first silicon success that Cadence Design Foundry currently enjoys.

Oh yes, the chip I mentioned earlier taped out on time after a lot of blood, sweat, and tears, and was successfully delivered to the end-customer in time for them to begin system integration.

Jason Flood is a principal consulting engineer at Cadence Design Foundry UK Ltd., a division of Cadence Design Systems, Inc., in Livingston, Scotland. Flood's primary focus is silicon implementation of nanometer technology ASIC designs.
