Maximize CPU power for physical verification
By Colin Stewart, EEdesign
June 13, 2003 (8:09 p.m. EST)
URL: http://www.eetimes.com/story/OEG20030613S0034
A pivotal point has arrived for physical verification with the introduction of 0.13um technology. Shrinking line widths mean that a greater number of devices can be densely packed into the same die area as before, and the number of metal routing layers has increased from six to nine. Smaller device features mean that the process tolerances required to fabricate ICs have tightened dramatically, and fabrication processes have become more complex. At the same time as device complexity is increasing, die areas are also getting larger in order to incorporate the increased functionality that comes with more advanced technology. The net result for physical verification is that larger layout databases need to be checked, with an increased number of more complex rules, in the same aggressive project time frames as before. This article highlights some of the industry problems associated with verifying large ICs and discusses some of the software and machine requirements that are now needed to debug and tape out efficiently.
Physical Verification Software
An increase to 100 million transistor device counts for 0.13um designs has meant that the volume of data needing to be checked has at least doubled compared with 0.18um. Verifying this amount of data puts a strain on the software and machine infrastructure in a design center. The following points highlight some of the most important things to consider.
64 bit software
64-bit software is now mandatory for large 0.13um designs. This is because GDSII database sizes now routinely exceed the 32-bit file size limit, and job process sizes can easily reach in excess of 10 Gbytes. This means that the GDSII creation and physical verification tools must both be compiled to run on 64-bit machines. At the moment, this realistically means running on a platform such as Solaris. However, the introduction of stable 64-bit Linux software in the near future will make it possible to vastly accelerate big database run times because of its superior clock speed.
Solaris versus Linux
The machine's operating frequency is a very important consideration because the faster the machine, the quicker the debugging can be completed. Solaris platforms have historically been used to run EDA software because the operating system is powerful enough to handle large amounts of design data across a number of users. With the introduction of Linux now running on high performance PCs, the dominance of Solaris platforms in EDA is about to be challenged. The fastest clock speed for a top end Solaris machine is 900 MHz, whereas a PC running Linux can run at speeds in excess of 2 GHz, so the advantage in processing speed is very significant.
However, the drawback with Linux, and especially Linux-compiled software, is that it is only really released for 32-bit operating systems. Massive physical verification jobs requiring 64-bit processing still need to be channeled towards Solaris platforms. Typically on a project this means that only the smaller jobs can run on Linux. However, this is still advantageous, considering that the overall job queue for Solaris machines will be reduced. Stable commercial releases of 64-bit Linux should start to appear during late 2003. At that point, if the cost of 64-bit PCs falls and EDA tool vendors develop 64-bit Linux software, the use of Linux will become more popular.
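As a rough sanity check of when the 64-bit flow becomes unavoidable, the following sketch (in Python, with invented database and process size figures) compares a job against the 2-Gbyte signed 32-bit file offset limit and a nominal 32-bit address space; it is an illustration of the reasoning above, not a vendor tool's actual check.

```python
# Signed 32-bit file offsets top out at 2 Gbytes, which is the practical file
# size limit for tools built as 32-bit applications.
LIMIT_32BIT_BYTES = 2**31

def needs_64bit_flow(gdsii_bytes: int, expected_process_gb: float) -> bool:
    """Return True if the verification job should be routed to a 64-bit platform.

    gdsii_bytes         -- size of the layout database (os.path.getsize() in practice)
    expected_process_gb -- rough estimate of the job's process size in Gbytes
    """
    # A database beyond the 32-bit file size limit, or a job too big for a
    # 32-bit address space (roughly 4 Gbytes), forces a 64-bit tool flow.
    return gdsii_bytes >= LIMIT_32BIT_BYTES or expected_process_gb > 4.0

if __name__ == "__main__":
    # Invented numbers: a 5-Gbyte GDSII database and a 10-Gbyte process size.
    print(needs_64bit_flow(5 * 2**30, expected_process_gb=10.0))
```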
Hierarchical processing
Hierarchical processing enables the tool to recognize multiply occurring instances throughout the layout database and check each unique instance once, rather than checking every placement of it individually. Early verification tools were not able to exploit the natural design hierarchy and verified the database in a flat manner. Because all cells were checked, no matter how often they were repeated, run times were extremely slow. By using hierarchical processing, the amount of data that needs to be checked is reduced, together with the machine process and memory requirements. When one considers that some blocks may contain over 100 memory instances of the same type, this technique becomes instantly attractive. Although hierarchical processing can now be done automatically by today's verification tools, it is common to define instance-invariant cells, or hierarchical cells (HCells), in a file for the tool to reference. When errors are detected, the structure of the report file differs from that of a flat run: errors are reported once per HCell, so the total number of errors can appear lower than it actually is.
Another advantage to using hierarchical processing is that because the design is partitioned into smaller sections, debugging is made clearer. Indeed, it is normally useful to define blocks as HCells. Errors that originate from within HCells are easier to identify, and if some cells are known to be clean then they can be excluded or black boxed from any subsequent re-spins, hence reducing debug times further.
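As an illustration of the savings involved, the sketch below uses made-up cell names and instance counts to compare the number of cell checks a flat run performs with the number a hierarchical run performs when each unique cell is checked only once.

```python
# Hypothetical block containing repeated memory and standard-cell instances.
# The cell names and instance counts are invented for illustration only.
instances_per_cell = {
    "ram_256x32": 120,   # the same RAM macro placed 120 times
    "rom_8kx16": 4,
    "std_block_a": 35,
    "std_block_b": 17,
}

flat_checks = sum(instances_per_cell.values())    # every placement checked
hierarchical_checks = len(instances_per_cell)     # each unique cell checked once

print(f"flat run:         {flat_checks} cell checks")
print(f"hierarchical run: {hierarchical_checks} cell checks")
```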
Multi-threaded processing
Being able to fragment a physical verification job into sections that can get processed in parallel is another extremely effective technique to accelerate run times. Modern multi-processor platforms lend themselves particularly well to this type of processing. After the tool reads in the GDSII, the layout can be sectioned into as many different parts as there are processors available and any checks that can run in parallel are able to do so.
The checks that lend themselves most naturally to parallel processing are DRC, antenna (ANT), and LVS extraction. Scalability, in terms of run time reduction versus the number of processors used, must also be good: if two processors are used, the run time should be roughly half that of a single-processor run.
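The following sketch conveys the idea in Python, assuming the layout has already been sectioned into partitions and using an invented minimum-width rule as a stand-in for a real DRC deck; the partitions and shape names are hypothetical.

```python
from multiprocessing import Pool

MIN_WIDTH = 0.16  # hypothetical minimum metal width rule, in microns

def check_partition(partition):
    """Run a toy minimum-width check on one layout partition.

    Each partition is a list of (shape_name, width) tuples standing in for
    the geometry that a real tool would carve out of the GDSII.
    """
    return [name for name, width in partition if width < MIN_WIDTH]

if __name__ == "__main__":
    # Invented partitions; a real tool sections the database after reading it in.
    partitions = [
        [("M1_wire_12", 0.20), ("M1_wire_13", 0.14)],
        [("M2_wire_07", 0.18), ("M2_wire_08", 0.15)],
        [("M3_wire_01", 0.30)],
        [("M4_wire_55", 0.10)],
    ]
    # One worker per partition, mirroring one partition per available processor.
    with Pool(processes=len(partitions)) as pool:
        violations = pool.map(check_partition, partitions)
    for i, errs in enumerate(violations):
        print(f"partition {i}: {len(errs)} violation(s) {errs}")
```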
Because run times are significantly improved, tool vendors are quick to cash in by increasing the number of licenses required for multi-threaded jobs. However, if a company has design centers in different geographical locations, then licenses can be shared over a wide area network, should the tool vendor be reasonable enough to allow it! This does not ease the strain of buying licenses, but it does offer access to a greater number of licenses, and therefore faster debugging, during a critical tape out phase.
Using RAM
Older physical verification tools read and wrote data to the machine's hard drive, and the process of transferring data to and from the disk became a bottleneck as chips increased in size. To achieve the fastest run times nowadays, it is best to run the entire process in physical RAM. The latest tools do operate in this way, but they require a machine with large quantities of RAM to fully leverage this feature.
For 0.13um chips, machines with at least 16 Gbytes of RAM are required to do full database checks, because process sizes can routinely climb to this level and above. If a verification job is running on a machine that is also being used for other jobs, and the total process size of all the jobs breaches the machine's RAM limit, then "swapping" will occur and all of the jobs will come to a grinding halt.
"Swapping" occurs when chunks of RAM are paged out to disk in order to free physical memory, and the read/write times incurred slow everything down. Hence, it is very important to evaluate the physical verification job size and make sure that the machine has enough RAM to accommodate it.
Machine Infrastructure
Physical verification software demands compute power to achieve maximum performance and help the designer meet tight tape out deadlines. Compute power means having access to machines with enough performance and capacity to turn around jobs as fast as possible. The trouble is, all nice things with high performance tend to come at a high cost, which then needs to be financially justified, and the machines have to be shared with other users to ensure full utilization.
In the past, smaller job sizes meant that the designer could get away with running on their desktop Sun Ultra or, when a little extra memory or speed was required, could search the network for a machine that was quiet and had the right specification. This method usually relied on local knowledge of which machines were available and, most of the time, on making friends with (and influencing) the people who were already on the desired machine.
To manage job processing, and to ensure better machine utilization, compute farms and job submission software are now used, such as LSF (Load Sharing Facility) from Platform Computing. All processor and memory resources are linked together over a single network, and the designer submits their job to a queue. Depending on the job's requirements and priority, it is then farmed out to the most suitable machines.
A DRC check on a 5-Gbyte GDSII database will need a 64-bit platform, such as a Solaris machine, with at least four processors, large amounts of RAM and a fast clock. Therefore, it makes sense to submit this type of job to a group of machines with those specifications. Medium-sized jobs, say DRC on GDSII databases of around 1 Gbyte, could be run on "lesser" Solaris machines with fewer processors, less memory and a slower clock. The smaller jobs, where the run database would not exceed the limits of 32-bit processing, could be channeled towards Linux machines.
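A toy routing rule along those lines, with invented queue names and size thresholds, might look like this:

```python
def choose_queue(gdsii_gbytes: float) -> str:
    """Pick a compute-farm queue for a DRC job based on database size.

    The queue names and thresholds are invented for illustration.
    """
    if gdsii_gbytes >= 2.0:
        return "solaris_64bit_turbo"   # big 64-bit multi-processor machines
    if gdsii_gbytes >= 0.5:
        return "solaris_standard"      # smaller Solaris machines
    return "linux_32bit"               # fast 32-bit Linux PCs

for size in (5.0, 1.0, 0.2):
    print(f"{size} Gbyte database -> {choose_queue(size)}")
```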
Because most other EDA software runs on a single processor, the submission of multi-processor "turbo" jobs does require some tuning of the LSF software. One of the most important settings is job slot reservation. If a 4-processor job has been submitted to the queue for an 8-processor machine, then, as single-processor jobs finish, LSF must begin reserving the four job slots required.
If job slot reservation is not activated, then single processor jobs queued after the turbo mode job would jump the queue and steal the single job slot that has just been freed. This type of queuing is most suited to larger, less urgent jobs that can be run overnight when many of the farm processors are quiet.
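The toy simulation below illustrates why reservation matters; it models assumed scheduler behaviour rather than real LSF code, with an 8-slot machine that is full of single-slot jobs while a 4-slot turbo job waits.

```python
def simulate(slot_reservation: bool) -> int:
    """Count how many single-slot job completions pass before a 4-slot job starts.

    Assumed behaviour for illustration: one single-slot job finishes per tick,
    and a fresh single-slot job is always waiting to grab any unreserved slot.
    """
    running_single = 8   # the 8-slot machine starts full of 1-slot jobs
    reserved = 0         # slots held back for the pending 4-slot turbo job
    turbo_needed = 4
    ticks = 0
    while True:
        ticks += 1
        running_single -= 1          # a single-slot job finishes
        if slot_reservation:
            reserved += 1            # hold the freed slot for the turbo job
        else:
            running_single += 1      # a queued 1-slot job steals the slot
        if reserved >= turbo_needed:
            return ticks             # the turbo job can finally dispatch
        if ticks > 100:
            return -1                # without reservation the turbo job starves

print("with reservation, turbo job starts after", simulate(True), "completions")
print("without reservation, turbo job never starts:", simulate(False))
```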
Another good LSF technique is to group a couple of "preemptive" multi-processor machines that can be used specifically for turbo mode jobs. If the only other jobs that are allowed on the "preemptive" queue are very low priority, then these jobs can be suspended when a verification job is submitted and can resume after its completion. The advantage to this method is that it offers better responsiveness and is most suited to running jobs throughout the working day.
Conclusion
To achieve aggressive tape out schedules for large 0.13um designs, consideration must be given to the capacity and performance of the machine infrastructure, together with the type of physical verification software and how it is used. Physical verification software must run fast and provide easily decipherable results. Debug times can be improved further if a combination of the techniques described above is used.
Hierarchical verification features are very important, not only to minimize the amount of data to check, but also to make debug easier through the use of hierarchical cells and black boxing techniques. To fully leverage a tool's potential, top end Solaris 64 bit multi-processor platforms are needed so that turbo mode runs can be performed. The most efficient way to utilize these machines is to access them via a Platform style LSF facility, either through preemptive or job slot reservation queues.
Linux platforms are offering significant speed advantages over Solaris; however, their use is currently limited to 32-bit applications. As technology progresses towards 90nm, software and machine considerations will play an even more important role in physical verification as database sizes grow larger still. It will no longer be enough to know how to kick off a run and debug some results; it will also be crucial to know how to squeeze the best out of the tool given the available resources and schedule.
Colin Stewart is a senior consulting engineer at Cadence Design Foundry UK Ltd., a division of Cadence Design Systems, Inc., in Livingston, Scotland. Colin's primary focus is physical verification of nanometer technology ASIC designs.