Choosing the best Standard Cell Library without falling into the traps of traditional benchmarking methods
By Andrea BONZO, CAE Libraries, Dolphin Integration

Assessing the comparative performance of several Standard Cell Libraries reliably is tricky, because it involves statistical issues. The methodology traditionally used in the industry to benchmark Standard Cell Libraries is the so-called “cell-by-cell” approach. It consists of taking one or two basic cells, such as a NAND2 and/or a FLIP-FLOP, and comparing their area, dynamic power consumption, leakage and speed. This method has three major drawbacks:
The objective of this paper is twofold. The first is to demonstrate that the “cell-by-cell” approach to comparing libraries is inconsistent with the actual performance results obtained after P&R of the libraries on a logic circuit. The second is to present benchmarks and methods to compare different libraries with different architectures (e.g. CCSL versus RCSL) efficiently and reliably. The suggested benchmarks and methods are the SOFIA benchmark, the Thalie formula, the Motu Uta public logic standard, and the Try & Compare methodology.
Each of these methods makes it possible to compare the area, leakage, dynamic power and speed of several Standard Cell Libraries, with different levels of accuracy. The last two approaches, however, also compare the ease of use and time-for-convergence of the libraries. For confidentiality reasons, all the values given in this article are close to, but not exactly, the values of a specific library.

For more information on Sofia Benchmark: http://www.design-reuse.com/sip/view.php?id=22631
For more information on Thalie Benchmark: http://www.design-reuse.com/sip/view.php?id=22630
For more information on Motu Uta logic standard: http://www.design-reuse.com/sip/view.php?id=22468

From the “cell-by-cell” approach to SOFIA benchmark

Comparing two standard cell libraries in 0.18 µm (e.g. a high-density library and a general-purpose library) using the NAND2 cell indicates that the high-density library should bring a total gain of 12 % in area, with a dynamic power consumption 12 % better than the general-purpose library:
In actual cases (that is, on logic blocks after P&R) using both libraries, the results show a larger area gain (around 35 – 45 %) and a dynamic power gain of around 5 %. In a different illustration, comparing a Reduced Cell Stem Library (RCSL) with a Complex Cell Stem Library (CCSL) using one FLIP-FLOP cell yields a 45 % area gain and a power consumption divided by 2!
These three examples demonstrate that conclusions drawn from a simple cell-by-cell comparison give an indication that can be wrong! For better accuracy, the SOFIA benchmark uses 6 cells representative of the typical paths in the majority of logic circuits. Each cell is weighted by the percentage it represents in the path, as measured on a large sample of circuits. These weights vary depending on the nature of the library (the traditional CCSL approach, or the RCSL approach such as SESAME from the Dolphin Integration offering).

Area
Dynamic power consumption
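The weighting principle behind SOFIA can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions: the cell names (INV, NAND2, DFF) are hypothetical stand-ins for the six actual SOFIA cells, the weights and areas are made-up values, and the score is taken as a simple normalized weighted average. Each library keeps its own weights, since SOFIA weights differ between CCSL and RCSL architectures.

```python
"""Hypothetical SOFIA-style weighted comparison of two libraries.

Cell names, weights and per-cell values are illustrative placeholders,
not the actual SOFIA cells or weights.
"""


def sofia_score(weights: dict, metric: dict) -> float:
    """Weighted average of a per-cell metric (area, power, leakage...).

    weights: share of each representative cell in typical paths
             (each library may use its own weights, e.g. CCSL vs RCSL).
    metric:  value of the metric for each of those cells in the library.
    """
    total_weight = sum(weights.values())
    return sum(w * metric[cell] for cell, w in weights.items()) / total_weight


def relative_gain(candidate: float, reference: float) -> float:
    """Fractional improvement of `candidate` over `reference` (lower is better)."""
    return 1.0 - candidate / reference


if __name__ == "__main__":
    # Illustrative numbers only (area in um^2); library B uses other weights.
    weights_a = {"INV": 0.2, "NAND2": 0.5, "DFF": 0.3}
    area_a = {"INV": 1.0, "NAND2": 1.5, "DFF": 6.0}
    weights_b = {"INV": 0.25, "NAND2": 0.45, "DFF": 0.3}
    area_b = {"INV": 1.2, "NAND2": 2.0, "DFF": 7.0}
    score_a = sofia_score(weights_a, area_a)
    score_b = sofia_score(weights_b, area_b)
    print(f"area gain of A over B: {relative_gain(score_a, score_b):.1%}")
```

The same function applies unchanged to dynamic power, leakage or delay: only the per-cell metric table changes, which is why one weighted score per metric is enough to compare libraries.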
The SOFIA benchmark provides an objective comparison of library performance (area, dynamic consumption, leakage, speed) at the pre-synthesis level in just 30 minutes. The results shown here, and our experience on different logic blocks, indicate that SOFIA provides an accurate comparison among libraries, which is not the case with the “cell-by-cell” approach.

The “Thalie” formula to compare libraries on a target SoC

To measure the performance of a given library on the User’s SoC, the Thalie formula is proposed. This formula enables the User to compute the area of a logic block from its complexity in terms of gates and from the SOFIA benchmark.

How to predict the performances of a logic block in terms of area

The smallest silicon area achievable for a given design remains an open question for most designers. Let us call this smallest achievable area the “Asymptotically Reachable SoC Area”, or “ARSA”. The actual reachable SoC area depends on the ARSA, but also on additional constraints (e.g. form factor) and on the time budget allocated to Place and Route. The Thalie formula is dedicated to evaluating the ARSA of a logic block. Thalie can estimate the ARSA from various parameters describing the logic block (result of a logic synthesis, estimated number of flip-flops…). The accuracy of the estimate depends on the accuracy of the input parameters.

Area Performance after P&R predicted starting from the SOFIA Benchmark

The goal of this approach is to estimate the minimum SoC area asymptotically achievable in P&R. The input parameters of Thalie are:
Based on input 1, the Thalie formula estimates the “Total cell area” of the targeted logic block after synthesis, using the cell distribution given by the SOFIA weights. Based on inputs 2 and 3, the Thalie formula estimates the area of the clock tree. Starting from the complexity of the logic block and the weight of the flip-flop in a design, it is possible to estimate the number of flip-flops in the design; with the area of the average clock-tree buffer and the average fanout, the number of buffers required for the clock tree can then be estimated. In the same way, from the number of flip-flops and the hold constraints, it is possible to estimate the number of cells to be added to correct all the hold violations during P&R. Based on inputs 4, 5 and 6, the Thalie formula estimates the number of nets that can be routed within the cells (the “available routable net”). To check whether routing can be completed successfully within the cells, the available routable net is compared to the actual number of nets to be routed for the target design, and the final area of the logic block is then computed. An example of the Thalie implementation on the Motu Uta standard (see the following chapter for the definition of Motu Uta) is given below.
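The chain of estimates just described (cell area from the SOFIA distribution, clock tree from the flip-flop count, hold-fix cells, then a routability check) can be sketched in Python. This is a minimal sketch under stated assumptions, not the published Thalie formula: all parameter names are hypothetical, the clock tree is assumed to be a balanced tree in which each buffer drives `clk_fanout` loads, hold-fix cells are assumed proportional to the flip-flop count, and congestion is assumed to grow the area by the ratio of nets to route over available routable nets. Cell names and numbers are illustrative placeholders.

```python
"""Hypothetical sketch of a Thalie-style ARSA estimate.

Every parameter name, the clock-tree model, the hold-fix ratio and the
routability correction are assumptions made for illustration; they are
not the published Thalie formula.
"""
import math


def clock_buffer_count(n_flip_flops: int, fanout: int) -> int:
    """Buffers in a balanced tree where every buffer drives `fanout` loads."""
    buffers, loads = 0, n_flip_flops
    while loads > 1:
        loads = math.ceil(loads / fanout)  # buffers needed at this tree level
        buffers += loads
    return buffers


def estimate_arsa(n_instances, sofia_weights, cell_area, ff_cell,
                  clk_buf_area, clk_fanout, hold_cells_per_ff, hold_cell_area,
                  nets_to_route, routable_nets):
    """Return an estimated ARSA in the same unit as the cell areas."""
    # Step 1: spread the synthesized instances over the SOFIA cells by weight.
    logic_area = sum(n_instances * w * cell_area[c]
                     for c, w in sofia_weights.items())
    # Step 2: flip-flop count from the flip-flop weight, then clock buffers.
    n_ff = round(n_instances * sofia_weights[ff_cell])
    clock_area = clock_buffer_count(n_ff, clk_fanout) * clk_buf_area
    # Step 3: cells added to fix hold violations, assumed proportional to n_ff.
    hold_area = n_ff * hold_cells_per_ff * hold_cell_area
    total_cell_area = logic_area + clock_area + hold_area
    # Step 4 (assumed model): if more nets must be routed than the cells can
    # absorb, grow the area by the ratio of required to available nets.
    congestion = max(1.0, nets_to_route / routable_nets)
    return total_cell_area * congestion


def reachable_area(arsa: float, effort_margin: float) -> float:
    """Area expected after P&R, e.g. effort_margin=0.10 for 'ARSA + 10 %'."""
    return arsa * (1.0 + effort_margin)


if __name__ == "__main__":
    # Illustrative inputs only (areas in um^2).
    arsa = estimate_arsa(
        n_instances=1000,
        sofia_weights={"NAND2": 0.5, "DFF": 0.5},
        cell_area={"NAND2": 1.0, "DFF": 5.0},
        ff_cell="DFF", clk_buf_area=2.0, clk_fanout=10,
        hold_cells_per_ff=0.1, hold_cell_area=1.0,
        nets_to_route=1200, routable_nets=1000,
    )
    print(f"ARSA ~ {arsa:.1f} um^2, +10 % margin ~ {reachable_area(arsa, 0.10):.1f} um^2")
```

The last step mirrors the article's Motu Uta figures: with an ARSA of 1.15 mm² and a 10 % medium-effort margin, `reachable_area(1.15, 0.10)` gives about 1.265 mm², in line with the 1.26 mm² obtained after P&R.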
Starting from the SOFIA benchmark, we computed the number of instances per cell type.

Distribution for the 6 cells of SOFIA
With the number of instances per cell, we can compute the number of nets of the circuit after synthesis, which is 82770 nets. With the number of flip-flops, we anticipate the size of the clock tree and the area due to hold-violation corrections. To compute the available routable net, we need information on the structure of the library and on the top metal of the SoC:
Finally, we compare the 82770 nets to be routed with the available routable net and estimate the final ARSA of the circuit: in this case the ARSA is 1.15 mm². This means that with a medium effort during P&R, we can achieve ARSA + 10 % in terms of area. The result obtained with Motu Uta after P&R is 1.26 mm², which corresponds to 1.15 mm² + 10 %.

With SOFIA and Thalie, it is possible to perform a fair comparison of the performance of two different libraries and to assess the performance of a targeted SoC. The dimension missing from a comparison based only on SOFIA and Thalie is that the libraries are not compared in terms of ease of use and time-for-convergence during the four implementation steps of the logic flow: logic synthesis, placement, clock tree synthesis and routing.

Motu Uta is a public logic standard (a logic block in RTL), which can be downloaded for free from the Dolphin Integration website. Its purpose is to enable benchmarking the performance of any Standard Cell Library by performing synthesis, placement, clock tree synthesis and routing based on the Red benchmark. Thanks to its structure, Motu Uta is representative of typical logic blocks in all dimensions: area, power consumption and speed (for more information, see http://www.dolphin-ip.com/flip/sesame/benchmark/sesame_motuuta.php). The Red benchmark is a list of constraints providing all the information needed to set the constraints for Motu Uta through the 4 steps of the logic flow:

The third conclusion is that, through Motu Uta, the comparison between two libraries is made not only on electrical or physical performance (timing, power consumption or area) but also on implementation performance (time to silicon, etc.).

Benchmark on the targeted SoC through the Try & Compare

With Motu Uta, the comparison between two different libraries of standard cells covers all performance aspects.
Nonetheless, there are two cases in which the SoC integrator may wish to perform further verification. The first is for applications whose performance requirements challenge a given library in terms of speed: it is then important to check that each library actually meets the speed constraint of the targeted logic block. The second is for very specific designs with unusual distributions of standard cells, such as RTL code based exclusively on latches, or asynchronous logic blocks.

The “Try & Compare” is a structured methodology for comparing the performance of standard cell libraries truly and efficiently. The performance of any logic block depends on the library, on the benchmark, and on the SoC Integrator’s capability for floorplanning and for optimizing the implementation of logic blocks with P&R EDA solutions. The optimization relies on the implementation during the following four steps: synthesis, placement, clock tree synthesis and routing. For this purpose, the Try & Compare evaluation kit includes all the library views necessary to carry out a performance assessment on any logic circuitry, including the public logic standard Motu Uta (see above), together with scripts enabling a full optimization of the library usage at each implementation step:
Such scripts are optimized for a given library.

Conclusion
About the Author

Andrea Bonzo serves as the Central Application Engineer for memories and standard cells at Dolphin Integration. He is in charge of the technical interface with prospects (pre-sales) and with customers (after-sales). Prior to this, Mr. Bonzo developed analog IPs for 4 years before moving to memory generators and, later, to the development of standard cell libraries based on a reduced set of cells.
All material on this site Copyright © 2017 Design And Reuse S.A. All rights reserved.