IP Gate Count Estimation Methodology during Micro-Architecture Phase

Vijay Kumar Kodavalla
Wipro Technologies, Bangalore, India

Abstract

In the era of IP-based SoC design, it is highly desirable to have near-accurate gate count (area) estimates up front, during the micro-architecture phase of IP development. Such an estimate enables SoC die-size budgeting and floor-planning at a very early stage of IP development, and it can also serve as one of the decision factors when weighing design alternatives. This paper presents the challenges of gate count estimation during the early architecture design phase, along with an effective methodology. It draws on experience with a range of IP designs with logic area of up to several hundred kilo-gates and several hundred kilobits of memory.

1. Introduction

Gate count estimation during the architecture design phase is often a time-consuming, manual process that is prone to major errors, because standard procedures and methodology are lacking.

However, near-accurate gate count (area) estimates of an IP during the detailed architecture phase offer the following benefits:

Design alternatives: Design alternatives can be considered early in the IP development cycle, before RTL implementation starts
SoC die size budgeting: If the other IP components are ready, SoC physical design can start in parallel while a particular IP is still in the micro-architecture phase, by allocating it an appropriate die-size budget
Power estimates: Early gate count estimates also enable power estimation, which is another crucial decision factor for low-power designs

The major challenges in early gate count estimation are:

No RTL code available: During the micro-architecture phase, only a paper design and/or a C model may be available, with no RTL code
Inadequate micro-architecture description: During the architecture design phase, the paper design description may not be detailed enough to give a clear view into the design
Many design partitions: Multiple module-level designers may be involved in IP development and, in the absence of a standard methodology, the gate count estimates depend on each engineer's “depth of design view” of the respective IP partition

Thus there is a need for a gate count estimation methodology that mitigates some of these challenges. Section 2 lists the prerequisites for gate count estimation. The proposed Gate count Estimation Methodology [GEM] is presented in Section 3. Its limitations are highlighted in Section 4, followed by conclusions in Section 5.

2. Prerequisites for Gate count estimation methodology

Following are the prerequisites for gate count estimation methodology:

Fairly detailed micro-architecture description: The documented design should identify the functions and components needed in the design, such as adders, subtractors, multipliers, dividers, shift registers, barrel shifters, FIFOs, RAMs, RAs, multiplexers, de-multiplexers, comparators and state machines. This becomes considerably easier if a C model or pseudo code is also available.
Target library details
Target frequency requirements
Approximate number of logic levels acceptable with target technology to meet the required frequency
Gate count numbers of known standard blocks, such as FIR/IIR filters
Gate count numbers of earlier projects of similar complexity, which can be used as a reference

With these prerequisites in place, GEM can be applied.

3. Gate count estimation methodology [GEM]

As the first step of the methodology, the following data should be collected for each test component by synthesizing it to the target library:

Number of NAND2 gates needed
Propagation delay

Sr No	Component Name	Delay (ns)	NAND2
1	32-bit variable comparator (2, 3, 4 … N bits)	5	200
2	32-bit fixed comparator (2, 3, 4 … N bits)	0.5	20
3	8:1 multiplexer (2, 3, 4 … N:1)	0.3	15
4	18-bit adder (2, 3, 4 … N bits)	2.5	125
5	18-bit signed multiplication (2, 3, 4 … N bits)	6	2200
6	18-bit unsigned multiplication (2, 3, 4 … N bits)	6.5	2400
7	32-bit barrel shifter (2, 3, 4 … N bits)	1.5	1500
8	Flop variants		6/8/10
9	FIFO variants
10	SP/DP RAM variants
11	RA variants

Table 1 Example Components synthesized to target library

The experimental synthesis results should be tabulated as shown in the example of Table 1. These values can also be approximated by general equations for ease of use by module-level designers. For example, the propagation delay of an “N”-bit fixed comparator is 0.2*(N/8) ns and its gate count is 5*(N/8) NAND2 gates.
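Such general equations can be captured as small helper functions. The following is a minimal Python sketch of the fixed-comparator example; the function names are illustrative assumptions, and the coefficients are simply those quoted in the text, which in practice would be fitted to synthesis results for the actual target library.

```python
def fixed_comparator_delay_ns(n_bits):
    """Approximate propagation delay (ns) of an N-bit fixed comparator.

    Uses the example equation from the text: 0.2 * (N / 8).
    """
    return 0.2 * (n_bits / 8)


def fixed_comparator_nand2(n_bits):
    """Approximate NAND2-equivalent gate count of an N-bit fixed comparator.

    Uses the example equation from the text: 5 * (N / 8).
    """
    return 5 * (n_bits / 8)
```

For N = 32 the gate count equation gives 20 NAND2 gates, the same value as the 32-bit fixed comparator entry in Table 1.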

These propagation delay values can be used to determine the appropriate number of pipeline stages, based on the delays of the components connected in series. The NAND2 gate counts are used for the overall gate count estimate. In addition, as a rule of thumb, a certain percentage of the logic area (for example, around 30%) should be added as routing area for a given target technology library when budgeting die size and floorplan.

The overall gate count estimation process can be split into the following categories:
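One possible way to turn series component delays into a pipeline stage count is a simple greedy packing, sketched below in Python. This is an illustrative assumption rather than a procedure prescribed by the methodology: the function name is hypothetical, and flop setup and clock-to-output times, wire delay and clock uncertainty are ignored.

```python
def count_pipeline_stages(delays_ns, clock_period_ns):
    """Greedily pack series component delays into clock-period-sized stages.

    delays_ns: propagation delays (ns) of the components in series, in order.
    clock_period_ns: target clock period (ns).
    Returns the number of stages needed so that no stage exceeds the period.
    """
    stages = 1
    used = 0.0  # delay already accumulated in the current stage
    for delay in delays_ns:
        if delay > clock_period_ns:
            # A single component slower than the clock needs internal pipelining.
            raise ValueError("component delay exceeds the clock period")
        if used + delay > clock_period_ns:
            stages += 1      # start a new stage with this component
            used = delay
        else:
            used += delay
    return stages
```

With the Table 1 delays for an 18-bit adder (2.5 ns), an 18-bit signed multiplication (6 ns) and another adder (2.5 ns) in series, an assumed 7 ns clock period gives three stages, while an assumed 10 ns period gives two.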

Data path logic
Control path logic
Data path pipeline stages
Control path pipeline stages
State machines
Memory such as RAM/ROM/FIFO

Each IP module designer has to further partition the respective module into small, manageable blocks and sub-blocks, and then work through the gate count estimation categories listed above, which are described in the following sub-sections.

3.1 Data path & Control path logic

With the prerequisite information and the Table 1 data, module-level designers can estimate the base gate count with the following equation. This base gate count includes the contribution of both data path and control path logic components.

Base gate count = Σ (A_i * B_i) for i = 1 to C

Where,

A_i = Number of instances of component type i

B_i = Number of NAND2 gates of component type i

C = Number of different component types
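The base gate count summation can be sketched in a few lines of Python. The function name and the example block below are hypothetical; only the per-component NAND2 counts are taken from Table 1.

```python
def base_gate_count(components):
    """Sum instances * NAND2-per-instance over all component types.

    components: iterable of (instances, nand2_per_instance) pairs,
    one pair per component type.
    """
    return sum(instances * nand2 for instances, nand2 in components)


# Hypothetical block: two 32-bit fixed comparators (20 NAND2 each),
# one 8:1 multiplexer (15) and one 18-bit adder (125), per Table 1.
example_block = [(2, 20), (1, 15), (1, 125)]
example_total = base_gate_count(example_block)  # 2*20 + 15 + 125 = 180
```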

Identifying data path logic from architecture block diagrams, pseudo code or a C model is relatively easy. Consider a simple 8-bit alpha blending function. Alpha blending computes (Vα + G(1-α)), where “V” is the video plane and “G” is the graphics plane. Rewriting this as ((V-G)*α + G), the logic gate count is the sum of the gate counts of an 8-bit subtractor, an 8-bit multiplier and a 16-bit adder. In this example A = 1 for each component, C = 3, and B can be taken from Table 1 for the respective bit widths. The approximate gate count of this alpha blending data path logic is 1400 NAND2 gates. Gate counts of known standard blocks such as FIR/IIR filters, if available, can be used directly.

Control path logic, by contrast, is difficult to estimate. One approach is to derive the higher-level functions of the control logic. Consider the control logic for a 256-deep FIFO. The control logic can be split into write-side and read-side logic. Each side needs one address counter and one comparator, in addition to synchronization flops for flag generation. This gives an overall picture of the logic in the control path. The approximate gate count of this FIFO control logic is 400 NAND2 gates.

3.2 Data path pipeline stages

For data path logic, appropriate pipeline stages need to be added based on the propagation delays of the components from Table 1. The gate count of the data path pipeline stages can be obtained with the following equation.

Data path pipeline stages gate count = Σ (D_i * E) for i = 1 to F

Where,

D_i = Data path bit-width at pipeline stage i
E = Number of NAND2 gates per flop
F = Number of pipeline stages

Consider the same alpha blending example from Section 3.1 to determine the data path pipeline stages. The alpha blending equation ((V-G)*α + G) can be implemented with a 2-stage pipeline as shown in Figure 1, if the delay of the subtractor and multiplier components given in Table 1 is close to one full cycle at the final target frequency. In this example D = 8/16 and F = 3/1, that is, three 8-bit pipeline registers and one 16-bit pipeline register, and E can be taken from Table 1. The approximate gate count of the data path pipeline stages in this example is 400 NAND2 gates, if E is taken as 10.

Figure 1 Alpha blending data path pipeline stages
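The alpha blending pipeline figure can be reproduced with a short Python sketch. It reads “D = 8/16, F = 3/1” as three 8-bit register ranks plus one 16-bit register rank, which is an interpretation that matches the quoted total; the function name is hypothetical.

```python
def datapath_pipeline_gates(register_groups, gates_per_flop):
    """Gate count of data path pipeline registers.

    register_groups: iterable of (bit_width, number_of_register_ranks) pairs.
    gates_per_flop: NAND2-equivalent gates per flop (E in the text).
    """
    return sum(width * ranks * gates_per_flop
               for width, ranks in register_groups)


# Alpha blending example: three 8-bit ranks and one 16-bit rank, E = 10.
alpha_blend_pipeline = datapath_pipeline_gates([(8, 3), (16, 1)], 10)
# 8*3*10 + 16*1*10 = 400 NAND2 gates
```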

3.3 Control path pipeline stages

It is difficult to estimate the pipelining required for control logic. The following procedure can be used in this case:

Let the combinational NAND2 gate count of the control logic be “N”, let the target frequency and chosen technology library allow “P” NAND2 logic levels in each pipeline stage, and let the number of NAND2 gates per flop be “E”. The approximate gate count of the control logic pipeline stages can then be calculated with the following equation.

Control path pipeline gate count = ((N/P) +1)*E.

Consider the same FIFO control example from Section 3.1. With N = 400 from that example, and with P and E taken as 40 and 10 respectively, the approximate gate count of the FIFO control pipeline stages is 110 NAND2 gates.
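The control path pipeline equation can be checked with a minimal Python sketch. The function and parameter names are illustrative; the inputs are the 400-gate FIFO control estimate from Section 3.1 together with P = 40 and E = 10.

```python
def control_pipeline_gates(comb_gates, levels_per_stage, gates_per_flop):
    """Approximate control path pipeline gate count: ((N / P) + 1) * E.

    comb_gates: combinational NAND2 gate count of the control logic (N).
    levels_per_stage: NAND2 logic levels allowed per pipeline stage (P).
    gates_per_flop: NAND2-equivalent gates per flop (E).
    """
    return (comb_gates / levels_per_stage + 1) * gates_per_flop


# FIFO control example: N = 400, P = 40, E = 10.
fifo_control_pipeline = control_pipeline_gates(400, 40, 10)
# (400/40 + 1) * 10 = 110 NAND2 gates
```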

3.4 State machines

Estimating the gate count of state machines is another challenge. The following procedure can be used in this case:

Let there be N states in a one-hot state machine and C control signals used in the state machine, and let the number of NAND2 gates per flop be “E”. The approximate gate count of each state machine can then be calculated with the following equation.

State machine gate count = (2C+E-1)*N.

Here the number of output control signals from the state machine is assumed to be 2 or 3. It is also assumed that the state machine is a control-only state machine and does not include any arithmetic operations.

Consider the simplified main control state machine of an image grabber, shown in Figure 2. This state machine controls the capturing, storing and displaying of image data. It tracks camera initialization (Cam_init), sensor image data capture (Cap_data), memory storage (Wait_img) and display (Disp_data), and then loops back to image data capture.

Figure 2: Image Grabber control state machine

As shown in Figure 2, there are six states in this control state machine, with three control signal inputs: Init_done, Frame_done and Disp_done, which indicate completion of camera initialization, complete frame capture and image display respectively. In this example N = 6, C = 3 and E can be taken from Table 1. The approximate gate count of this state machine is 90 NAND2 gates, if E is taken as 10.
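The state machine equation can likewise be written as a small Python helper. This is a minimal sketch with hypothetical names, valid only under the stated assumptions (one-hot encoding, control-only state machine).

```python
def state_machine_gates(num_states, num_control_signals, gates_per_flop):
    """Approximate one-hot state machine gate count: (2C + E - 1) * N.

    num_states: number of states (N).
    num_control_signals: control signals used in the state machine (C).
    gates_per_flop: NAND2-equivalent gates per flop (E).
    """
    return (2 * num_control_signals + gates_per_flop - 1) * num_states


# Image grabber example: N = 6, C = 3, E = 10.
image_grabber_fsm = state_machine_gates(6, 3, 10)
# (2*3 + 10 - 1) * 6 = 90 NAND2 gates
```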

3.5 Memories

Determining the overall memory requirement (RAM, ROM and FIFO) is straightforward with a clearly defined micro-architecture. However, the additional pipeline stages around the RAM/ROM macros should be included in the gate count. The memory compiler used to generate the required memories provides the die size numbers.

3.6 Overall gate count

With the data for all the categories from the earlier sections, the overall gate count can be estimated. However, if there are high fan-out nets in the design, the designer has to account for additional logic for replication, or for the timing budget in the overall frequency computation.
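As a rough illustration of the roll-up, the Python sketch below sums per-category logic estimates and applies the routing-area rule of thumb from the start of Section 3. The function name and the 30% default are assumptions, and the input figures merely combine the separate worked examples of this section (alpha blending, FIFO control, image grabber); they do not describe a single real design. Memory area is excluded, since it comes from the memory compiler.

```python
def overall_gate_estimate(category_gates, route_percent=30):
    """Sum per-category NAND2 counts and add a routing allowance.

    category_gates: dict mapping category name -> NAND2 gate count.
    route_percent: routing area as a percentage of logic area (assumed).
    Returns (logic_gates, logic_plus_routing).
    """
    logic = sum(category_gates.values())
    return logic, logic * (100 + route_percent) / 100


# Illustrative only: figures from the separate worked examples above.
worked_examples = {
    "alpha_blend_datapath": 1400,
    "fifo_control": 400,
    "datapath_pipeline": 400,
    "control_pipeline": 110,
    "state_machine": 90,
}
logic, with_routing = overall_gate_estimate(worked_examples)
# logic = 2400 NAND2 gates; with a 30% routing allowance, 3120
```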

After completing the estimation process, the gate counts can be compared against earlier known IP of similar complexity for validation. For example, a particular class of video compression engines (such as H.264 and WMV9) falls in a similar complexity range. If the gate count of one of the decoders is known, along with a complete list of differences between the decoders, the gate count of the new decoder can be extrapolated from the incremental differences. This number can be used to validate the detailed estimate produced by the methodology described in this paper.

The following factors cause inaccuracies in this estimation methodology:

Control logic
Synthesis tool related optimizations such as logic sharing and logic replication to meet desired frequency

In spite of the above inaccuracies, the presented GEM has yielded near-accurate estimates, within ±10-15% of the actual gate count, in most cases.

4. Limitations

The presented GEM works well for processing-intensive designs such as image/data processing and video codecs. However, for control-intensive designs where the logic is mostly random, the estimation is time consuming and may not be accurate.

5. Conclusions

The presented GEM has been successfully deployed for gate count estimation of various IPs. For most compute-intensive designs, such as video decoders, the estimation inaccuracy is within 10-15%.
