Soft memories in PD flow: Myth and Reality
ASIC Engineer, eInfochips
Abstract:
Soft memories are simply memories built from flip-flops. They are easy to code and versatile from a design perspective. If a designer needs a 570-bit memory, then instead of instantiating a 1k SRAM (assuming this is the smallest SRAM size available) and using only part of it, he can simply build a FIFO out of flip-flops. This way he economizes on power while using existing libraries.
Even though soft memories or FIFOs make life easy for the RTL designer, they can create issues at different stages of the PD flow under specific circumstances. We discuss these issues in detail and show how they can be handled at the PD end.
I. Introduction
This paper focuses exclusively on soft memories and how they can sometimes cause problems in the PD flow. Not all soft memories are problematic; they become so only under specific circumstances.
For instance, a soft memory that is too large can create congestion in the PD flow. A soft memory may also be intricately clock gated, which can cause problems down the line in the clock and route steps.
Fig 1: Block level schematic of a fifo
The diagram above shows a typical FIFO and the signals associated with it. As seen above, it has quite a few signals, and the hookup gets more complicated when clock gating is added to these FIFOs to save power.
We focus on soft memories used in high speed designs clocked at around 1 GHz. These designs follow a clock mesh topology to minimize skew, which can also cause congestion around soft memories.
II. Congestion because of soft memories
We illustrate a few cases where soft memories have caused congestion after the clock step.
The first case is a 600k gate count design in which only the size of the soft memory was doubled, from 8192 bits to 16384 bits. This caused congestion to balloon at the clock stage of the PD flow.
Fig 2: 8192 bit soft mem
Fig 3: 16384 bit soft mem
Fig 4a, Fig 4b: Congestion because of 8192 bit and 16384 bit soft memory, respectively, in clock step.
These soft memories were intricately clock gated. The larger soft memory was sitting in a place where there was little room, so when the clock gating was hooked up in the clock step, each clock gater was cloned many times over and a lot of needless buffering was added. All of this had to fit in that small area, and hence many functional cells that should have been placed there were scattered. This in turn led to long paths and more buffering. The result was heavy congestion in the clock step and, subsequently, a huge degradation in timing. Thus, just by doubling the size of the soft memory, congestion was introduced in a design where there was none.
Congestion like that seen in Fig 4b is a huge bottleneck, and we cannot proceed further without solving it. Solutions exist on both the RTL side and the PD side.
III. Congestion Analysis
To get to the root of the issue, the PD engineer should first find out the size of the soft memory (x rows * y columns). If the soft memory is clock gated, he should then find out how it is hooked up: whether the clock gating is loose (the entire memory is driven by one clock gater) or tight (each individual row/column is clock gated). The PD designer should then find out how the soft memory interacts with other modules in the block. If the PD owner finds that registers of the soft memory are sitting within macro channels, he should first place a soft blockage to get these registers out of the narrow channels. He can also increase the number of routing layers, which may solve the problem.
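The loose versus tight distinction above can be put into numbers with a small sketch. The function name `clock_gater_summary`, the style labels and the 8 x 16 example size are hypothetical and are not taken from any PD tool or from the designs in this paper; the counts are the gaters in the RTL before any cloning done by the clock step.

```python
def clock_gater_summary(rows, cols, style):
    """Return (number of clock gaters, flops driven per gater).

    Hypothetical helper: counts gaters as coded in RTL, before the
    clock step clones any of them.
    """
    if rows <= 0 or cols <= 0:
        raise ValueError("rows and cols must be positive")
    if style == "loose":        # one gater drives the entire memory
        return 1, rows * cols
    if style == "per_row":      # tight: one gater per row
        return rows, cols
    if style == "per_column":   # tight: one gater per column
        return cols, rows
    raise ValueError("unknown gating style: " + style)


# Example with an assumed 8 row x 16 column soft memory.
for style in ("loose", "per_row", "per_column"):
    gaters, fanout = clock_gater_summary(8, 16, style)
    print(style, "gaters:", gaters, "flops per gater:", fanout)
```

Tight gating multiplies the number of gaters that must each sit close to their own group of flops, which is why the placement of those flops matters in the analysis that follows.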
If congestion still persists, he has to study how each and every flop in the soft memory is placed. Let's take the case of a small (5 rows x 5 columns) soft memory where each row is clock gated. Ideally the registers should be placed as shown in Table 1 below.
1_1 | 1_2 | 1_3 | 1_4 | 1_5 |
2_1 | 2_2 | 2_3 | 2_4 | 2_5 |
3_1 | 3_2 | 3_3 | 3_4 | 3_5 |
4_1 | 4_2 | 4_3 | 4_4 | 4_5 |
5_1 | 5_2 | 5_3 | 5_4 | 5_5 |
Table 1. Ideal location of flops (row_column)
1_1 | 2_2 | 4_3 | 5_4 | 1_5 |
2_1 | 1_2 | 4_4 | 2_4 | 5_5 |
5_3 | 3_2 | 3_3 | 3_4 | 3_5 |
4_1 | 4_2 | 1_3 | 2_3 | 4_5 |
5_1 | 5_2 | 3_1 | 1_4 | 2_5 |
Table 2. Location of flops at clock step
However, the soft memory actually got placed as shown in Table 2. Taking the case of row 1, we see that most of the registers that should be sitting in row 1 are now scattered across other rows. The clock gater that drives this row will have to be cloned, because its registers are now sitting far apart. Once the clock gater is cloned, we see more intermediate buffering on the paths associated with the clock gating logic. The PD engineer can use a clustering or relative placement script to place the flops in the desired order.
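The scatter described above can be quantified with a simple metric. The sketch below is only an illustration: it treats Table 1 and Table 2 as unit grids and, for each logical row, computes the half-perimeter of the bounding box around that row's flops. The metric and the names `IDEAL`, `PLACED` and `row_spread` are assumptions made for this example and are not output of any PD tool.

```python
# Table 1: ideal placement, entries are row_column.
IDEAL = [[f"{r}_{c}" for c in range(1, 6)] for r in range(1, 6)]

# Table 2: placement observed at the clock step.
PLACED = [
    ["1_1", "2_2", "4_3", "5_4", "1_5"],
    ["2_1", "1_2", "4_4", "2_4", "5_5"],
    ["5_3", "3_2", "3_3", "3_4", "3_5"],
    ["4_1", "4_2", "1_3", "2_3", "4_5"],
    ["5_1", "5_2", "3_1", "1_4", "2_5"],
]


def row_spread(grid):
    """Map each logical row to the half-perimeter (in grid units)
    of the bounding box enclosing all flops of that row."""
    positions = {}
    for y, line in enumerate(grid):
        for x, name in enumerate(line):
            logical_row = int(name.split("_")[0])
            positions.setdefault(logical_row, []).append((x, y))
    spread = {}
    for row, points in positions.items():
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        spread[row] = (max(xs) - min(xs)) + (max(ys) - min(ys))
    return spread


ideal = row_spread(IDEAL)
placed = row_spread(PLACED)
for row in sorted(ideal):
    print("row", row, "ideal spread:", ideal[row], "placed spread:", placed[row])
```

Under this metric every row in Table 1 has a spread of 4, while rows 1, 2 and 5 in Table 2 have a spread of 8. A larger spread means the row's clock gater has to reach flops over a wider area, which is what leads to cloning and extra buffering.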
If congestion still persists and timing deteriorates, we have no option but to approach the problem from an RTL perspective. We can try to reduce the complexity of the clock gating (e.g. one clock gate for the entire memory), or we can replace the soft memory with SRAM. For the case above we substituted the soft memory with 2 SRAMs, and that did the trick.
Fig 5: Congestion after conversion of soft 16k memory to two SRAMs
We would also like to showcase this problem in a design with a 1 million gate count.
Fig 6: Placement of soft memories
As seen in Fig 6, the soft memories are placed in a circular fashion, where the diameter of each circle is close to 200 microns.
Each soft memory is 64*29 in size and was intricately clock gated. This caused a lot of congestion at the clock stage, where the flops are aligned.
Fig 7: Congestion because of soft memory
The congestion caused by the hookup of the intricately clock gated soft memory is shown in Fig 7. Here no amount of tiling, relative placement or other PD techniques produced the desired results.
Ultimately the complexity of the clock gating was reduced and the size of the soft memory was halved. The diameter of these circles then became much smaller, and congestion dropped drastically as a result.
IV. Conclusion
Soft memories in general work fine and are a real asset to the RTL designer. He can instantiate small arrays as and when needed without instantiating traditional SRAMs. They help in reducing design area as well. Soft memories misbehave in the PD cycle only under specific circumstances.
This happens when the floorplan shape is highly rectilinear, when the soft memories are very large, or when parts of a soft memory go and sit in undesirable places. In all these cases the PD engineer needs to analyze thoroughly why the soft memory misbehaves. The PD designer must first try all options available to him and only then approach the RTL designer for a fix. Many a time, removing soft memory registers from within macro channels or doing some relative placement can mitigate the issue.
If an RTL fix is easily available, it should also be tried. The PD engineer and RTL designer must work in tandem to solve this issue.