1 Introduction

As designs increase in complexity, the density of the memories they connect to has also increased. It is not uncommon to see gigabyte memories. Large memories bring their own set of challenges during the verification stage. This paper describes strategies designers can use to surmount such issues, and discusses which approach suits which type of design.

2 Why HDL memories can be bad

In test benches, memories have traditionally been modeled using data objects that can remember values: register types in Verilog, and signal or variable types in VHDL. HDL memories are simple to model and maintain. But as memories grow in size, as they will with more complex designs, there comes a point where the dark side of HDL memory really gets exposed. Situations can get so ugly that test benches modeled using such memories may refuse to get past your favorite simulator's loading phase, much less simulate.

There are two primary reasons for this. First, contemporary designs interface with very large memories. Second, an HDL memory consumes more host memory than the storage it models. It is not unusual for an HDL memory to consume up to 8 bytes to store just 1 byte.

3 How much is enough (The dreaded 90/10 rule)

Normally, when modeling memories in HDL, one simply declares an array as big as the actual memory. The question that comes to mind is whether the full memory is ever going to be used during simulation. Let's take an example. Say we have a design with a 512 MByte memory to store incoming packets. If we build a test bench to verify such a design, we would notice that in 90 percent of our test cases we probably use no more than 10 percent of the actual memory. This is reminiscent of the classic 90/10 rule of economics: 90 percent of the tests use only 10 percent of the allocated memory. The following graph shows one typical memory usage pattern.
Figure 1 Memory utilization graph

In Figure 1, the top red line shows the memory that is allocated at the start of simulation. The bottom red line shows how much memory is really used during typical simulations. The graph shows an unusually large gap between the supply of and demand for Verilog, or static, memory. Are we over-allocating the memory?

We could try to deal with over-allocation by reducing the size of the Verilog memory array. But the major challenge is deciding how much to reduce it. Should it be 10 percent of the actual memory, 20 percent or 30 percent? This is something we can never answer with certainty. There is always a chance that one of our long tests will cross the limit and render the simulation a wasted effort. Is there a way to avoid such wastage of memory? Yes! Allocate memory on demand, which is discussed in the next section.

4 Dynamic memories

Dynamic memories are those whose storage is allocated on demand. This is possible in Verilog using PLI and dynamic memory allocation functions (like malloc() in the C language). Dynamic memories work by passing the address and data information from Verilog to C using PLI calls. There are many ways this information can be organized in C. One of the simplest is to "malloc" as much memory as is needed to hold the data, the address and a next pointer. This is shown by the code snippet below. The memory element in C is an instance of the following struct:

The method to read is just the opposite: only the address is transferred using the PLI call, and the called function returns the data.

Figure 2 Typical layers that exist in dynamic memory

Figure 2 shows a dynamic memory model that encompasses three layers. The normal bus functionality of the memory is captured in Verilog. The actual memory is implemented in C. The PLI layer is used to send data to and retrieve data from the C domain on behalf of the Verilog domain. The bus function of the memory need not be implemented in Verilog; it can also be done in C.
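The element-per-location organization described in this section can be sketched in plain C as follows. This is only an illustrative sketch, not the original snippet: the names mem_node, mem_write, mem_read and mem_free_all are hypothetical, the data width is assumed to be one byte, and the PLI glue that would pass address and data across from Verilog is deliberately left out.

```c
#include <stdint.h>
#include <stdlib.h>

/* One memory element: address, data and next pointer (names are hypothetical). */
typedef struct mem_node {
    uint32_t addr;            /* address supplied by the Verilog side */
    uint8_t data;             /* assumed data width: one byte */
    struct mem_node *next;    /* next element in the singly linked list */
} mem_node;

static mem_node *mem_head = NULL;   /* empty list: nothing allocated yet */

/* Linear search for an address; returns NULL if it was never written. */
static mem_node *mem_find(uint32_t addr)
{
    for (mem_node *n = mem_head; n != NULL; n = n->next) {
        if (n->addr == addr) {
            return n;
        }
    }
    return NULL;
}

/* Write: allocate an element on first use, then store the data.
 * Returns 0 on success, -1 if malloc fails. */
int mem_write(uint32_t addr, uint8_t data)
{
    mem_node *n = mem_find(addr);
    if (n == NULL) {
        n = malloc(sizeof *n);
        if (n == NULL) {
            return -1;
        }
        n->addr = addr;
        n->next = mem_head;   /* push onto the front of the list */
        mem_head = n;
    }
    n->data = data;
    return 0;
}

/* Read: only the address comes in; the data goes back through *data.
 * Returns 1 if the location was written before, 0 otherwise. */
int mem_read(uint32_t addr, uint8_t *data)
{
    const mem_node *n = mem_find(addr);
    if (n == NULL) {
        return 0;
    }
    *data = n->data;
    return 1;
}

/* Release every element, for example at the end of simulation. */
void mem_free_all(void)
{
    while (mem_head != NULL) {
        mem_node *next = mem_head->next;
        free(mem_head);
        mem_head = next;
    }
}
```

In a real test bench, a PLI routine registered as a system task would fetch the address and data arguments and call functions like these. The linear search keeps the sketch short; it is also the reason read latency depends so heavily on how the data is arranged.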
In such cases, you will hardly see any logic in Verilog. But my recommendation is "use Verilog when you can, use C when you have to."

While the above approach works fine, there is still a lot of room for improvement. For example, the approach above uses three locations (address, data and next pointer) to store a single data item. This can be improved by keeping a single address for multiple data items whose addresses are contiguous. Similarly, the arrangement of the data plays a big role in the delay incurred when a read operation happens. One can use various search algorithms to find the data being addressed.

4.1 Why C memories can be bad

While the above approach gives us the flexibility of allocating and using only as much memory as is really required, it does come at a cost. Since we use PLI calls to transfer data to and retrieve data from the C domain, there is overhead involved. One of the primary motivations for going to dynamic memories is to avoid over-allocation of memory. By using Verilog memory, we underutilize memory for 90 percent of the tests. On the other hand, by using dynamic memory, we pay the PLI overhead on every memory access.

It is pretty clear that Verilog memories are fast, while dynamic memories save a lot of wasted memory. Given this, is there a middle path that memory modelers can take to get the best of both? The following section deals with such an approach, which I call mixed static and dynamic memory.

4.2 Mixed static and dynamic memory approach

In this approach a single memory model has both Verilog (also called static) memory and dynamic memory (memory in C). The Verilog memory is pre-allocated, and the dynamic memory grows in size depending on demand. The advantage is that for 90 percent of your tests, you will end up using only Verilog memory and will not make a single PLI call. Only for the remaining 10 percent of tests will you use a PLI call.
Thus you only pay in terms of extra computation for 10 percent of the test cases. For the remaining 90 percent, no PLI overhead is involved. The amount of memory allocated in Verilog could be between 5 and 20 percent of the total memory. There are many ways in which the Verilog memory can be arranged. The following sections discuss different approaches in more detail.

4.2.1 Verilog memory lower, dynamic memory higher

In this memory, the initial portion of the address space is stored in Verilog and anything above it goes into dynamic memory. As indicated in the previous section, static memory can be around 5 to 20 percent of the total memory. If the total memory size is small (say 8 MBytes), then the ratio of Verilog memory to total memory can be high. On the other hand, if the total memory is large, then the ratio of Verilog memory to total memory can be small. The following table shows how Verilog memory can be distributed for different total memory sizes.

Figure 3 Memory distribution in "Verilog memory lower and dynamic memory higher" approach

Pseudo code for the above approach is as follows:

The scheme for read is almost the same as for write. There is one additional optimization possible in the above scheme. Since we know that data is pushed to the C domain only if the address is above a certain limit, you don't need to transfer the full address bits into C. This can save dynamic memory.

The above scheme makes one major assumption: that a design will always use the initial memory range. With this assumption, we have modeled the memory so that the initial range is in Verilog and the rest is dynamically allocated. Though this may be true sometimes, it is not always the case. For example, a design may start using the end of the address range first and then move downwards. Apart from this, there can be instances where a design accesses certain address ranges more frequently than others.
For example, a design may access its configuration data more often during processing. It thus makes sense to be able to choose where the Verilog memory really sits in the full memory space. The next section defines this scheme.

4.2.2 Range locked Verilog memory

In this case, Verilog memory does not have to occupy the initial memory space. The exact range where Verilog memory lies is user defined. This way one can fine-tune the position of Verilog memory to get the best performance. For example, if a DUV accesses configuration data more often, then that data can be locked in Verilog memory.

Figure 4 Verilog memory placement in range locked approach

Figure 4 depicts how static and dynamic memory can be distributed. The Verilog memory region, as indicated in the figure, is not fixed, but can be placed anywhere in the full memory region. Pseudo code, which is pretty much the same as the previous one, is given below.

This type of memory does provide the flexibility of placing Verilog memory anywhere in the full memory range. But we are assuming that most of the accesses will be localized within the Verilog memory range, and that only occasionally will we cross this range. It is indeed true that total memory usage during normal simulations is most likely to be within the total Verilog memory size, but what guarantees that the accesses will be localized within the Verilog range?

Suppose a design stores Ethernet frames in external memory, and the location where it stores a frame is a function of the destination address. In this case, frames arriving with fairly random destination addresses will spread out (though sparsely) throughout the full memory range. The total memory usage is most likely to be less than the total Verilog memory, but dynamic memory will still be used, since not all accesses fall in the Verilog address range. What, then, is the solution? Well, the trick is to not force Verilog memory to have a fixed address range in the total memory.
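The address check behind the range locked scheme can be sketched as follows. This is a hedged illustration in C rather than the pseudo code of the article: in a real test bench the comparison would normally sit on the Verilog side so that static accesses never cross the PLI boundary, and the names decode_addr, decoded_t, REGION_STATIC and REGION_DYNAMIC are hypothetical.

```c
#include <stdint.h>

/* Where an access is served from (hypothetical names). */
typedef enum { REGION_STATIC, REGION_DYNAMIC } region_t;

typedef struct {
    region_t region;   /* static (Verilog array) or dynamic (C side) */
    uint32_t offset;   /* index into the static array, or key for dynamic memory */
} decoded_t;

/* Decide whether addr falls inside the user-defined static window
 * [base, base + size). Passing base = 0 gives the
 * "Verilog memory lower, dynamic memory higher" arrangement. */
decoded_t decode_addr(uint32_t addr, uint32_t base, uint32_t size)
{
    decoded_t d;
    if (addr >= base && (addr - base) < size) {
        d.region = REGION_STATIC;
        d.offset = addr - base;   /* window-relative index */
    } else {
        d.region = REGION_DYNAMIC;
        d.offset = addr;          /* assumption: full address used as the key */
    }
    return d;
}
```

The subtraction is done only after the lower-bound check, so the unsigned arithmetic cannot wrap. For the base = 0 case, the dynamic key could instead be addr - size, which is one way to realize the narrower address mentioned for the lower/higher scheme.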
The next section deals with such an approach.

4.2.3 Associative static and dynamic memory

This is probably the most versatile type of memory. The Verilog memory range is never defined in this class of memories; only the Verilog memory size is defined. Verilog memory can lie anywhere in the total memory range. The following diagram depicts this approach.

Figure 5 Distribution of Verilog memory in the total memory space

As shown in Figure 5, Verilog memory can be spread out in multiple chunks across the total memory range. What is fixed is the total Verilog memory size. The rest is decided dynamically during simulation. This flexibility comes from implementing the Verilog memory as an associative memory. The methodology, in theory at least, is similar to direct mapped cache memories. Yet there are some differences.

First, there are really no cache lines, or rather the cache line is just one location. In cache memories, the tag (which is a part of the total address) is maintained for every cache line. The cache line can be 32, 64 or 128 bytes depending on how much spatial locality is anticipated. But in our memory we maintain a tag for every location.

Second, our memory does not follow a line replacement policy. While a regular cache will replace old contents with new data, our memory does not replace existing contents. If the Verilog location is already taken, then the new data has to go to dynamic memory. The main reason for choosing this approach is to keep things as simple as possible. Nobody wants to spend time debugging a test bench component.

Figure 6 Verilog associative memory schemes

Figure 6 illustrates how Verilog memory is maintained. Every Verilog location is divided into three parts: tag, valid, and actual data. The address is divided into two parts: tag and index. How much of the address goes to the tag and how much to the index depends on how much Verilog memory is present. When a write access is made, the index part of the address is used to index into Verilog memory.
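The tag/valid/data layout and the tag/index address split can be sketched as follows. This is a minimal illustration, not the article's prototype: a C array stands in for the Verilog memory, INDEX_BITS is a hypothetical size, the data width is assumed to be one byte, and letting a write update a location it already owns is an assumption of this sketch.

```c
#include <stdint.h>

#define INDEX_BITS  4u                    /* hypothetical: 16 static locations */
#define NUM_ENTRIES (1u << INDEX_BITS)
#define INDEX_MASK  (NUM_ENTRIES - 1u)

/* One static location: tag, valid bit and the actual data. */
typedef struct {
    uint8_t  valid;   /* 1 once the location has been claimed */
    uint32_t tag;     /* upper address bits of the owning address */
    uint8_t  data;    /* assumed data width: one byte */
} static_entry;

/* File-scope array is zero-initialized, so every location starts invalid. */
static static_entry static_mem[NUM_ENTRIES];

/* Low address bits select the location; the remaining bits form the tag. */
static uint32_t addr_index(uint32_t addr) { return addr & INDEX_MASK; }
static uint32_t addr_tag(uint32_t addr)   { return addr >> INDEX_BITS; }

/* Returns 1 if the data was stored in static memory, 0 if the location is
 * owned by a different address and the caller must use dynamic memory. */
int static_write(uint32_t addr, uint8_t data)
{
    static_entry *e = &static_mem[addr_index(addr)];
    if (e->valid && e->tag != addr_tag(addr)) {
        return 0;                 /* no replacement policy: never evict */
    }
    e->valid = 1;
    e->tag = addr_tag(addr);
    e->data = data;
    return 1;
}

/* Returns 1 on a hit (valid and tag match), 0 on a miss. */
int static_read(uint32_t addr, uint8_t *data)
{
    const static_entry *e = &static_mem[addr_index(addr)];
    if (!e->valid || e->tag != addr_tag(addr)) {
        return 0;
    }
    *data = e->data;
    return 1;
}
```

With INDEX_BITS set to 4, addresses 0x123 and 0x453 both map to index 3, so whichever is written first keeps the static location and the other has to be served from dynamic memory.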
If the valid bit is not set, then that location is free and is claimed by setting the valid bit and storing the tag and data in their respective fields. When a read access is made, the address is again used to index into the memory. If the valid bit at this location is set, the stored tag is retrieved and compared with the tag part of the current address. If they match, there is a hit in Verilog memory, and the data is picked up from that location and returned as read data. If the tags do not match, then the access is made to dynamic memory using PLI. The prototype for this memory is given below.

5 Conclusion

In the era of highly complex designs, if the verification effort is to be successful, one has to abandon the traditional "one size fits all" attitude and explore new avenues for verification.

Sharan Basappa is technical manager in the VLSI group of Indian software development firm HCL Technologies. He has over 8 years' experience in ASIC verification.