Large Memory Modeling

by Andjelija Savic, HDL Design House
Belgrade, Serbia and Montenegro

Abstract :

Physical machines may question swap size boundaries trying to load large HDL designs. At the time simulation objects are built, memory requirements are formed depending on the space needed to run the application itself and the space needed to model private data. Memory declarations residing at HDL model code may impose unacceptable needs for statically allocated space. High memory requirements may lead to system hangs, as intensive swapping must take place to carry out simulation. Large design elaboration may result in design load failures, simulation crashes or performance degradation.

Large Memory Modeling paper discusses an abstract memory management model that should provide guaranteed system performance. The embedded algorithms logically work around memory allocation issues, relying on solutions based on theoretical analysis of data structure alternatives, search engines performance, virtual memory mechanism and cashing. Straightforward simulation techniques are bypassed, delivering a solution insensitive to the size of the HDL design memory address space. The methods established must address the needs of common-used physical machines, imposing average memory and processing requirements. Good solution candidates must provide high time-consuming versus memory-consuming parameters balance.

LARGE MEMORY MODELING

The algorithms established may be referred to as :

Heap Allocation Algorithm
Static Virtual Memory Algorithm
Dynamic Virtual Memory Algorithm

Algorithm implementation details are illustrated for the following options :

C foreign interface option
HDL standalone option

Heap Allocation Algorithm addresses simulation performance issues migrating from static towards dynamic memory allocation. Memory declaration no longer resides at memory model HDL code. At the time simulator objects are built, there will be no memory requirements referring to memory data storage. Heap Allocation Algorithm is based on allocate-when-needed approach. Memory locations set to initial data value do not require storage elements allocation. Upon updating memory address content, new storage element will be allocated at heap, preserving written memory data. At start-up, not all memory will be consumed. Memory requirements are initially low, but are allowed to increase dynamically at run-time.

HDL memory model cannot perform direct memory access due to direct memory addressing has been sacrificed abandoning static memory allocation approach. RW memory accesses are redirected and go through memory management block. HDL model RW interface supports channels for heap allocation/deallocation procedures. HDL memory model is functionally bounded to generation of RW requests. Algorithm flow is fully embedded into the memory management block. Memory management logic, being aware of data structure allocated at heap, is held responsible for search engine implementation, in control of RW flow.

Static Virtual Memory Algorithm addresses simulation performance issues by not eliminating, but softening the requirements for statically allocated memory. Memory is still declared at memory model HDL code, not in whole, but partially. Size of memory space declared, statically allocated, must be acceptable form simulator performance standpoint.

Statically allocated memory slice is referred to as HDL model virtual memory. At the time simulator objects are built, there will be memory requirements referring to memory data storage, but acceptable, matching virtual memory size. Memory address space is modeled via segment files structure. Segment files represent physical memory equivalent. Static Virtual Memory Algorithm is based on the assumption that memory accesses close in time address near-by locations. Upon random address A has been accessed, it is highly probable for addresses residing in the same memory block to be accessed soon. High memory requirements no longer refer to heap allocated space only. Static Virtual Memory Algorithm will balance memory requirements between heap and physical memory.

HDL memory model is able to perform direct memory access as static memory allocation is performed for a memory slice. Direct accesses are limited to physical memory addresses residing at model virtual memory at the specific moment. For memory addresses residing at segment files, RW accesses are redirected and go through memory management block. If target address is not found at HDL model virtual memory, model may generate RW requests. HDL model RW interface supports channels for virtual memory block request/cancel operation. Physical memory interface, or segment file structure interface, supports file I/O operation to handle virtual memory block incoming/outgoing transfers.

Algorithm flow is partially embedded into memory management block, and partially into HDL memory model. Memory management logic is held responsible to perform :

virtual to physical memory data transfers
physical to virtual memory data transfers

HDL memory model is held responsible for :

status flag maintenance
priority flag maintenance
voting logic implementation
initiating memory block incoming transfer
initiating memory block outgoing transfer

Status flags indicate if physical memory segments reside at virtual memory at the moment segment has been addressed. Upon RW access, virtual memory access flags are updated. Maintaining access flags assures re-generation of virtual memory blocks access sequence in time. Blocks latest accessed are given higher priorities. Upon recognizing the need to free virtual memory address space, voting logic is triggered. The lowest priority block must be chosen for the outgoing data transfer. Priority is determined based on maintained priority (access) flags. Upon RW attempt above memory address not fetched to virtual memory address space, incoming data transfer request will be generated. Physical memory content will be copied to chosen virtual memory block, providing future direct memory accesses referring to same address region (segment). Upon attempt to transfer physical memory data into a full virtual memory region, outgoing data transfer request may be generated. Virtual memory content may be moved back to corresponding physical memory segment file, enabling scheduled incoming data transfer.

Dynamic Virtual Memory Algorithm initially behaves similar to Heap Allocation Algorithm. As long as environment conditions allow so, Dynamic Virtual Memory Algorithm will manipulate the allocated heap space as specified by the items discussed at Heap Allocation Algorithm section, with a single major difference. The algorithms embedded at Heap Allocation Algorithm memory management block, perform iteration operations assuming memory storage elements are organized to form a single heap data structure. Dynamic Virtual Memory Algorithm implementation breaks the single heap data structure into the multiple data structures allocated at heap. An atomic data structure allocated at heap is logically equivalent to virtual memory block. Formations created at heap represent a flexible virtual memory representation. Virtual memory blocks are not fixed-sized, but dynamically increase demands for allocated space at run-time. Virtual memory block size depends on the number of modified (written) memory locations at a specific physical memory segment, since only modified locations require storage. Physical memory segments intensively used, when transferred to virtual memory, shall grow enjoying the space not needed by memory segments less intensively used.

Organizing virtual memory blocks at heap improves memory requirements parameters, but will not prevent potential allocation failures due to a lack of heap space. The need to address unacceptably large memory region, high number of storage elements may be allocated at heap and lead to a lack of storage space. Upon recognizing the impossibility to allocate additional heap storage elements, virtual memory algorithms are triggered. Memory management block is forced in such cases to interface physical memory logic equivalent, segment file structure. Physical memory segment files will be created and maintained following heap allocation failures, only. Physical memory segment file size will match corresponding virtual memory block size allowing physical memory requirements not to be static (high), but run-time dependent. For average or low memory requirements, depending on memory region used by application at run-time, the file structure may not be needed. Application will end based on heap allocation approach, without triggering virtual memory mechanism. At run-time, virtual memory algorithms are kept aside as plan B, and are called in case of necessity, upon recognizing the heap as full.

Start-up memory requirements refer to statically allocated memory space meeting design HDL code declarations. Heap Allocation Algorithm and Dynamic Virtual Memory Algorithm have low, or none, start-up memory requirements. Memory declarations are removed form HDL code, thus disabling demanding static allocation. Start-up memory requirements of the Static Virtual Memory Algorithm are not initially zero-sized, since virtual memory region is declared at HDL code. Start-up memory requirements are made acceptable for simulation performance, and can be further adjusted changing the amount of memory space reserved for virtual memory. Instead of consuming the memory needed to store HDL memory model data, requirements are brought down to handle virtual memory address space.

Run-time memory requirements refer to dynamically allocated memory space, meeting the unpredictable application memory requirements. Heap Allocation Algorithm and Dynamic Virtual Memory Algorithm may have, depending on application, critical run-time memory requirements. Data structures allocated at heap to hold modeled memory data, dynamically increase in size and may hit heap memory space boundaries. Further allocation attempts will result in failure. Dynamic Virtual Memory Algorithm overcomes demanding run-time heap requirements by shifting towards physical memory storage. Static Virtual Memory Algorithm implementation preserves a constant memory requirement at run-time. Memory used is statically allocated without implementing dynamic allocation approach.

Physical memory requirements are addressed by virtual memory algorithms, Static Virtual Memory Algorithm and Dynamic Virtual Memory Algorithm. Heap Allocation Algorithm will not require physical memory storage. Static Virtual Memory Algorithm models segmented physical memory through a segment file structure. Dynamic Virtual Memory Algorithm references physical memory storage upon exceeding heap memory boundaries. Physical memory requirements may be critical, depending on modeled memory size and number of the memory locations used (only updated memory locations are stored).

Processing requirements refer to computations performed and the time needed to fetch addresses memory element upon RW request has been recognized. Static allocation avoided, imposes the following trade-off; direct memory accesses are no longer possible. Each RW access addresses linked storage element structure and triggers implemented iteration routine. Access time processing requirements depend on iteration algorithms efficiency. Heap Allocation Algorithm and Dynamic Virtual Memory Algorithm manipulate data stored at heap, eliminating the possibility of direct accesses. Linked data structure may be in a form of binary tree, heightening iteration performance. Linked data structure may be in a form of a linked list, preserving memory space required to model the structure. Linked list approach may degrade performance and should be considered versus an end application. Access time requirements addressed by Heap Allocation Algorithm and Dynamic Virtual Memory Algorithm depend on data structure characteristics at the time it is accessed, element count and balance. Implementations relying on binary tree element organization may consider binary tree re-balance algorithms during application time, to preserve access performance. Static Virtual Memory Algorithm performs RW accesses above data residing at virtual memory. Virtual memory statically allocated enables direct memory access eliminating access time issues.

File I/O interface processing requirements refer to the time needed to perform data transfers between modeled data structure and physical memory storage. Heap Allocation Algorithm does not reference segment files. Static / Dynamic Virtual Memory Algorithm may intensively interface file structures since segment files are used to represent physical memory equivalent. System hangs may occur depending on : transfer algorithm efficiency, chosen segmentation scheme and memory access sequence (virtual memory assumptions conflict). Application may fully respect the assumptions embedded into virtual memory algorithms implementation, thus minimizing file I/O operations and eliminating performance hangs. Dynamic Virtual Memory Algorithm softens file I/O interface issues, due to greater implementation flexibility. It may converge towards Static Virtual Memory Algorithm file I/O requirements in border cases, only.

Routines implemented to embed the algorithms may come as a foreign interface implementation (language dependent). Foreign interface routines require import, being translated and exported into libraries that must be specified for elaboration. Objects are linked into simulation, statically or dynamically, affecting performance.

Implementing Dynamic Virtual Memory Algorithm the logic has been embedded writing simulator foreign interface routines. Implementation is carried out through C language constructs. Foreign interface code has been built conforming to the core/wrapper architectural approach. Main algorithm functional blocks are isolated at core block, written in pure C. C interface core has been developed to establish optimal coding structure and address reuse items. Core code is an independent C implementation of the chosen algorithm alternative, being unaware of its end usage. The core is built to be insensitive and fully independent of hardware description language option or simulator alternatives. The core may be used as a stand-alone C module for any application in need of implemented routines and possible to interface C language code. C interface wrappers are developed to address target simulator specific requirements. Compiled C routine code must be linked to simulator executables in order to contribute to model simulations. Simulator linking procedures are developed separately to satisfy Cadence or QuestaSim environment, and are placed at C wrapper code block. Wrapper code is held responsible to register implemented foreign routines. Wrapper code must be adjusted to reference appropriate simulator software provided libraries. C interface wrappers are developed to manage memory model hardware description language options.

An atomic storage element C model :

typedef struct mem_data {
    long int address;
    int data;
    struct mem_data* left;
    struct mem_data* right;
} mem_data_t;

An illustration, referring to Verilog HDL model foreign interface under QuestaSim simulation environment, is given as follows. C wrapper code holds veriusertfs array initialization, modeling interface towards the memory management block. The tasks listed are implemented at core block, but are registered at wrapper block according to environment specific restrictions.

s_tfcell veriusertfs[] =
{
    { usertask      , 0, 0, 0,
       initialize_calltf , 0, "$initialize"               },
     {usertask      , 0, 0, 0,
       preload_mem_calltf , 0, "$preload_mem" },
     {usertask      , 0, 0, 0,
       write_mem_calltf   , 0, "$write_mem"      },
     {userfunction, 0, 0, 0,
       read_mem_calltf    , 0, "$read_mem"        },
     {usertask    , 0, 0, 0,
       erase_mem_calltf   , 0, "$erase_mem"      },
     {usertask    , 0, 0, 0,
       corrupt_mem_calltf , 0, "$corrupt_mem" },
     {0}
};

Implementing Heap Allocation Algorithm, the logic is embedded relying on memory model HDL constructs. Foreign interface approach is abandoned to reduce implementation constraints and eliminate operating system, simulator and foreign code compiler dependency. The solution is less robust and highly sensitive versus the percentage of the memory address space allocated dynamically by the target application. Memory management features are black-boxed from the end-user standpoint. For VHDL source files, requiring unacceptable memory resources, memory management features may be localized into the module library. The library reference will be placed by the author at VHDL source files eliminating further end-user actions. For Verilog source files, requiring unacceptable memory resources, memory management features are embedded into a separate SystemVerilog module. Implementation is carried out relying on OO SystemVerilog extensions for dynamic allocation constructs to be recognized. SystemVerilog module, held responsible for memory management, will be merged with Verilog memory model by the author eliminating further end-user actions.

The core blocks encapsulate low-level memory management procedures, performing memory allocation/deallocation and manipulate atomic memory storage elements. The blocks embed the Heap Allocation Algorithm flow independently of the HDL memory model functionality. Interface blocks encapsulate higher functional level memory operations, addressing the commands found at memory model functional specifications. The procedures placed at interface blocks will break a memory device command into a set of routine calls embedded at core block. Only routines placed at interface blocks will be visible by the HDL memory model. Low-level routines may be called from interface blocks, only.

SystemVerilog implementation details are elaborated as follows, for illustration purposes. An atomic storage element SystemVerilog model :

class linked_list_c;
        reg[27:0] key_address;
        integer val_data;
        linked_list_c successor;

        function new;
endclass

Memory models, migrating towards dynamic allocation solutions, should be memory address space critic. If all memory locations are maintained by a single linked_list_c object, single linked list, performance parameters may be led to unacceptable values. Allocating each memory element into a single linked list will degrade search element operation performance as being directly dependent on linked list element count. Performance degradation is addressed by memory partitioning. Memory region is modeled by an array of linked lists as follows. The original author is held responsible to adjust memory partition size, linked list maximum element count, versus optimal performance.

linked_list_c linked_list [list_num];

low_level_interface_c and rw_interface_c OO data types are additionally defined, to embed the memory features routine bodies. Class declarations listed below implement the algorithm flow, and stand behind non-data-modeling purposes.

class low_level_interface_c;

        function new;

        task position_list;
        task insert_list;
        task remove_list;
        task remove_list_range;
        task create_list_range;
        task insert_list_range;

endclass

class rw_interface_c;

    low_level_interface_c low_level_interface;

    function new;

    task read_mem;
    task write_mem;
    task corrupt_mem;
    task erase_mem;

endclass

VHDL deallocation is performed explicitly by the author. SystemVerilog deallocation is performed by a garbage collection algorithm. The author is held responsible to dereference an object enabling it to be recognized by the garbage collector. Dedicated tests have been written to check if accident memory leaks are present.

Static memory allocation approach maintaines memory address values implicitly. Memory space allocated per memory location requires the space to maintain memory data value only. No data structure maintenance information will be required. Dynamic memory allocation approach imposes higher memory requirements per memory location. Memory space will be allocated to maintain memory data value, linking structure and explicit address (data structure key) value. With memory requirements growing dynamically at run-time, dynamic allocation memory models will, at some memory usage percentage, reach memory area occupied by the static allocation memory models initially.

Dedicated routines have been written to carry out memory block or sector operation, as memory operations performed above the whole memory blocks have been recognized as performance critic. Straightforward, operation is initiated once per memory location independently. For the operation to be executed above the memory location, data structure element handle must be set to point to the target location. Handle will be updated performing a search element operation that iterates through implemented data structure. Iterating through data structure for element match has been evaluated as time consuming operation. Approaching the memory block operation straightforward, iteration will be initiated for each data structure element. Depending on the element count, execution time may be unacceptably increased causing simulation hangs, that is, degrading performance. Executing search operation for each data structure element is avoided. Instead of calling the memory command, command to be performed above the whole block, for each location in a block, a new separated routine has been provided that will recognize the performance degradation tendency. The new routine is referred to as a performance dedicated routine and will act as follows. Search operation is initiated only once, for the first memory block location. Desired command is executed above the first memory block location. Flow control will not be passed on, but will remain at performance dedicated routine further on. Increment operation is then performed reaching the second memory block location. Keeping the flow control provides the possibility to avoid successive initiation of the search operation, but allows all memory block location to be accessed in a row, by increment operations. N search operations are replaced by one search operation followed by (N-1) increment operations. Please do note that performance dedicated routines are model functionality specific.

Dedicated routines may be written if it is known in advance intensive successive memory accessed will be performed. I/O routine specification may be extended for additional I/O argument holding the handle set to the data structure element last accessed. The following memory commands issued to neighbor memory locations may use previously updated handle thus eliminating the data structure search initiation.

Heap Allocation Algorithm may be stress tested by performing intensive memory write operations. Heap allocated space is increased by each write memory access. Performing intensive RW accesses above large heap allocated space, but prior to allocation failure, will result in performance degradation due to iteration algorithms are less able to manipulate large structures and are less efficient. Further triggering of numerous successive memory writes will reach heap memory boundaries resulting in allocation failures (actually bounded by swap size). Additionally, stress writes may be performed accessing memory in a sequential order which will eliminate major advantages of a binary tree data formation. Inserting binary tree elements in a sequential order will degrade binary tree structure into a linked list.

Static Virtual Memory Algorithm may be stress tested by performing intensive memory accesses out of order, addressing different memory segments in a row. The following write access sequence should fully degrade algorithm performance : SA(0), SA(1), SA(2), SA(3) .. SA(N-1), SA(0), SA(1) .. SA(N-1) SA(0) ... SA(I) refers to a single memory address at memory segment (I). Such access sequence breaks assumption virtual memory mechanism is based on - memory accesses close in time address near-by locations. Provided test sequence will cause intensive virtual memory incoming and outgoing transfers relying on file I/O operations efficiency. It is important for the memory accesses to be write operations to trigger a worst-case situations, where a write-back operation is required.

Dynamic Virtual Memory Algorithm may be stress tested by the following test sequence. Initiate stress write operations hitting heap memory boundaries causing allocation failures. Virtual memory mechanism will be triggered interfacing segment file structure. Then follow by out of order accesses : SA(0), SA(1), SA(2), SA(3) .. SA(N-1), SA(0), SA(1) .. SA(N-1) SA(0) .. SA(I) breaking virtual memory concept.

Unacceptable memory requirements, degrading or breaking the simulation, may be easily avoided eliminating or minimizing statically allocated space required for modeling memory data. Migrating towards memory storage alternatives, dynamic allocation or cashing, arises performance issues caused by increased processing requirements. The logic embedded into the algorithms proposed, must resolve memory allocation problems without sacrificing processing performance. Target application or test environment characteristics should be analyzed in order to improve routines responsible for performance level. The routines may be adjusted accordingly. Upon critical algorithm points detection, versus target application parameters, dedicated routines may be implemented focusing on performance increase only. As a simple example, if it is known that a target application will sequentially access memory locations, binary tree deformities may be prevented by implementing routines monitoring binary tree height and balance. When deciding on algorithm alternative the parameters that should be questioned refer to percentage of memory space addressed by application and common memory access sequence; random or distant against ordered or near-by.

Industry Articles

Large Memory Modeling