Design Considerations for High Bandwidth Memory Controller
Atul Dhamba, Anand V Kulkarni (Atria Logic Pvt Ltd)

INTRODUCTION
High Bandwidth Memory (HBM) is a high-performance 3D-stacked DRAM. The technology stacks DRAM chips (memory dies) vertically on a high-speed logic layer, connected by a vertical interconnect technology called TSV (through-silicon via), which reduces connectivity impedance and thereby total power consumption. The HBM DRAM uses a wide-interface architecture to achieve high-speed, low-power operation. It is optimized for high-bandwidth operation to a stack of multiple DRAM devices across a number of independent interfaces called channels. Each DRAM stack is anticipated to support up to 8 channels. Each channel provides access to an independent set of DRAM banks; requests from one channel may not access data attached to a different channel. Channels are independently clocked and need not be synchronous.

HBM DRAM vs. TRADITIONAL DRAM
The three major areas in which every memory, be it SRAM or DRAM, is constrained are area, power and bandwidth. With the increasing operating frequency of processors, limited memory bandwidth becomes the bottleneck that prevents a system from reaching its full performance. HBM not only offers a solution to this 'memory bandwidth wall' but, with its close-proximity interposer layer and 3D structure, also offers high yield and a reduced form factor.

BANDWIDTH
Compared to traditional DDR DRAMs, HBM with its 128-bit-wide data bus on each channel offers a much higher bandwidth of about 256 GB/s across the 8 channels of a die.
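As a rough check on that figure (assuming the 2 Gb/s per-pin data rate of second-generation HBM; first-generation HBM runs at half that rate), the per-die bandwidth works out as:

    128 bits/channel × 2 Gb/s per pin × 8 channels = 2048 Gb/s = 256 GB/s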
Table 1: Memory Bandwidth Comparison
*The above table is taken from Wikipedia.

POWER
According to data published by SK Hynix, HBM DRAM saves up to 42% power compared with GDDR5 DRAM.

SMALL FORM FACTOR
Accommodating more GDDR5 memory on a board requires additional space for each die; thanks to its 3D-stacked physical architecture and close proximity to the host processor, HBM reduces the board area footprint by roughly 90%.

HBM MEMORY PROTOCOL OVERVIEW
HBM memory is controlled by commands issued on separate row and column address buses. Each HBM memory channel is divided into multiple banks, each bank has multiple rows, and each row has multiple columns. To access a memory location, its row must be opened first. The row address bus controls the opening and closing of rows, while the column address bus controls write/read operations on the opened row. Only one row can be open in a bank at a time, so to access a location in a different row of the same bank, the current row must first be closed by issuing a precharge on the row address bus. The updated HBM specification (JESD235A) adds pseudo-channel mode, in which a new row can be activated in a bank that already has an open row; the HBM memory internally ensures that it closes the current row and opens the new one. The data bus is also divided into two separate 64-bit data buses with a shared row and column address bus. A pseudo channel is thus created within the same channel: memory accesses to locations on pseudo channel 0 do not affect memory accesses to locations on pseudo channel 1, and a considerable number of wait states or dead cycles are eliminated, further improving memory bandwidth. The HBM memory can be put into low-power modes via the row address bus to save power on the I/O drivers. To reduce power consumption further, the clocks can be gated while in power-down or self-refresh mode.

ATRIA LOGIC HBM MEMORY CONTROLLER IP
The Atria Logic HBM Memory Controller (AL-HBMMC) IP enables the user to communicate with HBM memory at high speeds. Since every channel is independent, it makes sense to have an independent memory controller per channel, giving the user complete control over the data transactions of a particular channel. The AL-HBMMC supports legacy mode and pseudo-channel mode with channel densities from 1 Gb to 8 Gb. Its standout feature is the '2-command compare and issue' algorithm, which increases bandwidth by considerably reducing dead cycles between row and column commands. The memory controller reference design is available in Verilog HDL. Table 2 lists the features supported by the AL-HBMMC.
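The open-row rule above is essentially what a controller's bank-state tracking reduces to. The following SystemVerilog fragment is a minimal sketch of that bookkeeping, with hypothetical signal and type names rather than the AL-HBMMC internals: it classifies each request as a row hit, a row miss needing precharge plus activate, or an access to an idle bank needing only an activate.

// Minimal sketch of per-bank open-row tracking (hypothetical names).
typedef enum logic [1:0] {NEED_ACT, NEED_PRE_ACT, ROW_HIT} bank_decision_e;

module bank_state_sketch #(
  parameter int NUM_BANKS = 16,
  parameter int ROW_BITS  = 14
)(
  input  logic                           clk, rst_n,
  input  logic                           req_valid,
  input  logic [$clog2(NUM_BANKS)-1:0]   req_bank,
  input  logic [ROW_BITS-1:0]            req_row,
  output bank_decision_e                 decision
);
  logic                row_open [NUM_BANKS];
  logic [ROW_BITS-1:0] open_row [NUM_BANKS];

  // Decide what the next command must be for this request.
  always_comb begin
    decision = NEED_ACT;
    if (req_valid) begin
      if (!row_open[req_bank])                decision = NEED_ACT;      // bank idle: ACTIVATE
      else if (open_row[req_bank] == req_row) decision = ROW_HIT;       // row already open: RD/WR directly
      else                                    decision = NEED_PRE_ACT;  // different row open: PRE then ACT
    end
  end

  // Track which row (if any) each bank has open.  A real controller would
  // also update this table when it issues ACT/PRE/REF commands.
  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n)
      for (int i = 0; i < NUM_BANKS; i++) row_open[i] <= 1'b0;
    else if (req_valid && decision != ROW_HIT) begin
      row_open[req_bank] <= 1'b1;
      open_row[req_bank] <= req_row;
    end
  end
endmodule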
Table 2: Features List

AL-HBMMC ARCHITECTURE
The HBM memory controller talks to a single independent channel of the HBM memory. The user interface is FIFO-based, with independent FIFOs for the logical address, write data and read data. The memory controller performs logical-to-physical address mapping, reads from the write data FIFO when a write command is issued, and stores read data into the read data FIFO when data arrives from the memory for a read request. The timing parameters default to the Samsung specification; for other vendors the user can edit the timing parameters through the CSR registers. The PHY interface is a custom interface, which can be connected to a modified DFI interface. The memory controller is divided into four parts: command, write data, read data and register interface. The architecture focuses on minimizing communication between these sections, which helps the user with area and performance optimization during ASIC layout.

Figure 1: HBM MC Architecture

AL-HBMMC FLOW DIAGRAM
The HBM memory controller flow diagram in Figure 2 describes the behavior of the memory controller for any mode and density. After the asynchronous active-low reset is de-asserted, the memory controller waits for the calibration-done signal from the PHY; until then the user can configure the MRS and timing parameters into the CSR registers over the register interface. The user can then tell the memory controller to start its initialization FSM; the controller checks for the calibration-done signal and then runs the initialization FSM to configure the memory with the MRS settings provided by the user. After initialization, the memory controller begins normal write/read operation by reading from the FIFO interface. During refresh and power-down modes the memory controller stops reading from the FIFO interface until it exits those modes.
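The flow can be summarized as a small state machine. The sketch below uses hypothetical state and signal names (it is not the AL-HBMMC source); it only illustrates the ordering described above: wait for PHY calibration, run the MRS initialization, then consume user requests except while refresh or power-down is active.

// Skeleton of the bring-up / operating flow described above.
module flow_fsm_sketch (
  input  logic clk, rst_n,
  input  logic phy_cal_done,     // calibration done from the PHY
  input  logic start_init,       // user starts initialization via CSR
  input  logic init_mrs_done,    // all MRS writes issued to the memory
  input  logic in_refresh, in_power_down,
  output logic fifo_rd_en        // pop requests from the user FIFOs
);
  typedef enum logic [1:0] {WAIT_CAL, INIT, NORMAL} state_e;
  state_e state;

  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) state <= WAIT_CAL;
    else begin
      case (state)
        WAIT_CAL: if (phy_cal_done && start_init) state <= INIT;   // MRS/timing CSRs may be written here
        INIT:     if (init_mrs_done)              state <= NORMAL; // memory configured with user MRS settings
        NORMAL:   state <= NORMAL;                                 // stay; refresh/power-down handled below
        default:  state <= WAIT_CAL;
      endcase
    end
  end

  // Requests are consumed only in normal operation and outside
  // refresh and power-down modes.
  assign fifo_rd_en = (state == NORMAL) && !in_refresh && !in_power_down;
endmodule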
Figure 2: AL-HBMMC Flow Diagram

INTERFACE
The HBM memory controller has a FIFO-based interface on the user side and a custom PHY interface on the PHY side. The write data, logical address and read data have separate FIFOs. A separate register interface configures the CSR and MRS registers of the memory controller.

Figure 3: Memory Controller Interface

The MC-to-PHY interface shown in Figure 3 is a custom interface simplified to have separate ports for the write path and the read path. The write data, write ECC/DM and write DBI buses are delayed by the memory controller to match the write latency value configured by the user. The memory controller sends commands and write data at SDR (single data rate) and expects the read data at SDR as well.

LOGICAL TO PHYSICAL ADDRESSING
The logical-to-physical address mapping follows a simple approach: the controller reads from the logical address FIFO, converts each entry to a physical address and stores it in an internal physical address FIFO. The physical address is then read out and HBM commands such as activate, precharge, write and read are generated and issued to the memory. The logical-to-physical address mapping differs with channel mode and density. The physical addressing uses the BRC (bank-row-column) format, so that transactions on consecutive logical addresses stay within a particular bank rather than being distributed across banks as in the RBC (row-bank-column) format. BRC arranges the banks in logical address order. RBC would indeed be faster than BRC because it opens multiple banks; however, the '2-command compare and issue' algorithm helps make BRC access faster.

COMMAND GENERATION
A memory location in the DRAM is accessed in three steps: open the row, access the column to read or write, then close the row. The row containing the address location is opened with an ACTIVATE command; the memory takes a fixed amount of time (tRCD) to open a row. Each row has 64 column locations, each capable of storing 256 bits of data. To access a particular location, the column address and the command for a write or read are issued on the column address bus (RD/WR). Once the access is complete, the row can be closed by issuing a PRE command after a fixed time tRAS. If a new row is to be opened in the same bank, a fixed time tRC has to elapse.
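That three-step sequence, together with the tRCD/tRAS/tRC windows, can be sketched as a testbench-style task. The task name and the cycle counts below are purely illustrative assumptions; real values come from the vendor datasheet and the timing CSRs described later.

`timescale 1ns/1ps
// Testbench-style sketch of the three-step access and its timing windows.
module command_seq_sketch;
  // Illustrative cycle counts only; real values come from the timing CSRs.
  localparam int tRCD = 14;  // ACTIVATE to RD/WR
  localparam int tRAS = 33;  // ACTIVATE to PRECHARGE
  localparam int tRC  = 47;  // ACTIVATE to next ACTIVATE, same bank

  bit clk;
  always #1 clk = ~clk;

  task automatic access_location(input int bank, row, col, input bit is_write);
    $display("ACT bank=%0d row=%0d", bank, row);        // open the row
    repeat (tRCD) @(posedge clk);                       // wait row-to-column delay
    $display("%s bank=%0d col=%0d", is_write ? "WR" : "RD", bank, col);
    repeat (tRAS - tRCD) @(posedge clk);                // ensure tRAS from ACT before closing
    $display("PRE bank=%0d", bank);                     // close the row
    repeat (tRC - tRAS) @(posedge clk);                 // ensure tRC before re-opening this bank
  endtask

  initial begin
    access_location(.bank(0), .row(5), .col(3), .is_write(1));
    access_location(.bank(0), .row(6), .col(0), .is_write(0));
    $finish;
  end
endmodule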
Table 3: Command Generation

A simple transaction on a memory location therefore involves multiple steps and timing constraints. The user, however, does not have to worry about them: the AL-HBMMC handles these steps and timings itself. The user only supplies an address location and a command, i.e. a write or read, and the memory controller takes care of the DRAM protocol needed to access that location. The AL-HBMMC knows when to open a row, issue a command, close a row and issue a refresh to the DRAM to 'recharge' it and prevent it from losing the data it holds. The intelligence of the memory controller shows in its decision making to reduce dead cycles and increase efficiency.

'2-COMMAND COMPARE AND ISSUE' ALGORITHM
The efficiency of the memory controller is increased by reducing dead cycles and issuing more commands instead of waiting for a timeout relative to the previously issued row or column command. This is achieved through command re-ordering: the memory controller compares the current and next requests and forwards the best possible command to the memory. For example, suppose a row command has been issued to open row 0 in bank 1. Rather than waiting out the row-to-column delay (tRCD) in that bank, if the next request is a write/read to row 0 of bank 2, which is already open, the memory controller issues that request instead, reducing dead cycles. This is a simple example of how the algorithm increases the efficiency of the memory controller.
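A minimal sketch of the comparison itself is shown below, with hypothetical signal names; the actual AL-HBMMC arbitration is more involved, but the core decision is to let the younger request go first when the older one is still blocked by a bank timing window.

// Sketch of the idea behind the '2-command compare and issue' algorithm:
// look at the two requests at the head of the queue and, if the older one
// is still blocked by a timing counter, issue the younger one when it
// targets an already-open row in another bank.
module two_cmd_compare_sketch (
  input  logic       req0_valid, req1_valid,          // req0 is the older request
  input  logic [3:0] req0_bank,  req1_bank,
  input  logic       req0_row_hit,   req1_row_hit,    // target row already open?
  input  logic       req0_timing_ok, req1_timing_ok,  // bank timing counters expired?
  output logic       issue_req1_first
);
  // Prefer the younger request only when the older one cannot issue yet,
  // the younger one can, and the two do not collide on the same bank.
  assign issue_req1_first = req0_valid && req1_valid &&
                            !(req0_row_hit && req0_timing_ok) &&
                            ( req1_row_hit && req1_timing_ok) &&
                            (req0_bank != req1_bank);
endmodule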
Write Data Path
The user writes 288 bits into the write data FIFO (32 bits of data masking + 256 bits of write data). The 32 MSBs carry the data mask, if data masking is enabled; each mask bit corresponds to 8 bits of write data. The relation between data masking bits and write data bits is explained in Table 4. The LSBs of the write data ([127:0]) are sent to the memory on the rising edge of the clock, while the MSBs ([255:128]) are sent on the falling edge. The memory controller generates the ECC bits (if ECC is enabled) and the DBI bits (if write DBI is enabled). Legacy mode supports write data burst re-ordering for odd column address locations at burst length 4: the bursts {0,1,2,3} are re-ordered and sent to the memory as {2,3,0,1}. A write request with an odd column address raises a flag indicating that the current write data should be re-ordered before being sent out. Write data latency is configurable in the MRS mode registers from 1 to 8 tCK clock cycles; this block delays the write data according to the write latency configured in MRS register 2 OP[2:0]. The ECC generation function can be enabled/disabled by the user in MRS register 4 OP[0]. However, per the HBM specification, if ECC is enabled then data masking is disabled, and vice versa; if the user enables both ECC and data masking, ECC is disabled and data masking is kept enabled. The HBM specification does not define an ECC polynomial; ECC generation is user specific. The HBM memory does not perform ECC checks, it simply stores the ECC alongside the data and returns it during reads; ECC correction is done by the memory controller on the read data path. An 8-bit ECC value is generated for every 64 bits on the write data path, i.e. 32 ECC bits for 256 bits of SDR write data. Write data bus inversion is a configurable function that can be enabled/disabled using MRS register 4 OP[1]. Every 8 bits of write data has a single data bus inversion bit indicating whether the byte is inverted. Parity is calculated for every 32 bits of data and sent to the memory; even parity is calculated by XORing each 32-bit group.
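The byte-wise DBI decision and the 32-bit even-parity generation can be sketched as follows. The inversion policy assumed here (invert a byte when it contains more zeros than ones) and the question of whether parity covers the raw or the DBI-encoded data are assumptions to be confirmed against the specification and MRS settings.

// Sketch of write-side DBI and parity generation (illustrative only).
module wdata_dbi_parity_sketch (
  input  logic [255:0] wdata_in,
  input  logic         dbi_en,
  output logic [255:0] wdata_out,
  output logic [31:0]  dbi,      // one inversion flag per byte
  output logic [7:0]   parity    // one even-parity bit per 32-bit group
);
  always_comb begin
    for (int b = 0; b < 32; b++) begin
      // Assumed policy: invert the byte if that reduces the number of zeros driven.
      dbi[b]              = dbi_en && ($countones(wdata_in[b*8 +: 8]) < 4);
      wdata_out[b*8 +: 8] = dbi[b] ? ~wdata_in[b*8 +: 8] : wdata_in[b*8 +: 8];
    end
    // Even parity: XOR-reduce each 32-bit group (here computed over the
    // possibly inverted data; confirm the exact coverage in the specification).
    for (int g = 0; g < 8; g++)
      parity[g] = ^wdata_out[g*32 +: 32];
  end
endmodule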
Table 4: Data Masking to Write Data Relation

Read Data Path
The read data from the memory is sent to the memory controller by the PHY at SDR. The PHY sends back 256 bits of SDR read data, 2 read data valid bits, 8 read data parity bits, 32 DBI bits and 32 ECC bits. The read data path stores 258 bits (256 bits of read data and 2 error flag bits) in the read data FIFO. The 2 error flag bits indicate either that the whole data is invalid (legacy mode) or that the read data for pseudo channel 0 (error flag[0]) or pseudo channel 1 (error flag[1]) is invalid. The memory controller performs parity checking, data bus inversion, ECC checking and read data re-ordering (for legacy mode with burst length 4) before sending the read data to the user. The read data is presented to the user as explained in Table 5.
Table 5: Read Data to User

REFRESH MODE
Since a DRAM cell stores its value as charge on a capacitor, the stored value degrades over time and the data would eventually be lost. To avoid this, the capacitors need to be 'recharged' at regular intervals, i.e. the DRAM is refreshed. A refresh of all banks in a channel is issued with the REF command. The refresh command is issued only after the request currently issued to the memory has completed and all banks are closed. Back-to-back refreshes are issued with a minimum period of tRFC maintained between refresh commands. The specification provides two options: either issue refresh commands in advance (a maximum of 9) and then give the next refresh command after a 9 x tREFI interval, or postpone refresh commands (a maximum of 9) and then issue 9 back-to-back refresh commands with tRFC between them. The memory controller follows the latter option: it postpones 8 refresh commands and then asserts a flag indicating that the memory should be put into refresh mode.
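A sketch of that postpone-and-catch-up bookkeeping is shown below; the module and signal names and the tREFI cycle count are illustrative assumptions, not the AL-HBMMC implementation.

// Sketch of the postpone-and-catch-up refresh scheme described above.
module refresh_scheduler_sketch #(
  parameter int tREFI_CYCLES  = 3900,  // average refresh interval in clocks (illustrative)
  parameter int MAX_POSTPONED = 8      // the controller postpones up to 8 refreshes
)(
  input  logic clk, rst_n,
  input  logic ref_issued,             // a REF command was sent to the memory
  output logic enter_refresh           // flag: time to stop traffic and catch up
);
  logic [$clog2(tREFI_CYCLES)-1:0] interval_cnt;
  logic [3:0]                      owed;   // refreshes postponed so far

  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      interval_cnt <= '0;
      owed         <= '0;
    end else begin
      // Every tREFI another refresh becomes due.
      if (interval_cnt == tREFI_CYCLES - 1) begin
        interval_cnt <= '0;
        if (!ref_issued) owed <= owed + 1'b1;
      end else begin
        interval_cnt <= interval_cnt + 1'b1;
        if (ref_issued && owed != 0) owed <= owed - 1'b1;
      end
    end
  end

  // Once the allowed number of refreshes has been postponed, request refresh
  // mode; the command path then issues them back to back, tRFC apart.
  assign enter_refresh = (owed >= MAX_POSTPONED);
endmodule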
Figure 4: Advanced Refresh Commands*
*NOTE: The above figure is taken from the JEDEC JESD235A standard for HBM.

POWER-DOWN MODE
The HBM memory runs at high frequency with a wide I/O interface. When normal transactions are not being carried out, the memory can be put into power-down mode. The user enters or exits this mode by configuring the CSR register. The CKE (clock enable) signal is de-asserted on entry and asserted on exit. The HBM memory drivers are turned off in this mode. Power can be saved further by gating the high-frequency clock, and the clock frequency can even be changed while in this mode.

SELF-REFRESH MODE
The self-refresh FSM is similar to the power-down FSM, the only difference being that the memory is put into the self-refresh state, in which it refreshes itself at least once while in the mode. The CKE (clock enable) signal is de-asserted on entry and asserted on exit. The I/O drivers are switched off to save power; additionally, the clocks can be gated or the operating frequency of the memory can be changed during self-refresh mode. The refresh timing counters are held at their current values and resume incrementing only after the memory exits this mode.

MRS MODE
The user can configure the MRS registers at the address space specified in Table 6. On asserting the start_issue_mrs bit in CSR register 16 OP[1], the memory controller puts the memory into MRS mode, provided the current request has completed and all banks are closed. The memory controller determines which MRS registers have changed since the previous configuration and writes only those registers to the memory, avoiding reconfiguration of unchanged MRS registers and saving dead cycles. The MRS registers in the memory controller are updated tMOD after each MRS command is issued, keeping them in sync with the configuration in the memory.
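The 'write only what changed' MRS update can be sketched as a shadow-register compare; the names below are hypothetical and the tMOD wait is only noted in a comment.

// Sketch of issuing MRS commands only for registers whose value changed.
module mrs_delta_sketch (
  input  logic        clk, rst_n,
  input  logic        start_issue_mrs,
  input  logic [7:0]  mrs_new [16],    // values written by the user via CSR
  output logic        mrs_cmd_valid,   // one MRS command per changed register
  output logic [3:0]  mrs_cmd_addr,
  output logic [7:0]  mrs_cmd_data
);
  logic [7:0] mrs_shadow [16];         // last values actually sent to the memory
  logic [4:0] idx;
  logic       busy;

  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      busy <= 1'b0;
      idx  <= '0;
      for (int i = 0; i < 16; i++) mrs_shadow[i] <= '0;
    end else if (start_issue_mrs && !busy) begin
      busy <= 1'b1;
      idx  <= '0;
    end else if (busy) begin
      // Walk MR0..MR15; a command is emitted (combinationally below) only
      // for registers that differ from the shadow copy.  A real controller
      // also waits tMOD after each MRS command before the next one.
      if (mrs_cmd_valid) mrs_shadow[idx[3:0]] <= mrs_new[idx[3:0]];
      idx <= idx + 1'b1;
      if (idx == 15) busy <= 1'b0;
    end
  end

  assign mrs_cmd_valid = busy && (mrs_new[idx[3:0]] != mrs_shadow[idx[3:0]]);
  assign mrs_cmd_addr  = idx[3:0];
  assign mrs_cmd_data  = mrs_new[idx[3:0]];
endmodule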
Table 6: MRS Register Configurations

AL-HBMMC CSR SETTINGS
The AL-HBMMC can be configured with various parameters and targeted at various channel densities and channel modes. The following section describes the compiler directives that set the memory controller's channel mode and density, and the MRS and CSR register definitions used to configure the memory controller and the HBM memory. A separate register interface is used to configure the CSR and MRS registers; the user can write to any of these registers during normal operation or immediately after reset is de-asserted. The AL-HBMMC is designed to support legacy mode and pseudo-channel mode with channel densities from 1 Gb to 8 Gb. It does not, however, support both channel modes at the same time; the memory controller must be configured for legacy mode or pseudo-channel mode using the compiler directives described in Table 7.
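As an illustration of this kind of compile-time selection (the macro and parameter names below are hypothetical stand-ins, not the actual directives listed in Table 7):

// Illustration of `define-based mode/density selection (hypothetical names).
`define HBM_PSEUDO_CHANNEL_MODE        // comment out for legacy mode
`define HBM_CHANNEL_DENSITY_GB 8       // 1, 2, 4 or 8

package hbm_cfg_sketch_pkg;
`ifdef HBM_PSEUDO_CHANNEL_MODE
  localparam int DATA_WIDTH = 64;      // two 64-bit pseudo channels share the CA bus
  localparam int NUM_PSEUDO = 2;
`else
  localparam int DATA_WIDTH = 128;     // single 128-bit legacy channel
  localparam int NUM_PSEUDO = 1;
`endif
  localparam int DENSITY_GB = `HBM_CHANNEL_DENSITY_GB;
endpackage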
Table 7: Compiler Directives for Channel Mode and Channel Density

The channel mode and channel density simply tell the memory controller what type of HBM memory it is communicating with. Certain blocks of the memory controller are present only for legacy mode or only for pseudo-channel mode. The channel density also tells the memory controller how many banks are present in the memory, so that it can generate physical addresses and commands accordingly. The user starts the memory controller's internal initialization FSM by asserting the start-initialization bit OP[0], and then polls the initialization-done bit OP[7], which indicates that the memory has been calibrated by the PHY and that the MRS register settings in the memory controller match the MRS settings in the memory. The initialization and calibration register details are shown in Table 8.
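A testbench-style sketch of that start-and-poll sequence is shown below. The csr_write()/csr_read() tasks and the toy CSR model are stand-ins for the real register interface; only the register-16 bit positions follow the text (OP[0] = start initialization, OP[7] = initialization done).

`timescale 1ns/1ps
// Self-contained sketch of the start/poll sequence (stand-in register model).
module init_poll_sketch;
  logic [7:0] csr [0:31];                    // toy model of the CSR space

  task automatic csr_write(input int addr, input logic [7:0] data);
    csr[addr] = data;
  endtask

  task automatic csr_read(input int addr, output logic [7:0] data);
    data = csr[addr];
  endtask

  initial begin
    logic [7:0] status;
    foreach (csr[i]) csr[i] = '0;
    csr_write(.addr(16), .data(8'h01));      // OP[0]: start initialization FSM
    fork                                     // stand-in for the controller/PHY finishing
      #100 csr[16][7] = 1'b1;                // OP[7]: initialization done
    join_none
    do begin
      #10 csr_read(.addr(16), .data(status));  // poll until OP[7] is set
    end while (!status[7]);
    $display("Initialization done: PHY calibrated, MRS settings in sync");
  end
endmodule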
Table 8: Initialization and Calibration Register Configurations

The user can also issue new MRS settings during normal operation and start the MRS FSM by asserting OP[1]. The user then polls OP[2], which indicates that both the memory and the memory controller have been updated with the new MRS settings. Note that when the user writes new MRS settings into the memory controller, they are not reflected in the memory controller's behaviour until the memory has also been updated with the new settings. So if the user enables data bus inversion, the memory controller will not start performing data bus inversion until the data bus inversion MRS setting has been enabled in the memory via MRS commands.

MRS REGISTERS
The MRS registers use the same address locations and register definitions as the HBM specification. This makes it easy for the user to configure the required MRS registers without referring to an additional memory map of MRS register settings for the memory controller. The MRS registers are mapped to address locations 0 to 15 (Table 6); registers 9 to 14 are read-only and are kept only to mirror the address mapping given in the HBM specification. The user can update these registers after reset is de-asserted and before the memory controller is put into initialization mode. The user can also update MRS registers during normal read/write operation; the new configurations are stored in temporary registers and take effect in the memory controller once they have been issued to the memory with MRS commands and the tMOD time has elapsed.

POWER-DOWN / SELF-REFRESH ENTRY/EXIT CONTROL CSR REGISTERS
The memory can be put into power-down or self-refresh mode using OP[1] and OP[0] respectively (Table 9). The user has full control over entering and exiting the PDE/SRE modes. The refresh FSM outputs, refresh-to-be-issued OP[6] and refresh-state OP[7], are read-only bits: if the memory has been put into power-down mode by the user and a refresh needs to be issued, the user must take the memory out of power-down mode. The user can poll OP[7] to check whether the memory is still in the refresh state (logic 1) or out of it (logic 0). The user can also poll for the state of the memory after PDE/SRE entry or exit is asserted: OP[2] indicates the memory has entered PDE mode, and OP[3] indicates it has entered SRE mode. The user exits the memory from PDE/SRE mode by asserting OP[4]; once the memory is out of PDE/SRE mode, OP[5] is set high.
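The rule that a pending refresh must pull the memory out of power-down can be captured in a few lines of user-side logic; the signal names below are hypothetical mappings of the OP bits from Table 9.

// Sketch of the user-side rule: if the memory is in power-down and the
// controller flags that a refresh is due (OP[6]), request a power-down exit.
module pde_refresh_guard_sketch (
  input  logic clk, rst_n,
  input  logic pde_entered,       // OP[2]: memory is in power-down mode
  input  logic refresh_pending,   // OP[6]: a refresh needs to be issued
  input  logic pde_exit_done,     // OP[5]: memory is out of PDE/SRE mode
  output logic pde_exit_req       // drives OP[4]: request power-down exit
);
  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n)
      pde_exit_req <= 1'b0;
    else if (pde_entered && refresh_pending)
      pde_exit_req <= 1'b1;       // wake the memory so the refresh can go out
    else if (pde_exit_done)
      pde_exit_req <= 1'b0;       // exit acknowledged; clear the request
  end
endmodule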
Table 9: PDE/SRE/Refresh Register Configurations

CSR REGISTERS FOR TIMING PARAMETERS
The memory controller can be configured through the timing parameter configuration register set implemented within the IP (Table 10). The default timing parameters follow the Samsung specification for HBM memory with a memory density of 8 Gb, and the memory controller uses these timing parameters for all operations. The user should configure these parameters only once, before issuing 'start initialization' in register 16. NOTE: These parameters could otherwise be changed during clock-gating or clock-frequency-change modes; however, the current version does not support those modes.
Table 10: Timing Parameter Register Configurations

HBM PROTOCOL VERIFICATION IP
Verifying a design or product that uses High Bandwidth Memory as its memory component requires a bus functional model driven according to the HBM protocol. The Atria Logic HBM Verification IP (AL-HBM) incorporates such a bus functional model, helping the user kick-start the verification process immediately. The AL-HBM is a SystemVerilog (SV) package that leverages the native strengths of SystemVerilog, is imported inside a SystemVerilog module, and is pre-verified, configurable and reusable. This speeds up test bench development and lets users focus on testing their modules and designs. Integrating the Atria Logic HBM VIP into an existing test bench is simple: just instantiate the VIP (module) in the test bench. The built-in coverage helps the user write test cases covering all possible input scenarios.

HBM VIP ARCHITECTURE
Figure 5: HBM Memory Channel Architecture

The Atria Logic High Bandwidth Memory (AL-HBM) verification IP is a reusable, configurable verification component developed in SystemVerilog. It offers an easy-to-use verification solution for SoCs using the HBM memory controller. The HBM VIP can be configured to comprise up to 8 memory channels, each with its own independent interface. Figure 5 shows the architecture of a single memory channel. The VIP's memory channel monitors the HBM interface and responds to requests (read/write) from the memory controller. The channel has been modelled to be compliant with JEDEC's JESD235A specification and supports all non-IEEE 1500 port operations detailed in the specification.

FEATURES
The HBM (High Bandwidth Memory) model has been developed in SystemVerilog, is compatible with the host (controller) and is ready to be plugged in. Once plugged in, it only needs its initial configuration and the model is ready to use.
Compliant with the JEDEC JESD235A specification: The VIP fully supports all non-IEEE 1500 port operations defined in the specification. This includes support for pseudo-channel and legacy modes, various memory size configurations, low-power modes, valid mode register configurations, and on-the-fly clock frequency change during low-power modes.
Protocol and mode register configuration checks: The VIP implements timing checks for all command-to-command timing parameters. It also implements checks for valid mode register configurations and for other protocol requirements (initialization sequence, low-power modes, etc.) that must be followed during operation.
Functional coverage: Functional coverage is provided for all command-to-command scenarios that can occur during operation.
Logs: For each channel of the HBM, the model provides a set of log files that track the commands issued, the transaction details of reads and writes (including strobes, mode register configurations at the time of the command, etc.), the details of the initialization sequence(s), and everything that happens during low-power modes.
As described above, the HBM VIP enables effective verification of the design under test for the functionality defined in the JEDEC JESD235A specification. The HBM VIP integrates seamlessly with SystemVerilog or OVM/UVM verification environments.
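Integration therefore amounts to an instantiation along the following lines. The module name, parameters and port names here are hypothetical placeholders; the actual names and widths come from the AL-HBM package documentation.

// Sketch of dropping one VIP memory channel into a test bench
// (placeholder names and widths, not the real AL-HBM interface).
module tb_top;
  logic         ck_t, ck_c, cke;
  logic [5:0]   row_ca;     // row command/address bus    (placeholder width)
  logic [7:0]   col_ca;     // column command/address bus (placeholder width)
  wire  [127:0] dq;         // channel data bus

  // Device under test: the HBM memory controller (not shown here).

  // One VIP memory channel; up to 8 such channels can be instantiated.
  al_hbm_channel #(
    .DENSITY_GB (8),
    .PSEUDO_CH  (1)
  ) u_hbm_ch0 (
    .ck_t   (ck_t),
    .ck_c   (ck_c),
    .cke    (cke),
    .row_ca (row_ca),
    .col_ca (col_ca),
    .dq     (dq)
  );
endmodule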
The HBM VIP model strictly conforms to the protocol defined in the specification, maintains a log of all transactions that occur, and reports any protocol violations.

CONCLUSION
The HBM Memory Controller IP is a highly efficient, highly configurable single-channel memory controller whose '2-command compare and issue' algorithm reduces the number of dead cycles and increases data transfer with the HBM memory to achieve high bandwidth. The separation of the logic flow for address, write data, read data and register interface makes it area efficient for ASIC implementation.