Synchronous static RAM (SRAM) architectures are evolving to support the high-throughput requirements of communications, networking, and digital signal processing (DSP) systems. Previous Sync SRAM architectures such as Std Sync and NoBL SRAM were bandwidth-limited and could not meet the throughput requirements of high-speed applications. DDR-I and DDR-II SRAMs, which are part of the Quad Data Rate (QDR) SRAM family, are well suited for high-speed networking applications. By transferring data on both edges of the clock, they provide more than twice the bandwidth of prior Sync SRAM architectures. This makes them an attractive technology for applications that require data transfer at very high speeds. This article describes how to implement a DDR-II SRAM memory device with a Stratix II FPGA, including detailed timing analysis.

DDR-II SRAM Overview

The DDR-II SRAM has a synchronous interface and can perform two data writes or two data reads per clock cycle. The same bidirectional data bus is used for writing to and reading from the SRAM. The device uses three pairs of clocks: input clocks K and Kn for latching the input address, controls, and data; optional output clocks C and Cn for output data; and source-synchronous ‘echo’ clocks CQ and CQn that are edge-aligned with the output data. Write and read operations to the DDR-II SRAM are burst-oriented and support burst lengths of two and four, so each read or write operation transfers either two or four data words (see Figure 1). DDR-II SRAM devices use the 1.5V-HSTL or 1.8V-HSTL Class I/II I/O standard. However, the 1.8V-HSTL Class I I/O standard is recommended for maximum performance in Stratix II devices.

[Figure 1]

DDR-II SRAM Functionality

Burst-of-2 and burst-of-4 devices provide the same overall bandwidth at a given clock speed. This section describes the functionality of burst-of-2 and burst-of-4 DDR-II SRAM devices.
From this point forward, write data to the memory is denoted by D, while read data from the memory is denoted by Q.

Burst-of-2 DDR-II SRAM Devices

Burst-of-2 DDR-II SRAM devices support two-word data transfers on all write and read transactions, requiring a relatively simple controller implementation. The figures below illustrate write and read operations with the device operating in dual clock mode (i.e., with the optional C and Cn clocks used). In dual clock mode, timing parameters are referenced to C/Cn; in single clock mode (i.e., with the C and Cn clocks not used), timing parameters are referenced to K/Kn.

[Figure 2]

The sizes of the address and data I/O buses depend on the memory device with which the FPGA interfaces. The BWSn signal (used to control byte-level operations) is low for the entire duration shown in Figure 2.

Write Cycle

On the rising edge of the K clock, the DDR-II SRAM device latches the control signals R/W and LD and the write address A2 (Cycle 6 of Figure 2). On the next rising edge of the K clock, the device latches the lower data word (DA2) on DQ, and on the subsequent rising edge of the Kn clock, it latches the upper data word (DA2+1), completing the write cycle.

Read Cycle

On the rising edge of the K clock, the DDR-II SRAM device latches the control signals R/W and LD and the read address A0 (Cycle 2 of Figure 2). After a latency of one and a half clock cycles, the rising edge of Cn clocks out the lower data word (QA0) of address A0 onto the DQ bus, and the next rising edge of C clocks out the upper data word (QA0+1), completing the read cycle.

Burst-of-4 DDR-II SRAM Devices

Burst-of-4 DDR-II SRAM devices support four-word data transfers on all writes and reads, reducing address bus activity. However, the control circuitry needed to interface to burst-of-4 DDR-II SRAM devices is more complicated than the control circuitry for burst-of-2 DDR-II SRAM devices.
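The burst-of-2 write and read sequences described above can be expressed as a small timeline model. The following Python sketch is illustrative only: the function names and the time convention (K and C rise at whole cycle counts, Kn and Cn rise half a cycle later) are assumptions made for this example and do not come from any vendor tool.

```python
# Illustrative model of which clock edge carries which data word.
# Time is measured in clock cycles: K/C rising edges fall on whole
# numbers, Kn/Cn rising edges fall half a cycle later (assumed convention).

def write_schedule(cmd_cycle, burst_len=2):
    """Return (time, edge, word_index) for each write data word.

    The command and address are latched on the K rising edge at cmd_cycle;
    the first data word is latched on the next K rising edge, and the
    following words alternate between Kn and K rising edges.
    """
    first = cmd_cycle + 1.0
    return [(first + 0.5 * i, "K" if i % 2 == 0 else "Kn", i)
            for i in range(burst_len)]


def read_schedule(cmd_cycle, burst_len=2):
    """Return (time, edge, word_index) for each read data word.

    The first word appears on the Cn rising edge 1.5 cycles after the
    command is latched; the following words alternate between C and Cn.
    """
    first = cmd_cycle + 1.5
    return [(first + 0.5 * i, "Cn" if i % 2 == 0 else "C", i)
            for i in range(burst_len)]


if __name__ == "__main__":
    # Write command in cycle 6 and read command in cycle 2, as in Figure 2.
    for t, edge, word in write_schedule(6):
        print(f"write word {word}: rising {edge} at cycle {t}")
    for t, edge, word in read_schedule(2):
        print(f"read word {word}: rising {edge} at cycle {t}")
```

The `burst_len` parameter simply extends the same alternating edge pattern to longer bursts.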
[Figure 3]

Write Cycle

The DDR-II SRAM device latches the control signals LD and R/W and the write address A2 (see Cycle 8 of Figure 3) on the rising edge of the K clock. On the following K clock rising edge, the DDR-II SRAM device latches the first data word (DA2) on DQ. On the next Kn clock rising edge, the second data word (DA2+1) is latched. The third (DA2+2) and fourth (DA2+3) words are latched on the subsequent K and Kn clock rising edges, respectively, completing the write cycle.

Read Cycle

The DDR-II SRAM device latches the control signals LD and R/W and the read address A0 (Cycle 2 in Figure 3) on the rising edge of the K clock. After a latency of one and a half clock cycles, the rising edge of Cn clocks out the first data word (QA0) of address A0 onto the DQ bus. The next rising edge of C clocks out the second data word (QA0+1). The subsequent rising edges of Cn and C clock out the third (QA0+2) and fourth (QA0+3) words, respectively, completing the read cycle.

DDR-II SRAM Interface Signals

Table 1 shows the DDR-II SRAM interface pins (i.e., clock, control, address, and data signals) and how to connect them to Stratix II devices. When interfacing with one DDR-II SRAM device, it is recommended to use a single-clock scheme in which the DDR-II SRAM device's C and Cn ports are tied to VDD (single clock mode).

[Table 1]

Clock Signals

DDR-II SRAM devices have three pairs of clocks:
- Input clocks K and Kn
- Output clocks C and Cn
- Echo clocks CQ and CQn

The positive input clock, K, is the logical complement of the negative input clock, Kn. Similarly, C and CQ are complements of Cn and CQn, respectively. The DDR-II SRAM device uses the K and Kn clocks for write accesses and, if used, the optional C and Cn clocks for read accesses. CQ and CQn are the source-synchronous output clocks that the DDR-II SRAM device sends along with the read data. The number of loads that the K and Kn clocks drive affects the switching times of these outputs.
When a controller drives a single DDR-II SRAM device, C and Cn are unnecessary because propagation delays from the controller to the DDR-II SRAM device and back are the same. To reduce the number of loads on the clock traces, DDR-II SRAM devices also have a single-clock mode, where the K and Kn clocks are used for both reads and writes. In this mode, the C and Cn clocks are tied to the supply voltage (VDD). The DDR-II SRAM device still uses CQ and CQn for the echo clock from the memory device to the Stratix II device.

The Stratix II device outputs the K and Kn clocks and the data, address, and command lines to the DDR-II SRAM device. For the controller to operate properly, the trace lengths and propagation times of the write data (D), address (A), and control signals (R/W, LD, BWSn) should be approximately equal to those of the K and Kn clocks. If the propagation delays for K and Kn from the FPGA to the DDR-II SRAM device are equal to the delays on the address (A) signals, the effect of signal skew on write and read request operations is minimized. Delay matching between the write data (D) and the K/Kn clocks is achieved by using identical double data rate output circuits to generate the clock and write data inputs to the memory.

The DDR-II SRAM device generates echo clocks CQ and CQn, which are edge-aligned with the leading edge of the read data. The CQ and CQn signals are then phase-shifted inside the Stratix II device and used to capture the read data. The CQ and CQn board trace lengths between the DDR-II SRAM device and the controller should equal the data I/O (DQ) board trace length to minimize the skew between the two signals.

For Stratix II interfaces to DDR-II SRAMs, connect the CQ and CQn pins to the FPGA DQS and DQSn pins, respectively. Both phase-shifted CQ and CQn signals are used to capture the read data.
The CQ pin is connected to the input latch and the active-high input register, while the CQn pin is connected to the active-low input register. For best data alignment, invert the CQ and CQn signals before they arrive at the DQ IOE registers. This option can be selected in the altdq megafunction. For more information, see the External Memory Interfaces chapter of the Stratix II Device Handbook (Volume 2, Chapter 3) at www.altera.com.

Use regular I/O pins in Stratix II I/O banks 3, 4, 7, or 8, driven through the double data rate (DDR) registers, to generate the K and Kn clocks. To meet the DDR-II tKHKH (skew between K and Kn) requirement, use adjacent pins for the complementary signals and surround the pin pair with programmable VDD and ground pins for better noise immunity.

Data Signals

DDR-II SRAM devices use a bidirectional data bus (DQ) for writes and reads. Connect the DQ pins on the SRAM to the DQ pins on the Stratix II FPGA. Any of the FPGA user I/O pins in I/O banks 3, 4, 7, or 8 can be used to connect to the DQ ports.

Control Signals

DDR-II SRAM devices use the R/W signal to indicate write and read operations, while the synchronous load signal (LD) indicates the start of an operation. The byte write select signal (BWSn) is a third control signal that tells the DDR-II SRAM device which byte to write to or read from. Any of the FPGA user I/O pins in I/O banks 3, 4, 7, or 8 can be used to generate control signals.

Address Signals

DDR-II SRAM devices use one address bus (A) for both read and write addresses. Any of the FPGA user I/O pins in I/O banks 3, 4, 7, or 8 can be used to generate address signals.

DDR-II SRAM Interface Architecture

For the write implementation, a write PLL is used to generate the write data (D) and center-aligned system clocks (K and Kn) using the dedicated DDR I/O circuits. This implementation results in matched propagation delays for clock and data signals from the FPGA to the DDR-II SRAM, minimizing skew.
For the read implementation, the enhanced DLL and delay shift circuitry are used to center-align the echo clocks (CQ and CQn) with the read data (Q).

Datapath Architecture in Stratix II

The DDR-II implementation in Stratix II uses two clock-management circuits:
- A write PLL generates the K/Kn system clocks and clocks out the address, command, and data.
- Read DLL-based phase shift circuitry registers the read data from the memory using the echo clocks CQ/CQn.

[Figure 4]

Figure 4 depicts the memory interface datapath architecture. Specifically, it shows how to connect the clock, data, address, and control pins in Stratix II devices when interfacing with DDR-II SRAM devices. The write PLL generates two clock outputs, WRITE_CLK and WRITE_CLK_90, that have a 90° phase offset. The WRITE_CLK output is used to clock out the address, command, and data signals to the DDR-II SRAM, while the WRITE_CLK_90 output is used to generate the K/Kn memory input clocks. This architecture center-aligns the K and Kn write clock edges with the output data (D) and address (A) signals. The write data outputs to the memory and the clocks both use the double data rate registers (DDIO circuitry) in the I/O cell, which minimizes the skew between the clock and data channels.

The read DQS phase shift circuitry generates a center-aligned version of the CQ and CQn echo clocks for read data capture. The captured data can then be resynchronized to the system clock. For more information on how to select the correct resynchronization clock phase, see the appendix Resynchronization of Read Data to the System Clock in the QDRII SRAM Controller MegaCore Function User Guide at www.altera.com.

Timing Analysis

Since data is transferred between the FPGA memory controller and the DDR-II SRAM device at high speeds, it is imperative to avoid set-up or hold violations at both the DDR-II SRAM and the FPGA. This section illustrates the timing analysis that must be performed when designing a high-speed DDR-II SRAM interface.
Write Cycle Timing

It is essential to meet the DDR-II SRAM device set-up and hold requirements for correct write cycle timing. For example, the data set-up and hold specifications for the Cypress burst-of-2 267-MHz devices are 0.35 ns each. The FPGA controller drives both the DDR-II SRAM clock and data signals. The board delays for the clock and data (DQ) lines may not be equal, so an allowance of 50 ps is included in the clock-to-output delay calculations to offset any mismatch in trace lengths.

Because K and Kn are generated from the WRITE_CLK_90 signal, while data and address are generated from the WRITE_CLK signal, there is a timing margin of approximately one-half of the bit period (the length of time between each data bit) each way to meet the DDR-II SRAM device set-up and hold times. The bit period, by definition, is approximately one-half of the cycle time for double data rate signaling.

[Figure 5]

In addition to set-up and hold times, a further concern is the clock-to-clock skew between K and Kn (tKHKH). The 267-MHz DDR-II SRAM specification calls for a minimum delay of 1.8 ns between the rising edges of the K and Kn signals. Because Stratix II device clock-to-out times can vary with pin position, K and Kn need to be placed on adjacent pins and their tCO times need to be verified against this requirement. For better noise immunity, it is recommended to surround the pin pair with programmable VDD and ground pins.

In the following exercise, we analyze the timing for a write operation from a Stratix II EP2S60 device to a Cypress CY7C1518AV18-267 burst-of-2 267-MHz DDR-II SRAM device.

Let us start the timing analysis by studying the input clocks K and Kn. These clocks are generated by the WRITE_CLK_90 output of the PLL inside the FPGA. The data, address, and command outputs are clocked out by a different output of the same PLL.
Since two outputs of a PLL feeding global clock networks have an inherent skew, the K and Kn clocks could be offset from the data outputs by this amount. For the Stratix II enhanced PLLs, the skew between two PLL outputs using different counters is 150 ps. This specification is listed in the DC and Switching Characteristics chapter of the Stratix II Device Handbook (Volume 1, Chapter 4). Figure 6 illustrates this and other uncertainties on the clock and data signals.

[Figure 6]

This results in a minimum phase offset between these two clocks:

tSHIFT_MIN = (0.25 * clock period) - clock skew = 0.25 * 3750 - 150 = 787.5 ps

Similarly, the maximum phase offset between the two PLL output clocks is:

tSHIFT_MAX = (0.25 * clock period) + clock skew = 0.25 * 3750 + 150 = 1087.5 ps

In addition to this clock skew uncertainty, PLL outputs can have duty cycle distortion (DCD) of up to 5% of the clock period. This results in an additional clock uncertainty of 187.5 ps (5% of the 3750 ps period of a 267-MHz clock). Another source of uncertainty on the clock is PLL jitter. However, since PLL jitter affects both the clock and data outputs to the memory uniformly, it does not affect the set-up/hold relationship at the DDR-II SRAM.

In Figure 6, for example, if the ideal clock edge of WRITE_CLK_90 is expected at time t = 3750 ps, then after accounting for PLL output clock skew and duty cycle distortion (150 + 187.5 = 337.5 ps in either direction), the clock edge can occur anytime between t = 3412.5 ps and t = 4087.5 ps.

Next, we compute the uncertainties on the data (D) signals. Channel-to-channel skew among all data pins is equal to the worst-case skew between the DDR outputs within the I/O bank(s). When using a single column I/O bank in the EP2S60 devices, the worst-case skew is tIOSKEW = 160 ps. Additionally, board trace length variations could add to this channel-to-channel skew. While this implementation calls for perfectly matched trace lengths, the timing analysis allows for 50 ps of board skew.
These skew parameters affect the data valid window at the DDR-II memory and reduce it by 420 ps (2 * (160 + 50) ps, since the skew applies at both edges of the window).

Now that the uncertainties are established, we check the set-up and hold time margins for write operations at the memory input pins. For 267-MHz operation, the bit period is (3750 ps / 2) = 1875 ps. The Cypress DDR-II SRAM device has set-up and hold time requirements of 350 ps at this speed. Given these parameters, the set-up and hold margins for 267-MHz DDR-II in Stratix II are as follows.

The set-up time margin is smallest when the data arrives late and the clock arrives early. It is calculated as:

tSU_MARGIN = tSHIFT_MIN - tDS - tDCD - tIOSKEW - tEXT = 787.5 - 350 - 187.5 - 160 - 50 = 40 ps

The hold time margin is smallest when the data arrives early and the clock arrives late. It is calculated as:

tH_MARGIN = tCK / 2 - tSHIFT_MAX - tDH - tDCD - tIOSKEW - tEXT = 1875 - 1087.5 - 350 - 187.5 - 160 - 50 = 40 ps

The total margin available is the sum of the set-up and hold margins, which is 80 ps.

Table 2 shows the timing margins of a Stratix II EP2S60 interfacing with 267-MHz and 250-MHz DDR-II SRAMs for write operations when the board trace variations for the DQ and K/Kn pins are 50 ps (approximately 0.3” of FR4 trace length variation). A similar timing analysis can be performed for a different FPGA and DDR-II SRAM device combination by substituting the timing specifications from the corresponding data sheets.

[Table 2]

Read Cycle Timing

The FPGA controller sends the read request and address signals to the DDR-II SRAM device along with the K and Kn clocks in the same manner as the write data. Therefore, the write timing parameters apply to these signals as well. Additionally, when the DDR-II SRAM device sends read data back to the FPGA controller, the design must meet the FPGA set-up and hold times.
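The write-cycle margin arithmetic from the Write Cycle Timing section can be reproduced with a short script, which makes it easier to substitute values from other data sheets. This is a minimal sketch: the function and parameter names are chosen for this example, and the numbers passed in are the ones quoted in the text for the EP2S60 and the 267-MHz Cypress device.

```python
# Sketch of the write-cycle set-up/hold margin calculation (all times in ps).
# Function and parameter names are illustrative, not from any vendor tool.

def write_margins(t_ck, pll_skew, t_dcd, t_ioskew, t_ext, t_ds, t_dh):
    """Return (setup_margin, hold_margin) at the memory input pins."""
    # K/Kn are nominally shifted by a quarter period from the data clock;
    # PLL output-to-output skew moves that shift in either direction.
    t_shift_min = 0.25 * t_ck - pll_skew
    t_shift_max = 0.25 * t_ck + pll_skew

    # Worst case for set-up: data late, clock early.
    setup = t_shift_min - t_ds - t_dcd - t_ioskew - t_ext
    # Worst case for hold: data early, clock late.
    hold = t_ck / 2 - t_shift_max - t_dh - t_dcd - t_ioskew - t_ext
    return setup, hold


if __name__ == "__main__":
    setup, hold = write_margins(
        t_ck=3750.0,     # clock period used in the text for 267 MHz
        pll_skew=150.0,  # skew between two PLL outputs
        t_dcd=187.5,     # 5% duty cycle distortion
        t_ioskew=160.0,  # worst-case DDR output skew in one column I/O bank
        t_ext=50.0,      # board trace skew allowance
        t_ds=350.0,      # memory data set-up requirement
        t_dh=350.0,      # memory data hold requirement
    )
    print(f"set-up margin: {setup} ps, hold margin: {hold} ps")
    print(f"total margin: {setup + hold} ps")
```

With the values above, the script gives the same 40 ps set-up margin, 40 ps hold margin, and 80 ps total as the hand calculation.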
[Figure 7]

Stratix II Read Cycle Timing

DDR-II memory reads in Stratix II devices are implemented using the CQ echo clock output from the DDR-II SRAM. The CQ echo clock signal is fed directly into a DLL to center-align the clock with the input data (DQ). This is achieved by implementing a phase shift in the DLL and using this phase-shifted clock to latch data from the memory in the DDIO registers.

In the following exercise, the timing for a read operation from a Cypress CY7C1518V18-267 burst-of-2 267-MHz DDR-II SRAM device to the Stratix II EP2S60 device is analyzed.

Start the analysis by studying the relationship between the echo clocks (CQ, CQn) and the read data (DQ) signals from the DDR-II SRAM. For the CY7C1518V18-267, the data clock-to-output time (tCQD) and data hold time (tCQDOH) with respect to the echo clocks are 300 ps and -300 ps, respectively. Hence the data valid window at the DDR-II SRAM device pins is 1275 ps (1875 - 300 - 300). Figure 8 illustrates these delays and other uncertainties in a read cycle timing waveform.

[Figure 8]

Board trace delays on the CQ/CQn signals and the data bus can be ignored if the trace lengths are matched (=L2 in Figure 4). This timing analysis allows for a maximum board skew of 50 ps between these lines. Due to this skew, the data valid window is further reduced to 1175 ps (1275 - 2 * 50).

The next step is to analyze the set-up and hold margins for latching the read data (DQ) signals at the FPGA's DDR input pins. The echo clock, CQ, from the DDR-II SRAM is connected to the dedicated reference clock input pin of the DLL. This read DLL phase-shifts the clock to center-align its edges with the data (default phase shift of 90°). The Stratix II DLL introduces uncertainty on the read clock in the form of jitter (100 ps; the margin calculations apply 50 ps in each direction). The worst-case set-up and hold time requirements of the Stratix II EP2S60 are 210 ps and 180 ps, respectively. These numbers were obtained from Quartus II timing analyzer reports.
While performing timing analysis for a specific design, obtain the requirements from the Quartus II timing analyzer. Additional FPGA specifications that need to be taken into consideration are the DLL phase shift error and the DQS-DQ internal skew. Given these parameters, the set-up and hold margins for 267-MHz DDR-II in Stratix II are as follows.

The set-up time margin is smallest when the data arrives late and the clock arrives early. It is calculated as:

tSU_MARGIN = tDLL_PS - tJITTER - tPSERR - tDQDINT - tEXT - tSU - tCQD = 937.5 - 50 - 0 - 80 - 50 - 210 - 300 = 247.5 ps

The hold time margin is smallest when the data arrives early and the clock arrives late. It is calculated as:

tH_MARGIN = tCK / 2 - tCQDOH - tH - tEXT - tDLL_PS - tJITTER - tPSERR - tDQDINT = 1875 - 300 - 180 - 50 - 937.5 - 50 - 0 - 80 = 277.5 ps

The total margin available is the sum of the set-up and hold margins, which is 525 ps. Since the hold margin is larger than the set-up margin, the DLL phase shift can be adjusted to balance the margins. Adding 15 ps to the existing 90° (937.5 ps) phase shift would result in equal margins of 262.5 ps each. This amounts to a total phase shift of approximately 91° on the echo clock.

Table 3 features the DDR-II SRAM read timing margin analysis at 267 MHz when the board trace variations for the Q and CQ/CQn pins are 50 ps (approximately 0.3” of FR4 trace length variation). A similar timing analysis can be performed for an interface with another FPGA and DDR-II SRAM device combination by replacing the timing specifications in Table 3 with those from the corresponding data sheets.

[Table 3]

Design Guidelines

The following guidelines are recommended for DDR-II interface implementation:

I/O Standard and Termination

The 1.8V or 1.5V HSTL I/O standard with Class I termination is recommended for best performance.

Impedance Matching

Impedance matching to 50 ohms is recommended.
If higher drive strength is needed on the outputs, minimum impedance mode can be used with termination at the far end. This will be adequate for memory interface operation at the highest supported speed for a particular device density and speed grade.

Trace Lengths

As described in previous sections, trace lengths for address and control lines should be matched. Likewise, trace lengths for echo clocks and data lines should be closely matched.

Clamshell Configuration

The DDR-II SRAM pinout supports a clamshell configuration, where two DDR-II devices can be placed on either side of the printed circuit board.

Conclusion

DDR-II SRAM devices offer enhanced timing margin and flexibility over prior synchronous SRAMs. Designed for high-bandwidth communications, networking, and DSP applications, DDR-II SRAM devices and Altera’s Stratix II, Stratix, and Stratix GX devices help communications system designers take advantage of DDR-II SRAM technology and achieve high memory bandwidth through a simple, proven interface.
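As a supplementary cross-check of the read-cycle margin arithmetic in the Read Cycle Timing section, the calculation and the phase-shift balancing step can be sketched in Python. The function and parameter names are illustrative assumptions for this example. The tCQDOH value is passed as a magnitude (300 ps), matching how it is subtracted in the hold margin formula.

```python
# Sketch of the read-cycle set-up/hold margin calculation (all times in ps).
# Names are illustrative; values are the ones quoted in the article.

def read_margins(t_ck, t_dll_ps, t_jitter, t_pserr, t_dqdint,
                 t_ext, t_su, t_h, t_cqd, t_cqdoh):
    """Return (setup_margin, hold_margin) at the FPGA DDR input registers."""
    # Worst case for set-up: data late, capture clock early.
    setup = (t_dll_ps - t_jitter - t_pserr - t_dqdint
             - t_ext - t_su - t_cqd)
    # Worst case for hold: data early, capture clock late.
    hold = (t_ck / 2 - t_cqdoh - t_h - t_ext
            - t_dll_ps - t_jitter - t_pserr - t_dqdint)
    return setup, hold


def balanced_shift(t_dll_ps, setup, hold):
    """Phase shift (ps) that equalizes the set-up and hold margins."""
    # Each extra ps of shift adds 1 ps of set-up and removes 1 ps of hold.
    return t_dll_ps + (hold - setup) / 2


if __name__ == "__main__":
    params = dict(t_ck=3750.0, t_jitter=50.0, t_pserr=0.0, t_dqdint=80.0,
                  t_ext=50.0, t_su=210.0, t_h=180.0,
                  t_cqd=300.0, t_cqdoh=300.0)
    setup, hold = read_margins(t_dll_ps=937.5, **params)
    print(f"set-up margin: {setup} ps, hold margin: {hold} ps")

    shift = balanced_shift(937.5, setup, hold)
    degrees = shift / params["t_ck"] * 360
    print(f"balanced shift: {shift} ps (about {degrees:.1f} degrees)")
    print(read_margins(t_dll_ps=shift, **params))
```

With the article's values, the margins come out at 247.5 ps and 277.5 ps, and shifting by a further 15 ps (952.5 ps in total, roughly 91°) equalizes them.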