Beyond DDR2 400: Physical Implementation Challenges in Your SoC Design

David Wallace, Product Marketing Manager, Synopsys, Inc.

Overview

High-performance DDR2 SDRAM is an increasingly common memory solution for designs that require more data bandwidth, lower power, and enhanced signaling features. However, these benefits come with significant physical implementation challenges at data rates above 400 Mbps. Not only does the bit period become extremely small at these rates, but signal integrity effects that could previously be ignored must now be well understood and managed.

To further compound the problem, memory interface designers who use third-party semiconductor interface intellectual property (IP) cannot assume interoperability among the individual subsystem components. Industry standards do not govern features such as power management, clock/data path skew adjustment, the PHY-to-controller interface, and on-die diagnostics.

Complete, integrated DDR2 SDRAM physical interface (PHY) IP can significantly reduce the interoperability and schedule risks of combining discrete memory subsystem blocks. Packaged as a complete, integrated place-and-route hard macro or as scalable sub-macros, third-party DDR2 SDRAM PHY IP can deliver predictable 800 Mbps system performance and significantly reduce development time.

DDR2 800 SDRAM Memory Interface IP Challenges

To realize the benefits of high-performance DDR2 SDRAM, system-on-chip (SoC) interface designers must approach memory subsystem integration with careful attention to detail. As data rates have progressed from DDR2 400 through DDR2 533 and DDR2 667 to DDR2 800, the timing and signal integrity of the memory interface have become increasingly difficult to manage.

Migrating from 400 Mbps to 800 Mbps DDR2 requires additional engineering effort, and ideally the migration was planned when the 400 Mbps design was first implemented. With DDR signaling, doubling the data rate from 400 to 800 Mbps halves the bit time from 2.5 ns to 1.25 ns. This bit time is divided evenly into setup and hold budgets of 625 ps each. Source-synchronous timing depends on the uncertainty in the placement of the DQ data edge relative to the DQS strobe edge, so any skew, jitter, or other uncertainty erodes the setup and hold margins.
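The bit-time arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a timing tool: it assumes an ideal bit time split evenly into setup and hold, and the erosion terms passed to the hypothetical remaining_margin_ps helper are placeholder values, not figures from this article.

```python
def ddr_bit_budget_ps(data_rate_mbps: float) -> tuple[float, float]:
    """Return (bit time, half-bit setup/hold budget) in picoseconds.

    A rate of R Mbps gives a bit time of 1 / (R * 1e6) s = 1e6 / R ps.
    """
    bit_ps = 1e6 / data_rate_mbps
    return bit_ps, bit_ps / 2


def remaining_margin_ps(half_bit_ps: float, erosion_ps: list[float]) -> float:
    """Subtract skew, jitter, and uncertainty terms from one side of the budget.

    Summing every term linearly is the conservative worst case; real
    budgets often combine random terms differently.
    """
    return half_bit_ps - sum(erosion_ps)


for rate in (400, 800):
    bit, half = ddr_bit_budget_ps(rate)
    print(f"DDR2 {rate}: bit time {bit:.0f} ps, setup/hold {half:.0f} ps each")

# Hypothetical erosion terms (ps): DQ-DQS skew, jitter, strobe placement uncertainty
print(remaining_margin_ps(625, [100, 150, 200]))
```

The same set of erosion terms that leaves comfortable margin in a 1250 ps half-bit budget consumes most of a 625 ps one, which is why the migration has to be planned up front.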

Total timing is composed of three budgets: transmitter, interconnect, and receiver. Nominally, each accounts for about one third of the total. JEDEC scaled down the DRAM's contributions to the transmitter budget (during reads) and the receiver budget (during writes) as frequency increased. Unfortunately, this scaling is not proportional to the bit period. For example, the uncertainty in when the DRAM generates DQS relative to CK is +/- tDQSCK. For DDR2 400 this is +/- 500 ps, a 1000 ps window that spans 40% of the 2500 ps bit time; for DDR2 800 it is +/- 350 ps, a 700 ps window that spans 56% of the 1250 ps bit time. Assume also that the system designer planned for this migration and specified the controller, PHY, and I/O cell to meet 800 Mbps timing at the start of the project. The remaining timing budget is then consumed by the interconnect between the PHY and the DRAM, and three items in the interconnect budget need particular attention.
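The tDQSCK percentages above can be reproduced with a short Python sketch. It uses only the tDQSCK and bit-time values quoted in the text; the "proportional" column is a hypothetical comparison showing what tDQSCK would have been had JEDEC scaled it in proportion to the bit period from the DDR2 400 value.

```python
def tdqsck_share(tdqsck_ps: float, bit_ps: float) -> float:
    """Fraction of the bit time covered by the full +/- tDQSCK window (2 * tDQSCK)."""
    return 2 * tdqsck_ps / bit_ps


# (speed grade, tDQSCK in ps, bit time in ps), as quoted in the text
GRADES = [("DDR2 400", 500, 2500), ("DDR2 800", 350, 1250)]
BASE_TDQSCK_PS, BASE_BIT_PS = GRADES[0][1], GRADES[0][2]

for name, tdqsck, bit in GRADES:
    share = tdqsck_share(tdqsck, bit)
    # Hypothetical: tDQSCK scaled in proportion to the bit period from DDR2 400
    proportional_ps = BASE_TDQSCK_PS * bit / BASE_BIT_PS
    print(f"{name}: +/-{tdqsck} ps is {share:.0%} of the bit; "
          f"proportional scaling would give +/-{proportional_ps:.0f} ps")
```

Because the actual DDR2 800 value stays well above the proportionally scaled one, the DRAM's share of the bit time grows, and the other budgets have to absorb the difference.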

PCB and Package Skew: The electrical length differences between the DQS and the DQ signals of a byte must be reduced to fit the smaller timing budget. Where a 35 ps skew budget may have been adequate at 400 Mbps, less than 20 ps may be required at 800 Mbps.
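To get a feel for what a 20 ps budget means on the board, the Python sketch below converts a skew budget into a maximum routed-length mismatch. It assumes a uniform trace whose delay is sqrt(er_eff)/c, with a default effective dielectric constant of 4.2 (roughly FR-4 stripline); microstrip has a lower effective value and therefore less delay per millimeter. The package_skew_ps parameter is a hypothetical allowance for skew already consumed inside the package.

```python
import math

C_MM_PER_PS = 0.299792458  # speed of light in vacuum, mm per ps


def delay_ps_per_mm(er_eff: float) -> float:
    """Propagation delay of a uniform PCB trace: sqrt(er_eff) / c."""
    return math.sqrt(er_eff) / C_MM_PER_PS


def max_length_mismatch_mm(skew_budget_ps: float, er_eff: float = 4.2,
                           package_skew_ps: float = 0.0) -> float:
    """Longest allowed DQ-to-DQS routed length difference within one byte lane.

    Any skew the package already contributes comes off the top of the budget.
    """
    remaining_ps = skew_budget_ps - package_skew_ps
    if remaining_ps < 0:
        raise ValueError("package skew alone exceeds the skew budget")
    return remaining_ps / delay_ps_per_mm(er_eff)


for budget in (35, 20):
    print(f"{budget} ps budget -> {max_length_mismatch_mm(budget):.1f} mm mismatch")
```

Under these assumptions, tightening the budget from 35 ps to 20 ps cuts the allowed mismatch from about 5 mm to under 3 mm, before any package skew is subtracted.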

Inter-Symbol Interference (ISI): ISI occurs when residual energy from preceding bits, which depends on the random data pattern, overlaps the current bit at the receiver. It is exacerbated by capacitive loading of the net and by frequency-dependent losses in the channel routing. ISI increases the data-dependent jitter at the receiver thresholds and reduces the minimum amplitude of the received signal; both effects can be observed in eye patterns. When the bit rate doubles, these effects grow because the signal has only half as much time to reach the required threshold levels. DDR2 defines lower AC thresholds at 800 Mbps, which address the amplitude problem; however, increased data-dependent jitter remains a problem.

Fortunately, capacitive loading can be reduced by decreasing the number of memory ranks on each DQ and DQS net, which reduces the roll-off of the received signal and therefore ISI. PCB losses can be reduced by shortening the overall route length, using a lower-loss dielectric material, and/or increasing the trace width (watch out for crosstalk).
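The ISI mechanism described above can be illustrated with a deliberately simplified Python model. It substitutes a single-pole RC low-pass for the real channel, with the time constant tau_ps standing in for net loading (more ranks mean more capacitance and a larger tau). Reflections and frequency-dependent loss are ignored, and the tau values are hypothetical. The sketch measures data-dependent jitter as the peak-to-peak spread of threshold-crossing times over every 8-bit pattern.

```python
import itertools
import math


def edge_crossings_ps(bits, bit_ps, tau_ps, vth=0.5):
    """Threshold-crossing time of each transition, measured from its bit boundary.

    The channel is modeled as a single-pole RC low-pass with time constant
    tau_ps, driven by an ideal 0/1 NRZ source.
    """
    v = float(bits[0])  # assume the line has fully settled at the first bit
    times = []
    for prev, cur in zip(bits, bits[1:]):
        target = float(cur)
        if cur != prev:
            # Solve target + (v - target) * exp(-t / tau) = vth for t
            frac = (vth - target) / (v - target)
            if not 0.0 < frac < 1.0:
                raise ValueError("eye closed: threshold never crossed")
            times.append(-tau_ps * math.log(frac))
        # Voltage at the end of this bit is the starting point of the next one
        v = target + (v - target) * math.exp(-bit_ps / tau_ps)
    return times


def data_dependent_jitter_ps(bit_ps, tau_ps, pattern_len=8):
    """Peak-to-peak spread of crossing times over every pattern_len-bit sequence."""
    times = []
    for pattern in itertools.product((0, 1), repeat=pattern_len):
        times.extend(edge_crossings_ps(pattern, bit_ps, tau_ps))
    return max(times) - min(times)


for bit_ps in (2500, 1250):
    for tau_ps in (300, 400):
        ddj = data_dependent_jitter_ps(bit_ps, tau_ps)
        print(f"bit {bit_ps} ps, tau {tau_ps} ps: DDJ {ddj:.2f} ps")
```

In this model, halving the bit time leaves the line less settled before each transition, so crossing times depend more on the preceding bits; a larger tau (heavier loading) has the same effect. That is the qualitative trend the text describes, not a prediction for any real channel.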

SSO (Simultaneous Switching Output) Push-out: During write operations, DQS is launched 90 degrees out of phase with the eight DQ signals of the byte. When the eight DQ lines toggle simultaneously, the resulting current through the package wire inductance can cause the power rail to collapse, delaying the DQ outputs. This "push-out" subtracts from the available setup time budget. If nothing is done, doubling the bit rate also doubles the fraction of the budget occupied by SSO push-out. To reduce the SSO contribution, the package wire inductance must be reduced, either by switching from bond wire to flip chip or by increasing the number of power/ground pairs in the interface. Less effective measures include double bonds on power and ground, or added decoupling. Decoupling is most effective on-die, but it can also be placed close to the die on the pad ring, within the package, or surface-mounted on the package. Capacitance added on the PCB will likely have too much effective inductance to decouple high-frequency power/ground noise efficiently. Each of these solutions adds cost; therefore, the architect should plan for 800 Mbps operation from the beginning.
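Two of the scaling arguments above can be sketched numerically in Python: rail bounce from V = L * N * di/dt, and the share of the setup budget taken by a fixed push-out. This is a first-order illustration under stated assumptions: the power/ground pairs are treated as ideal parallel return paths (mutual inductance ignored), the push-out is assumed not to shrink with the bit period, and all numeric inputs (pin inductance, di/dt, 50 ps push-out) are hypothetical.

```python
def sso_noise_mv(n_switching: int, di_dt_ma_per_ns: float,
                 pin_inductance_nh: float, n_pg_pairs: int) -> float:
    """Estimate rail bounce as V = L_eff * N * di/dt.

    The n_pg_pairs power/ground pairs are treated as ideal parallel return
    paths, so L_eff = pin_inductance_nh / n_pg_pairs. Units: nH * mA/ns = mV.
    """
    l_eff_nh = pin_inductance_nh / n_pg_pairs
    return l_eff_nh * n_switching * di_dt_ma_per_ns


def pushout_share(pushout_ps: float, data_rate_mbps: float) -> float:
    """Fraction of the half-bit setup budget consumed by a fixed SSO push-out."""
    setup_ps = 1e6 / data_rate_mbps / 2
    return pushout_ps / setup_ps


# Hypothetical values: 8 DQ drivers, 10 mA/ns each, 4 nH per package pin
for pairs in (2, 4):
    print(f"{pairs} P/G pairs: {sso_noise_mv(8, 10, 4, pairs):.0f} mV bounce")

# Hypothetical 50 ps push-out that does not shrink with the bit period
for rate in (400, 800):
    print(f"DDR2 {rate}: push-out uses {pushout_share(50, rate):.0%} of setup")
```

Doubling the number of power/ground pairs halves the estimated bounce in this model, while doubling the data rate doubles the fraction of setup time a fixed push-out consumes.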

Planning is the key. When designing for 400 Mbps, anticipate what will be required to get to 800 Mbps and include it on the front end.

Evaluating, Acquiring, and Integrating Components

The primary function of DDR2 SDRAM physical interface IP is to meet the signaling and cycle-to-cycle timing constraints of a given DDR2 SDRAM memory subsystem. Traditional physical interface solutions, composed of SSTL_18 I/Os or application-specific DDR2 I/Os, delay-locked loops (DLLs), and homegrown near-pad logic, have provided reliable interfaces at DDR2 400 rates and below. More importantly, the effort needed to implement these traditional solutions at such data rates was not prohibitive compared to the overall end-product cost. The DDR2 SDRAM interface, although essential for many SoCs, rarely differentiates the end product. At DDR2 533 and higher, however, the interface complexity demands more time, resources, and effort to ensure successful operation with sufficient design margin. Sourcing components from multiple vendors not only lengthens the integration effort but may also limit the ability to reach the desired performance target.

Interdependencies between I/Os, DLLs, and high-speed near-pad logic affect not only overall interface performance but also the ability to perform critical in-silicon testing, exercise power management modes, and implement intra- and inter-domain skew management to maximize timing margin. Unlike PCI Express® or USB 2.0, memory subsystem integration is not governed by a standard that clearly defines how the individual components of the physical subsystem work together as a complete system. Inconsistent methodologies prescribed by individual IP component providers, and those providers' limited ability to influence overall system performance, further complicate an already challenging signaling environment.

A successful design requires significant time spent selecting the precise mix of components that address the specific needs of the SoC and its target system. Each discrete component of the memory subsystem must be rigorously characterized across multiple process, voltage, and temperature corners so that the integrator can accurately model the performance of the complete interface before tapeout. The integrator must then select the proper blocks, balancing features, performance, quality of characterization, and integration documentation from SIP memory interface component providers.

Selecting the optimal pairing of physical interface IP and digital controller IP is also important. What are the PHY interface requirements on the controller side? What are the controller interface requirements on the PHY side? Does the controller support the PHY's on-die diagnostic features? How do the controller and PHY enable selectable levels of power management in the I/Os? Do the individual components use similar power sequencing methodologies? Often, important features of individual components cannot be used at the system level simply because the components were not designed to work together.

Managing the myriad integration and signaling details of high-performance DDR2 SDRAM interfaces can consume considerable time. Unfortunately, overcoming the integration complexity of DDR2 533, 667, and 800 does not differentiate the product: the interface is simply expected to work at first silicon, in multiple applications and across multiple topologies. How do you balance the resources, time, and effort required to implement a high-performance commodity memory interface profitably? How does the integrator get the most performance margin out of the DDR2 interface with the least effort?

Complete, Integrated Physical Interface

An integrated, complete physical interface solution lets the integrator contain the signaling challenges of controlling timing-critical delay and skew paths within the physical interface IP. Not only does this notably reduce engineering and integration effort, it also allows SoC providers to implement a complex interface profitably, across multiple designs, with adequate margin.

An integrated, complete physical interface solution provides all necessary components, such as I/Os, DLLs, PLLs, and glue logic, delivered as an integrated, tapeout-ready hard macro. The critical timing and skew issues within the physical subsystem are handled inside the PHY, saving several person-months of valuable engineering time that can be better spent differentiating the end product.

Every picosecond of margin counts when implementing DDR2 800. An integrated PHY reduces system performance uncertainty compared with the traditional discrete building-block approach. The IP provider has predetermined the placement of data path logic, address and command logic, functional I/Os, and utility pads based on rigorous simulation and test silicon in several memory topologies and multiple packaging technologies. A tightly coupled solution can significantly reduce delay variability on both the write and read paths, which can greatly improve system timing margin. Other features, such as a common address/command block with an integrated PLL that serves as the root of the clock tree, let the architect provide realistic system clocks with relaxed duty cycle constraints.

Integrated PHYs can be delivered as a single hardened macro supporting a fixed data path width (with or without ECC) and a specific package technology. Alternatively, the PHY can be packaged as byte-wide data path tiles, a single common address/command path tile, and functional/utility I/Os, accompanied by clearly detailed integration, signal integrity, and timing guidelines. Building a DDR2 PHY from submodules may involve more integration effort than using the single integrated PHY macro. However, a well-architected common address/command module and comprehensive routing guidelines from the IP provider can reduce the added effort to basic symmetrical clock tree routing and balancing, while providing flexibility in die placement.
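The tiled option above can be sketched as a simple configuration calculation in Python: one shared address/command tile plus enough byte-wide data tiles to cover the bus. The article does not specify an ECC scheme; this sketch assumes a Hamming SECDED code, a common choice that needs 7 check bits for 32 data bits and 8 for 64. The tile names are hypothetical labels, not product identifiers.

```python
import math


def secded_check_bits(data_bits: int) -> int:
    """Check bits for a Hamming SECDED code.

    Smallest r with 2**r >= data_bits + r + 1 (single-error correction),
    plus one overall parity bit (double-error detection).
    """
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1


def byte_lane_tiles(data_bits: int, ecc: bool = False) -> int:
    """Byte-wide data path tiles (8 DQ plus their strobe each) for the bus width."""
    total_bits = data_bits + (secded_check_bits(data_bits) if ecc else 0)
    return math.ceil(total_bits / 8)


def phy_tile_list(data_bits: int, ecc: bool = False) -> list[str]:
    """One shared address/command tile plus the byte-wide data path tiles."""
    lanes = byte_lane_tiles(data_bits, ecc)
    return ["addr_cmd"] + [f"data_byte{i}" for i in range(lanes)]


for width, ecc in ((32, False), (32, True), (64, True)):
    print(width, "ECC" if ecc else "no ECC", phy_tile_list(width, ecc))
```

A fixed hard macro corresponds to one frozen entry in this list; the tiled approach lets the same building blocks cover several widths, at the cost of the clock tree routing and balancing work the text mentions.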

DDR2 SDRAM physical interface IP, packaged as a complete, integrated macro, enables predictable performance, increases margins, and considerably reduces implementation and integration time. Beyond high performance and reduced risk, a properly architected DDR2 SDRAM PHY provides system-level testability, power management, and signaling optimization.

Complete DDR2 SDRAM Interface IP Solution

If time is an issue in your project, or if you are not comfortable selecting and integrating high-performance DDR2 SDRAM subsystem IP components, you can instead use a proven, integrated solution. Synopsys is uniquely positioned to ensure overall memory system performance with a complete DDR2 SDRAM memory interface solution that includes a scalable digital controller and a complete, integrated physical interface hard macro. Unlike PHY-only or controller-only vendors, Synopsys has developed a dedicated DDR2 SDRAM memory interface solution from a system perspective. Interoperability issues between digital controllers and PHYs, which often result in over-constrained designs with insufficient margin, no longer limit overall system performance. The PHY-controller interface is a simple wired interface, optimized for ease of integration and minimum latency. A comprehensive set of integration, signal integrity, and timing guidelines provides clearly established rules to ensure the optimal implementation, analysis, and validation of the complete memory subsystem IP within the target SoC interconnect environment.

The DesignWare® Mixed-Signal DDR2 SDRAM PHY is delivered as a complete, tapeout-ready integrated macro containing all necessary interface cells, utility cells, data path logic, and address/command path logic. It is available both as a complete place-and-route 32-bit integrated macro and as scalable byte-wide data blocks, a common address/command block, and I/Os that support a wide variety of die placement and packaging options. When it is combined with the DesignWare Cores DDR2 SDRAM Memory Protocol Controller, you can use diagnostic features such as at-speed loop-back testing of the PHY, as well as multiple levels of power management within the I/Os.

The complete DDR2 SDRAM IP solution from Synopsys delivers predictable, high-performance memory subsystem interface solutions for today's consumer electronics, computing, and communication applications.

A PDF version of this article is available at:
http://www.synopsys.com/products/designware/pdfs/beyond_ddr2-400_wp.pdf

For more information on Synopsys DesignWare IP, visit www.synopsys.com/designware
