Practical Design and Implementation of a Configurable DDR2 PHY

Update: GigOptix, Inc. Announces Acquisition of ChipX (November 10, 2009)

By Lior Amarilio, ChipX

Abstract : As speed and design complexity increase, so does the need for more memory storage. System-on-a-Chip (SoC) designers can choose to embed more memory into the device, at the expense of silicon area and cost. Depending on memory requirements, a more economical approach might be to use off-chip memories. The evolution of Dynamic Random Access Memory (DRAM), which has targeted the commodity market over the last couple of decades, has provided high-performance, cost-effective off-chip memory. However, the drive for simple and inexpensive DRAM for the commodity market has left most of the design and protocol complexity in the memory controller and physical interface (PHY) drivers. As a result, SoC physical design teams that need to interface with Double Data Rate 2 (DDR2) DRAM face challenges in design expertise, design flow, and Electronic Design Automation (EDA) tools.

DDR2 PHY IP development has resolved many of the problems the DDR2 interface presents to SoC designers. A DDR2 high-speed PHY block is almost always developed as a full-custom mixed-signal design. There are good reasons for a full-custom design, in which every cell and every signal route is fully controlled. Such predefined hard designs offer a way to deal with the tight timing budget of DDR2, which is in the range of a few tens of picoseconds. Another reason is the physical dimensions within which this block must fit. This paper presents, through examples, the methods selected while performing the physical implementation of the IP. The DDR2 interface supports a wide range of interface settings.
These start with the type and number of external SDRAM devices, move on to the logical data and address bus widths, and include the topology of the SDRAM connection, such as point-to-point or multipoint. Each of these options implies a unique physical implementation of the PHY. This paper demonstrates how, with an advanced design process, the same DDR2 PHY IP design can handle different configurations without area or power overhead. Selecting Structured ASIC or Hybrid ASIC as the platform for implementing the PHY introduces great flexibility, both in performance and in silicon area cost across configurations. However, the design effort spent up front to meet the required performance while limited to the Structured ASIC fabric is significant. This paper presents a view of the Structured ASIC advantages and design challenges as they relate to DDR2 PHY design.

This paper highlights the basic architecture of the DDR2 interface from the SoC side, showing the controller, master-slave delay-locked loop (DLL), the PHY, and the SSTL I/Os. It describes the different configurations of such an interface in light of the Structured ASIC advantages, discusses in depth the physical design considerations, provides examples of the adopted approach, and summarizes the achieved performance.

About DDR2 : DDR2 is part of a large family of synchronous DRAM (SDRAM) interface technologies, which in turn is one of many DRAM implementations. DDR2 SDRAM is an evolutionary improvement over its predecessor, DDR SDRAM. This family of memories is the leading off-chip memory solution in the market today, and is used on PC motherboards. The primary benefit of this type of memory is its ability to read or write two words of data over the wide parallel data bus every clock cycle: one word on the rising edge of the clock strobe, and a second word on the falling edge. Hence the name Double Data Rate (DDR) memory.
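The double-data-rate relationship can be made concrete with a little arithmetic. The Python sketch below is only an illustration and is not taken from the design flow described in this paper; the function names, the 400 MHz clock, and the 64-bit bus width are assumptions chosen for the example.

```python
def per_pin_rate_mbps(clock_mhz: float) -> float:
    """Two transfers per clock cycle: one on the rising edge, one on the falling edge."""
    return 2.0 * clock_mhz


def peak_bandwidth_mbytes_per_s(clock_mhz: float, bus_width_bits: int) -> float:
    """Peak transfer rate of a parallel data bus, in megabytes per second."""
    return per_pin_rate_mbps(clock_mhz) * bus_width_bits / 8.0


if __name__ == "__main__":
    # Assumed example values: a 400 MHz clock and a 64-bit data bus.
    print(per_pin_rate_mbps(400.0))
    print(peak_bandwidth_mbytes_per_s(400.0, 64))
```

A 400 MHz clock therefore corresponds to 800 Mbps on each data pin, since every pin carries one bit per clock edge.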
The DDR2 interface is specified by JEDEC to operate at rates of 400–800 Mbps, and a few suppliers support even higher rates of up to 1066 Mbps. The interface has a wide parallel bidirectional data bus, using SSTL1.8 I/Os and a single bidirectional strobe signal for each group of 8 data bits. The strobe signal is not a free-running clock, but is transmitted along with the relevant active data. Moreover, logically yet unfortunately, JEDEC has defined the system to shift design complexity into the memory controller and PHY, to keep DRAMs as inexpensive as possible. This mandate left complexity in the development of the DDR2 PHY, resulting in significant challenges in terms of design expertise, design flows, and EDA tools.

Figure 1 demonstrates the timing relationship between the DQ (data bus) and DQS (strobe) signals for Read and Write operations. When writing data, it is the responsibility of the DDR2 PHY to center-align DQS with DQ while tracking PVT changes, taking advantage of DLL circuitry and delay lines. In the opposite operation, reading, the DRAM transmits the data and strobe edge-aligned, and it is again the DDR2 PHY's responsibility to shift the incoming DQS by 90 degrees relative to DQ, also while tracking PVT changes.

Figure 1 Write and Read Operation, Wave View

DDR2 PHY Configurability Options

Due to the nature of a high-speed interface and the clear desire to control signal integrity effects in order to increase product reliability and yield, many DDR2 PHY IP blocks are compiled into optimized hard macros. Typically, each DDR2 PHY is constructed from several hard macro cells. Having a hard-macro library limits the ability to choose a different interface configuration, because a different set of hard macros is used for each configuration. In the case presented here, a set of predefined interface configurations was taken into account while designing the DDR2 PHY architecture.
This feature of configurability forces a designer to divide the hard macro into several small hard macros that can be abutted with one another to form the desired configuration, and still meet the tight timing constraints which are imposed by the DDR protocol. In addition, in order to facilitate straightforward connections between on-chip PHY macro building blocks as well as off-chip signals, a physical dimension requirement exists on the hard macros. Using the configurable set of hard macros, the following configuration options are available without any area, power, or speed penalties:
Figure 2 DDR2 PHY Block Diagram

The colors in this figure represent the various hard macros that form the complete configurable DDR2 PHY. The hard macro PHY block includes:
Physical Implementation Challenges

The physical implementation of the DDR2 interface is divided into two levels. The high-level integration constructs a PHY from already built hard macro-cells by placing them adjacent to one another, providing the best power connections and signal integrity. The lower-level implementation is the creation of the firm macro-cells themselves. The challenge of this design approach is implementing a configurable firm macro-cell that meets the following requirements:
The specific physical dimensions, the location of the I/Os, and the abutment of the macro-cells together impose a very tight timing constraint. Figure 3 demonstrates one of the timing budget calculations the read path has to meet. Operating at a data transfer rate of 800 Mbps does not leave much timing budget. It can be observed that a total theoretical data window of less than 200 ps is left for correctly capturing the data. This small window shrinks further due to the following parameters:
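Whatever the individual contributors are, the bookkeeping behind such a budget is simple subtraction from the bit period. The sketch below only illustrates that arithmetic: the 800 Mbps rate comes from the text, while the contributor names and picosecond values in the example are hypothetical placeholders, not measurements from this design.

```python
def bit_period_ps(rate_mbps: float) -> float:
    """Duration of one data bit in picoseconds for a per-pin rate given in Mbps."""
    return 1e6 / rate_mbps


def remaining_window_ps(rate_mbps: float, deductions_ps: dict) -> float:
    """Bit period minus the sum of all uncertainty terms (the result may go negative)."""
    return bit_period_ps(rate_mbps) - sum(deductions_ps.values())


if __name__ == "__main__":
    # Hypothetical contributors, chosen only to exercise the functions.
    example = {
        "strobe_to_data_skew": 350.0,
        "jitter": 100.0,
        "capture_setup_hold": 250.0,
    }
    print(bit_period_ps(800.0))
    print(remaining_window_ps(800.0, example))
```

At 800 Mbps the bit period is 1250 ps, so a capture window of less than 200 ps is under a sixth of the bit time before any further deductions are applied.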
Figure 3 Demonstration of Read Data Window

In the next sections, specific steps of the design flow are discussed; for each, the challenge is described and an example of the solution chosen to overcome it is presented.

Floorplan and Cluster Placement

The DDR2 PHY has strict physical dimensions, and the design is constructed from several different, repeatable modules. The designer knows the optimum location of each module inside the fabric. By following a few simple steps, it is possible to allocate groups of cells to a cluster and to force the tool to place the cells of each cluster in a desired location. These steps are:
    set cluster [ data create cluster region $m central_cluster "336u 0u 252u 156u" ]

In the Data-Byte part of the DDR2 PHY, more than 20 different cluster locations were defined and implemented. Figure 4 demonstrates the locations of the different clusters. It can be observed that the exact same rectangular cluster is defined next to each DQ I/O, so that one implementation can be repeated identically. Other clusters can be observed on the top row below the I/Os, dedicated to the JTAG boundary-scan topology. One cluster serves DQS generation and handling along with the masking mechanism. A few more serve for the delay line locations and other functions.

Figure 4 Cluster Locations

Abutment Concept

Forming the top-level DDR2 PHY requires connecting several prebuilt firm macro-cells. In order to meet the tight timing constraints, every macro-cell has signal pins that connect tightly to the adjacent cell. All interface signals have strict physical locations on the boundary of the firm macro. Each location is defined by its X,Y coordinates and also by the specific metal layer and design rules with which the signal is routed. For each input or output signal, the designer has to specify a location on the boundary of the macro following these steps:
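One way to sanity-check the outcome of such pin placement is to compare the pin lists of two neighbouring macros along their shared edge. The Python sketch below is a hypothetical illustration, not the tool flow used for this PHY: the `Pin` record, the signal names, and the assumption that both macros use the same name for a signal crossing the boundary are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pin:
    name: str         # signal name (assumed identical on both sides of the edge)
    offset_um: float  # position along the shared edge, in microns
    layer: str        # metal layer the pin is drawn on


def abutment_mismatches(left_pins, right_pins, tol_um=0.0):
    """List (signal, reason) pairs for pins that would not connect by abutment."""
    right = {p.name: p for p in right_pins}
    problems = []
    for p in left_pins:
        q = right.get(p.name)
        if q is None:
            problems.append((p.name, "missing on neighbour"))
        elif q.layer != p.layer:
            problems.append((p.name, "layer mismatch"))
        elif abs(q.offset_um - p.offset_um) > tol_um:
            problems.append((p.name, "offset mismatch"))
    return problems


if __name__ == "__main__":
    # Hypothetical pins on the facing edges of two macros.
    macro_a = [Pin("dq_clk", 12.0, "M3"), Pin("rst_n", 20.0, "M3")]
    macro_b = [Pin("dq_clk", 12.0, "M3"), Pin("rst_n", 20.5, "M4")]
    print(abutment_mismatches(macro_a, macro_b))
```

The check is one-directional: it only walks the pins of the first macro, so a complete audit would run it in both directions.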
In addition, special attention has to be paid to the design of the power mesh of the top-level PHY. A prebuilt partial power mesh exists within each macro-cell, and the top-level power mesh is formed by placing all firm macro-cells adjacent to one another. The width of the power rails, for both VSS and VDD core power, must be taken into account in order to meet the IR-drop requirements. Figure 5 is a zoomed-in view of two firm macro-cells, showing the pin locations on the boundary of each one. The pins are located so that they connect by abutment and form a continuous routing path. The wide rails, which alternate between VSS and VDD, are likewise abutted when the two hard macro-cells are placed, forming the top-level power mesh.

Figure 5 Bottom and Top View of Two Hard Macros Showing Abutment Pin Locations

DFT

Design for testability is a critical factor in today's SoCs that most designers tend to leave for last. Due to the delicate design of DDR2 and the tight timing requirements, a dedicated plan for both logic scan insertion and boundary (JTAG) scan has to be made early in the design cycle to ensure high coverage. Two contradicting requirements exist for the firm macro-cell. One is to have the highest manufacturing test coverage, which leads to a high number of scan operability points and adds uncontrolled loads on different paths. The other is for test coverage to have no effect on critical timing paths. To resolve this dilemma, the scan chain is not built automatically by the EDA tools; instead, its order and the specific cells used are predefined by the designer, who performs the following actions:
Clock Mesh, Zero Skew

In order to meet the timing requirements of the DDR2 interface, a zero-skew clock topology is preferred. One effective way to achieve near-zero skew on a relatively narrow clock tree is to form a clock mesh. A clock mesh is constructed when two or more driver cells are connected in parallel (all driver inputs are shorted together and all driver outputs are shorted together) to drive a wide metal bus, achieving extremely low skew (close to zero). One big drawback of this approach is that common EDA tools cannot calculate the timing delay of a mesh accurately: because all driver inputs are shorted together and all driver outputs are shorted together, the timing engine is not capable of providing the correct path delay. This requires dedicated circuit simulation using a standalone analog simulator such as SPICE. Moreover, while performing the physical insertion of such a structure, all timing calculations have to be fed back to an external SPICE engine. Inside the firm macro-cells there are several clock-mesh implementations. For each one the following steps are performed:
Figure 6 Circuit example of one clock mesh

The firm macro-cells include such clock-mesh structures. All timing paths related to the clock mesh are hand-coded and represented accurately in the timing model. Once this structure is implemented, the designer does not have to worry about the tools' behavior and can freely benefit from this feature.

Conclusion

Implementing a DDR2 interface from scratch requires significant and delicate design work. The approach presented here, implementing several small firm macro-cells that together can form a variety of DDR2 interfaces, reduces design time, cost, and risk when building a new DDR2 interface. This paper provided guidelines and several design methodologies for the designer who is about to implement such firm macro-cells. Particular attention was given to meeting the high performance required by the DDR2 interface. Today, firm macro-cells used to build a DDR2 PHY interface are available in the industry, and one can relatively easily configure a required solution. Such advanced structures are available at ChipX for integration in its family of Structured ASIC and Hybrid ASIC products and, in hard macro form, also in Standard Cell products.
All material on this site Copyright © 2017 Design And Reuse S.A. All rights reserved. |