Data interface key to future apps
By Richard Warmke, EE Times
October 7, 2002 (10:28 a.m. EST)
URL: http://www.eetimes.com/story/OEG20021003S0017
By the year 2005, applications such as games, high-end graphics and routers will require chip-to-chip speeds of 10 to 100 Gbytes/second, while IC I/O-related cost and power consumption must remain roughly at the levels we see today. Unfortunately, incremental developments of today's technologies would be unable to achieve that performance/cost ratio.
Today we can achieve 4.25-Gbyte/s data rates using a 128-bit-wide 266-MHz SSTL bus with double-data-rate SDRAM. Increasing that to 34.2 Gbytes/s, however, would require a 512-bit bus operating at 533 MHz. Obviously, a new interface is needed, one that offers a quantum leap in performance while staying within low-cost, high-volume manufacturing constraints.
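Those bandwidth figures follow from simple arithmetic. A minimal sketch of the calculation, treating the quoted MHz numbers as effective transfers per second:

```python
# Peak parallel-bus bandwidth: (width in bits / 8) * effective transfer rate.
def bus_bandwidth_gbytes(width_bits, mtransfers_per_sec):
    """Peak bandwidth in Gbytes/s for a parallel bus."""
    return (width_bits / 8) * mtransfers_per_sec / 1000.0

print(bus_bandwidth_gbytes(128, 266))  # ~4.26 -> the ~4.25-Gbyte/s figure
print(bus_bandwidth_gbytes(512, 533))  # ~34.1 -> the ~34.2-Gbyte/s figure
```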
Some alternative approaches being considered for chip-to-chip communication, such as RapidIO from Motorola Inc. and Mercury Computer Systems Inc. and Advanced Micro Devices Inc.'s HyperTransport, use a source-synchronous handshake in which all I/O drivers and receivers transfer and receive data with a timing reference, such as a strobe or clock, that is transmitted with the data. The combined total of the absolute values of driver data-valid time, receiver setup/hold time, clock jitter, data jitter and data-to-strobe timing error cannot exceed the clock cycle. Unfortunately, it is difficult to reduce those timing parameters proportionately with the clock cycle, especially across interfaces wider than 16 bits. Packetizing the data also increases latency.
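A small sketch of that timing budget makes the scaling problem concrete; the picosecond values here are hypothetical placeholders, not figures from the article:

```python
# Source-synchronous timing budget: the summed uncertainties must fit
# inside one clock cycle.
def timing_budget_ok(cycle_ps, components_ps):
    """True if the summed absolute uncertainties fit in the cycle time."""
    return sum(abs(v) for v in components_ps.values()) <= cycle_ps

budget_ps = {
    "driver_data_valid": 300,
    "receiver_setup_hold": 250,
    "clock_jitter": 100,
    "data_jitter": 100,
    "data_to_strobe_error": 150,
}
print(timing_budget_ok(1000, budget_ps))  # True: 900 ps fits a 1,000-ps cycle
print(timing_budget_ok(500, budget_ps))   # False: the components do not shrink with the cycle
```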
PCI Express and InfiniBand represent another approach: multiplexing clock and data signals. Multiplexing can achieve data frequencies above 3 GHz and is useful for longer-distance backplane applications, but the approach has its limitations over short distances. For example, the 8-bit/10-bit data/clock encoding involved in multiplexing increases power consumption as well as the silicon area needed for I/O. Multiplexing the data and clock signals also precludes bidirectional communication on the same line and can add latency.
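The encoding cost is easy to quantify; a short illustration (the payload rate chosen here is arbitrary):

```python
# 8b/10b encoding sends 10 line bits for every 8 payload bits,
# so the line must run 25 percent faster than the payload rate.
def line_rate_gbps(payload_gbps):
    return payload_gbps * 10 / 8

print(line_rate_gbps(2.0))  # 2.5 Gbit/s on the wire for 2.0 Gbit/s of data
```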
For their next-generation parallel-interface technology, Rambus engineers developed Yellowstone, a low-cost parallel chip-to-chip interface for data transfer between CMOS logic and/or memory chips on a PC board. It uses differential Rambus signaling level (DRSL), a 200-millivolt differential-signaling solution; that is, plus/minus 100 mV centered on a 1.1-V reference level.
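A trivial sketch of the signal levels as described, for reference:

```python
# DRSL levels as described: plus/minus 100 mV about a 1.1-V reference.
V_REF = 1.1      # volts, reference level
DELTA = 0.100    # volts, excursion above/below the reference

v_high = V_REF + DELTA   # 1.2 V
v_low = V_REF - DELTA    # 1.0 V
print(f"high={v_high:.2f} V  low={v_low:.2f} V  swing={(v_high - v_low) * 1000:.0f} mV")
```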
To allow for high data rates, bidirectional bit pairs are connected point to point and are terminated on-chip with a 50-ohm impedance. Unbalanced transmission was rejected because it inevitably entails high simultaneous switching noise, excessive ground bounce, common-mode noise between pins and between routes, crosstalk, low noise margins and high electromagnetic interference.
Same pin count

Differential signaling solves those problems but doubles the number of pins per bit. Interestingly, however, pin counts on high-density, high-performance chips are about the same regardless of whether one uses single-ended or differential I/Os.
In a high-speed IC that uses single-ended signaling, a ratio approaching 1:1 active pins to power and grounding pins is necessary to resolve the noise issues associated with unbalanced transmission. Because return current is always confined to the bit pair in a differential interface, however, the number of power and ground pins can be reduced. Overall, the total number of pins remains roughly the same.
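A rough illustration of that trade-off; the counts and the power/ground ratio used for the differential case are assumptions, not figures from the article:

```python
# Single-ended: one signal pin per bit, plus roughly one power/ground pin
# per signal pin to control noise. Differential: two signal pins per bit,
# but far fewer power/ground pins because return current stays in each pair.
def single_ended_pins(bits):
    return bits + bits

def differential_pins(bits, pairs_per_pwr_gnd_pin=8):
    return 2 * bits + max(1, bits // pairs_per_pwr_gnd_pin)

for width in (16, 32):
    print(width, single_ended_pins(width), differential_pins(width))
# 16 bits -> 32 vs 34 pins; 32 bits -> 64 vs 68 pins: roughly the same total
```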
Another advantage: Data transfer occurs at eight times the speed of an external clock, resulting in octal-data-rate operation. The I/O circuit of each IC that receives the external clock uses a phase-locked loop to generate a 4x internal clock, and data is keyed to both rising and falling edges, so that eight bits per pin pair are sent or received for each cycle of the clock. Address and control signals are sent from controller to memory chip synchronously.
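The octal-data-rate arithmetic, in sketch form; the 400-MHz external clock used in the example is a hypothetical figure, not one given in the article:

```python
# Octal data rate: a 4x PLL clock plus dual-edge capture gives
# 4 * 2 = 8 bits per pin pair per external clock cycle.
def pin_pair_rate_gbps(ext_clock_mhz, pll_mult=4, edges=2):
    return ext_clock_mhz * pll_mult * edges / 1000.0

print(pin_pair_rate_gbps(400))  # 3.2 Gbit/s per pin pair at a 400-MHz clock
```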
'Margins' shrink

Rambus engineers developed several proprietary technologies to deal with ever-decreasing timing margins. Chief among them is a handshake approach called FlexPhase, in which different ICs may be synchronized to the edges of clocks having different timings by using a per-pin phase adjustment. Even different pins within the same IC need not be synchronized to the edges of the external clock. This maximizes Yellowstone's controller timing margins, making high-speed signals easier to capture.
With FlexPhase technology, all of the I/Os on the master-side IC are equipped with phase adjusters for transmission and reception. Data send/receive timing can be adjusted freely, in increments of approximately 1.4 degrees over a 360-degree range, relative to the internal clock edge received by each pin. To reduce chip cost on the slave side, those ICs (generally DRAMs) may omit such phase-adjustment circuits.
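A ~1.4-degree step over 360 degrees suggests about 256 discrete settings per pin; the 8-bit register width implied here is an inference, not stated in the article:

```python
# 360 / 256 = 1.40625 degrees per phase code.
STEP_DEG = 360 / 256

def phase_code_to_degrees(code):
    return (code % 256) * STEP_DEG

print(STEP_DEG)                    # 1.40625
print(phase_code_to_degrees(64))   # 90.0 degrees
```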
During power up, all of the pins on the master-side IC perform a phase scan by carrying out dummy send/receive operations. They then determine the phase relative to the clock edge that will maximize the timing margin for each bit, and set the transmission timing-adjustment and receive timing-adjustment registers to that phase. As a result, all I/Os can transmit and receive at the time that produces the maximum timing margin. In effect, during initialization, the data-valid window of each pin in the actual mass-production system is determined, and then data is transmitted and received at the optimum timing.
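A sketch of that power-up scan, as a model of the idea rather than Rambus' actual algorithm: try every phase code with dummy transfers, find the widest run of passing codes, and program the center of that window.

```python
def calibrate_pin(try_phase, codes=256):
    """try_phase(code) -> True if a dummy transfer at that phase passes."""
    best_start, best_len = 0, 0
    start, length = None, 0
    for code in range(codes):
        if try_phase(code):
            start = code if start is None else start
            length += 1
            if length > best_len:
                best_start, best_len = start, length
        else:
            start, length = None, 0
    return best_start + best_len // 2   # center of the data-valid window

# Toy pass/fail model: transfers succeed for phase codes 40 through 200
print(calibrate_pin(lambda c: 40 <= c <= 200))   # -> 120
```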