Transceiver design is fully integrated

By G. Miao, P. Ju, D. Ng, J. Khoury and K. Lakshmikumar, EE Times
September 22, 2003 (11:40 a.m. EST)
URL: http://www.eetimes.com/story/OEG20030919S0043

In this article, we describe a 10.5-Gbit/s to 13.5-Gbit/s transceiver in a 0.13-micron CMOS technology. The transmitter and receiver use half-rate architectures to ease high-speed clock routing and to save power. The transmitter uses a clock multiplication unit (CMU) with a differential LC oscillator to generate a 5.25-GHz to 6.75-GHz low-jitter clock, which is used to multiplex 32-bit parallel data onto a serial stream at 10.5 Gbits/s to 13.5 Gbits/s.
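The relationship between the line rate, the half-rate clock and the parallel interface can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the parallel word rate is derived here by assuming the 32-bit bus is clocked at the line rate divided by 32, which the article does not state explicitly.

```python
# Rates implied by a half-rate, 32-bit-wide serializer (illustrative only).
PARALLEL_WIDTH = 32  # bits, from the article


def half_rate_clock_hz(line_rate_bps: float) -> float:
    """A half-rate clock uses both edges, so it runs at half the bit rate."""
    return line_rate_bps / 2.0


def parallel_word_rate_hz(line_rate_bps: float, width: int = PARALLEL_WIDTH) -> float:
    """Assumed word rate on the parallel side: line rate divided by bus width."""
    return line_rate_bps / width


if __name__ == "__main__":
    for rate in (10.5e9, 13.5e9):
        print(
            f"line rate {rate / 1e9:.2f} Gbit/s: "
            f"clock {half_rate_clock_hz(rate) / 1e9:.3f} GHz, "
            f"word rate {parallel_word_rate_hz(rate) / 1e6:.3f} MHz"
        )
```

The two clock frequencies reproduce the 5.25-GHz and 6.75-GHz endpoints quoted above.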

With a 1.2-volt power supply, the tuning voltage for the voltage-controlled oscillator (VCO) in the CMU circuit is limited to less than 1 V. To achieve more than 30 percent tuning range and to cover process and temperature variations, a VCO with a gain of over 2 GHz/V would be needed. Such a high gain, however, would make the VCO sensitive to coupling from nearby circuits and to power supply noise. To overcome those problems, an LC oscillator with coarse- and fine-tuning capabilities was designed. Coarse tuning provides the CMU with a wide tuning range and reduces the VCO gain by a factor of five, because the fine-tuning range only needs to cover temperature and supply variations.
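The 2-GHz/V figure follows from simple arithmetic. The sketch below assumes a 6-GHz center frequency (the midpoint of the 5.25-GHz to 6.75-GHz range) and a usable tuning span of 0.9 V; both values are assumptions chosen for illustration, since the article says only that the span is under 1 V.

```python
# Back-of-the-envelope VCO gain estimate (assumed values are marked below).

def required_vco_gain_hz_per_v(center_hz: float,
                               fractional_range: float,
                               tuning_span_v: float) -> float:
    """Gain needed to sweep the whole tuning range with one control voltage."""
    return center_hz * fractional_range / tuning_span_v


CENTER_HZ = (5.25e9 + 6.75e9) / 2   # assumed: midpoint of the clock range
FRACTIONAL_RANGE = 0.30             # "more than 30 percent", from the article
TUNING_SPAN_V = 0.9                 # assumed: the article says "less than 1 V"
COARSE_REDUCTION = 5.0              # gain reduction quoted in the article

single_band_gain = required_vco_gain_hz_per_v(
    CENTER_HZ, FRACTIONAL_RANGE, TUNING_SPAN_V
)
fine_tune_gain = single_band_gain / COARSE_REDUCTION

if __name__ == "__main__":
    print(f"single-band gain: {single_band_gain / 1e9:.2f} GHz/V")
    print(f"with coarse tuning: {fine_tune_gain / 1e9:.2f} GHz/V")
```

Under these assumptions the single-band gain comes out at about 2 GHz/V, and the fivefold reduction brings it to roughly 0.4 GHz/V.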

The LC VCO is coarse-tuned by switching capacitors in or out in parallel, with the varactor used for fine-tuning the VCO. Coarse tuning is performed during a power-up calibration sequence. The CMU calibration operates as follows: When powered up, a linear search engine finds the closest VCO coarse-tuning setting, while the loop filter is held at midrange. When the coarse-tuning setting is found, the loop switches to the fine-tuning mode and operates as a conventional phase-locked loop.
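The calibration sequence described above amounts to a linear search over the coarse settings. The following is a minimal behavioral sketch, not the actual search engine: the function names, the number of settings, and the tank values in the example VCO model are all hypothetical.

```python
import math
from typing import Callable


def calibrate_coarse(measure_hz: Callable[[int], float],
                     target_hz: float,
                     num_settings: int) -> int:
    """Step through every coarse setting (loop filter assumed held at
    midrange) and return the one whose frequency is closest to target."""
    best_code = 0
    best_err = float("inf")
    for code in range(num_settings):
        err = abs(measure_hz(code) - target_hz)
        if err < best_err:
            best_code, best_err = code, err
    return best_code


def example_vco_hz(code: int,
                   l_h: float = 0.5e-9,       # hypothetical tank inductance
                   c0_f: float = 1.1e-12,     # hypothetical fixed capacitance
                   c_step_f: float = 0.12e-12  # hypothetical switched step
                   ) -> float:
    """Ideal LC tank: each coarse step switches in one more capacitor."""
    c_total = c0_f + code * c_step_f
    return 1.0 / (2.0 * math.pi * math.sqrt(l_h * c_total))


if __name__ == "__main__":
    code = calibrate_coarse(example_vco_hz, target_hz=5.44e9, num_settings=8)
    print(f"selected coarse code {code}: {example_vco_hz(code) / 1e9:.3f} GHz")
```

A linear search is slow compared with a binary search, but it runs only once at power-up and does not depend on the tuning curves being perfectly monotonic.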

The low headroom of a 1.2-V supply also makes the design of a tristate charge pump difficult with a conventional passive series-RC loop filter. The charge pump will operate in the triode region for very high or low voltages on the loop filter, causing the up and down currents to mismatch. Such mismatch results in the appearance of clock reference spurs in the output clock's spectrum. To overcome this problem, an active loop filter is employed in which the charge pump output voltage is held with a virtual short to half the power supply. A conventional tristate phase-frequency detector (PFD) is used. To reduce the noise generation and coupling in the CMU, the PFD and feedback divider are implemented with current-mode logic (CML).

The transmitter consists of a 32:1 multiplexer with an embedded duo-binary/RZ precoder, a 1:16 clock divider, a prebuffer and a 50-ohm terminated output buffer. To balance power, speed, and design and layout complexity, circuits that operate at or above a clock rate of 1.25 GHz use CML circuits, while those operating below 1.25 GHz use standard CMOS logic. The precoders are implemented at 2.5 GHz as a compromise between power and design complexity. The duo-binary and pre-emphasis filter function is embedded in the output buffer.
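For readers unfamiliar with duo-binary signaling, the sketch below shows the textbook scheme: a differential (XOR) precoder followed by a one-bit delay-and-add filter, which lets the receiver recover each bit from the three-level signal without error propagation. The article does not give the details of its precoder, so this is the generic form only, and the function names are hypothetical.

```python
from typing import List


def duobinary_precode(bits: List[int], initial: int = 0) -> List[int]:
    """Differential precoder: p[k] = d[k] XOR p[k-1]."""
    out = []
    prev = initial
    for d in bits:
        prev ^= d
        out.append(prev)
    return out


def duobinary_encode(precoded: List[int], initial: int = 0) -> List[int]:
    """Delay-and-add filter: c[k] = p[k] + p[k-1], giving levels 0, 1 or 2."""
    out = []
    prev = initial
    for p in precoded:
        out.append(p + prev)
        prev = p
    return out


def duobinary_decode(levels: List[int]) -> List[int]:
    """With precoding, each bit is recovered symbol by symbol: d[k] = c[k] mod 2."""
    return [c % 2 for c in levels]


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 1]
    levels = duobinary_encode(duobinary_precode(data))
    print(levels)
    print(duobinary_decode(levels) == data)
```

The precoder is what makes the memoryless decode possible; without it, a single decision error at the receiver would corrupt all following bits.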

To ease the current-density problem usually seen in high-speed, low-VDD CMOS CML circuits, and to avoid any unnecessary parasitic capacitance in the signal path, a highly parallel architecture is adopted in the final 4-to-1 multiplexer and output buffer. This architecture also provides design flexibility in implementing duo-binary and pre-emphasis transmissions. Specifically, the data path from 2.5 Gbits/s and higher is divided into four equal and parallel branches. The final stage of the output buffer acts like a four-level current-steering D/A converter to sum the outputs of the four parallel signal paths. To increase gain bandwidth, the final 2-to-1 multiplexer, the internal half-rate clock buffer and the half-rate frequency divider all employ inductive peaking.

The receiver consists of three main blocks: a high-gain input buffer, a clock and data recovery (CDR) circuit and the deserializer, which converts the recovered high-speed data to a 32-bit parallel low-speed output. The receiver has two modes of operation: lock-to-reference clock for VCO training and lock-to-data and deserialization for the normal mode. During power-up or when the serial input data signal has been lost, the CDR locks to the reference clock. Rather than use a conventional PFD when locking to the reference clock, a digital rotational frequency detector is used.

The rotational detector enables faster locking to the reference clock and permits a transition to lock-to-data mode with reduced transients. Once the VCO frequency is within 250 ppm of the desired value, the CDR is switched to data-locking mode, where the half-rate Alexander-type phase detector is switched in and the rotational detector is switched out. A linear data phase detector, such as a Hogge design, is not used, because at 10-Gbit/s data rates accurate phase measurements are extremely difficult to achieve with reasonable power. Circuit imperfections cause the Hogge phase detector characteristic to deviate from the ideal odd-function characteristic, resulting in static phase error and ultimately degraded bit error rate performance.

Error immunity

In contrast, a bang-bang type phase detector measures only the sign of the phase error, providing far greater immunity to static phase error. A loss-of-lock detector monitors the VCO frequency and switches back to the reference clock if the VCO deviates by more than 250 ppm. For a half-rate receiver, a VCO with quadrature outputs is needed.
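The decision logic of a bang-bang detector and the 250-ppm frequency window can be sketched in a few lines. This is a simplified full-rate model of the Alexander detector, whereas the design described here is half-rate; the sign convention and all function names are assumptions made for illustration.

```python
def alexander_decision(a: int, t: int, b: int) -> int:
    """Early/late decision from three consecutive binary samples.

    a: previous data sample, t: sample at the expected data edge,
    b: current data sample.
    Returns -1 if the clock is early, +1 if it is late, and 0 when there
    is no data transition (no phase information). The sign convention is
    an assumption of this sketch.
    """
    if a == b:
        return 0  # no transition between the two data samples
    # Edge sample still equals the old bit: sampled before the transition.
    return -1 if t == a else +1


def frequency_error_ppm(vco_hz: float, target_hz: float) -> float:
    """Relative frequency error in parts per million."""
    return (vco_hz - target_hz) / target_hz * 1e6


def within_lock_window(vco_hz: float, target_hz: float,
                       limit_ppm: float = 250.0) -> bool:
    """True while the VCO stays inside the 250-ppm window quoted in the text."""
    return abs(frequency_error_ppm(vco_hz, target_hz)) <= limit_ppm


if __name__ == "__main__":
    print(alexander_decision(0, 0, 1), alexander_decision(0, 1, 1))
    print(within_lock_window(5.44e9 * (1 + 100e-6), 5.44e9))
```

Because only the sign of the phase error is used, gain and offset errors in the sampling path change how often a decision is produced but not its direction, which is the immunity referred to above.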

To achieve a differential sensitivity of better than 15 mV peak to peak, a high-gain input buffer is needed. While inductive peaking is an effective bandwidth extension method, the chip area it requires is excessive, particularly for a multistage amplifier. Instead, an active-inductor shunt peaking technique was used.

The four-stage buffer achieves a simulated gain of 19 dB and a bandwidth of 8 GHz. One problem with the high-gain input buffer is that its output offset can be excessive. Traditional offset cancellation circuits use either ac coupling between stages or an RC low-pass filter at the output to extract the dc offset and then feed it back to the input stage for cancellation. Conventional cancellation techniques require high resistance and capacitance to realize a low corner frequency, resulting in a large chip area or the need for off-chip components. These techniques also require the input data sequence to be balanced between zeros and ones.

To overcome those problems, a new on-chip offset cancellation technique was developed. Instead of directly extracting the buffer offset from its output, the phase detector decisions are used to measure the offset of the buffer and, advantageously, of the sampling latches as well. The phase-detector-derived offset is digitally low-pass filtered and fed back to the input stage with a 6-bit current-mode D/A converter to cancel the offset.

The transceiver is laid out for flip-chip packaging and integration with digital VLSI devices; however, for test chip purposes a wire-bond pinout was used. The core 10-Gbit/s transceiver circuit with solder bumps was not modified. Instead, additional on-chip power busing and 50-ohm transmission lines were routed from the solder bump pads to the peripheral wire-bond pads. The transmission lines and bond wires introduce signal loss and impedance mismatch, slightly degrading results. The test chip was put in a 118-pin BGA package.

The CMU and CDR VCOs were measured to verify the coarse/fine-tuning capability. The CMU's VCO covers the frequency from 5.1 GHz to over 6.8 GHz with adequate overlapping between coarse-tuning curves. The fine-tuning has more than a 4 percent tuning range to cover the temperature/power supply variations. The CMU 5.44-GHz output has a clock jitter of 8.4 picoseconds peak to peak, while its integrated rms jitter from 50 kHz to 80 MHz is 0.6 ps. The CMU clock phase noise is -103 dBc/Hz at 1-MHz offset. The CDR VCO has similar tuning curves except that its tuning range is from 5.25 GHz to 6.9 GHz.

The transceiver can operate at up to 13.5 Gbits/s. In normal operation, the receiver input sensitivity is 15 mV peak-to-peak differential with a 2^31-1 PRBS input pattern at 10.88 Gbits/s. The recovered clock has 18-ps peak-to-peak jitter and 0.44-ps integrated rms jitter. The entire transceiver consumes 1 watt when fully operational.
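Jitter figures are often easier to compare when expressed in unit intervals (UI) rather than picoseconds. The sketch below converts the measured values at the 10.88-Gbit/s test rate; the helper names are hypothetical, and applying the 10.88-Gbit/s bit period to the 5.44-GHz CMU clock jitter is an assumption based on the half-rate architecture.

```python
# Convert the quoted jitter measurements from seconds to unit intervals.

def unit_interval_s(bit_rate_bps: float) -> float:
    """Duration of one bit period in seconds."""
    return 1.0 / bit_rate_bps


def jitter_in_ui(jitter_s: float, bit_rate_bps: float) -> float:
    """Jitter as a fraction of one bit period."""
    return jitter_s * bit_rate_bps


BIT_RATE = 10.88e9  # test rate quoted in the article

measurements_s = {
    "CMU clock, peak to peak": 8.4e-12,
    "recovered clock, peak to peak": 18e-12,
    "recovered clock, integrated rms": 0.44e-12,
}

if __name__ == "__main__":
    print(f"UI = {unit_interval_s(BIT_RATE) * 1e12:.1f} ps")
    for name, value in measurements_s.items():
        print(f"{name}: {jitter_in_ui(value, BIT_RATE):.4f} UI")
```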

G. Miao, P. Ju, D. Ng, J. Khoury and K. Lakshmikumar work for Multilink Technology Corp. (Somerset, N.J.).

See related chart
