Achieving 200-400GE network buffer speeds with a serial-memory coprocessor architecture
Michael Miller, MoSys Inc.
embedded.com (October 25, 2014)
As network line rates and packet rates increase, the need for high-efficiency, low-latency, fine-granularity interfaces to memory and coprocessors has become critical. Buffering traffic at 400GE would require approximately 900 I/O pins to DDR4 memory running at 3.2 Gbps per pin. Any additional off-chip memory operations for header processing would require as many pins again.
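As a rough illustration of where a pin count on that order comes from, the sketch below tallies data, ECC, strobe, and command/address pins for enough DDR4 channels to sustain write-plus-read traffic at 400 Gbps. The per-channel pin counts and the 50% efficiency figure for random packet-buffer access are assumptions made for this sketch, not numbers from the article.

```python
import math

# Rough, illustrative estimate of the DDR4 pin count needed to buffer 400GE
# traffic. Per-channel pin counts and the efficiency factor are assumptions.

LINE_RATE_GBPS = 400                  # 400GE
MEM_BW_GBPS = 2 * LINE_RATE_GBPS      # each packet is written once and read once

PIN_RATE_GBPS = 3.2                   # DDR4-3200: 3.2 Gbps per data pin
BUS_EFFICIENCY = 0.5                  # assumed efficiency for random packet-buffer access

DATA_PINS = 64                        # x64 data bus per channel
ECC_PINS = 8                          # 64 data + 8 ECC = 72-bit word
STROBE_CA_CTRL_PINS = 45              # assumed DQS, address, command, control, clocks

channel_bw_gbps = DATA_PINS * PIN_RATE_GBPS * BUS_EFFICIENCY   # ~102 Gbps usable
channels = math.ceil(MEM_BW_GBPS / channel_bw_gbps)            # -> 8 channels

total_pins = channels * (DATA_PINS + ECC_PINS + STROBE_CA_CTRL_PINS)
print(f"{channels} channels, ~{total_pins} memory I/O pins")   # ~936, i.e. on the order of 900
```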
Many designers will try to integrate all of the memory on chip, but this limits how much computing resource can be included on the same die at a time when requirements call for a 4x improvement in computation. Even with advanced packaging, these pin counts are not achievable once the line interface and power pins are included. I/O pins exact a cost not only in larger packages and die area, but also in power. Being I/O efficient is therefore an important aspect of today's architectures, and protocols play a large role in the efficiency of information transfer.
However, the device-to-device serial interfaces currently available to deal with such latencies suffer from several shortcomings: they provide channelized one-way transport, or they target specific applications such as memory. These interfaces may also be optimized for large data packets, so they suffer inefficiency from the structure of their transactions with an ASIC or FPGA. The inefficiency arises because load/store transactions to and from memory occur in small synchronous transfers of data, such as 72 bits (64 data bits plus 8 bits of error detection and correction). Such inefficiency incurs costs in the form of extra memory, additional traces, and therefore increased board real estate.
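The penalty of moving small memory words over an interface built for large packets can be seen with a simple payload-over-framing calculation. The 20-byte per-transaction overhead below is a hypothetical figure chosen only to show the trend, not the overhead of any particular interface.

```python
# Illustrative link efficiency when each transaction carries a fixed framing
# overhead. The 20-byte overhead is a hypothetical assumption for this sketch.

def link_efficiency(payload_bytes: float, overhead_bytes: float = 20.0) -> float:
    """Fraction of wire bits that carry payload for one transaction."""
    return payload_bytes / (payload_bytes + overhead_bytes)

print(f"72-bit load/store : {link_efficiency(9):.0%}")     # ~31% of the wire does useful work
print(f"1 KB packet       : {link_efficiency(1024):.0%}")  # ~98%, fine for bulk packet transport
```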
To meet these challenges, MoSys has developed a reliable serial chip-to-chip transport protocol that operates over OIF-standard CEI SerDes and achieves 90% efficiency. The protocol, called the GigaChip Interface (GCI), can be scaled to 1, 2, 4, or 8 SerDes lanes, as well as multiples of 8. It targets computational and memory devices with serial interfaces for networking equipment, such as the Bandwidth Engine. Operating on existing devices with 16 lanes at 15 Gbps, the GCI provides enough bandwidth to support 4.5 billion read/write transactions per second and to buffer full-duplex 200GE. Doubling the pin count or doubling the line rate (to 30 Gbps) achieves full-duplex 400GE.
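The scaling claim follows directly from the lane count, lane rate, and 90% transport efficiency cited above. The short sketch below works through that arithmetic; mapping buffer writes to one link direction and reads to the other is an assumption of the sketch rather than a statement from the article.

```python
# Back-of-the-envelope GCI bandwidth scaling using the figures in the text
# (90% transport efficiency, 15 or 30 Gbps per lane).

GCI_EFFICIENCY = 0.90

def effective_bw_gbps(lanes: int, lane_rate_gbps: float) -> float:
    """Usable payload bandwidth per link direction."""
    return lanes * lane_rate_gbps * GCI_EFFICIENCY

for lanes, rate, target in [(16, 15, "200GE"), (32, 15, "400GE"), (16, 30, "400GE")]:
    bw = effective_bw_gbps(lanes, rate)
    print(f"{lanes:2d} lanes @ {rate} Gbps -> {bw:.0f} Gbps per direction ({target} buffering)")
```

With 16 lanes at 15 Gbps the link delivers roughly 216 Gbps of payload per direction, which is why either doubling the lanes or doubling the line rate is enough to move from 200GE to 400GE.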
After briefly describing the two most common transmission protocols, packet transfers and data word transactions, this article provides details of the GCI protocol and its various layers and shows how it can be used to achieve performance improvements in a typical system.