- programmable channel mapping to SPI-4.2: supports one bypassable virtual channel (VC), three ordered VCs, and one multicast VC
- programmable max packet length of 64 to 80 bytes
- link-layer credit-based flow control
- CRC generation and error checking
- handles continuous back-to-back end-of-packet (EOP) events
- DIP-4 parity generation and checking
- status channel framing and DIP-2 generation and checking
- loss of status synchronization generation and detection
- training sequence generation and detection
- fully synchronous design: 800 Mbits/s
- OIF-compliant SPI-4 phase 2
- ASI-SIG-compliant, ASI Core Architecture Specification Revision 1.0, Oct. 2003
Fig. 1: A functional view of an SPI-4.2 to ASI fabric interface controller. On the left is the SPI-4.2 interface to the NPU; on the right is the ASI interface to the switch fabric.
SPI-4.2 to ASI ingress packet flow
In the SPI-4.2 to ASI direction, ingress SPI-4.2 packets are segmented if necessary and mapped to VC FIFO buffers according to the traffic type (unicast or multicast) and class. The user programs the channel-mapping information from the SPI-4.2 interface to the buffers in an SPI-4-to-VC mapping table. Packets arriving on the interface are transferred to the corresponding buffers as indicated in the table. The ASI scheduler reads the queues and sends the TLPs to the switch fabric.
The fill level of each of the SPI-4.2 Channel FIFO buffers is translated into a starving-hungry-satisfied status and transmitted through the receive status channel (RSTAT) to the peer SPI-4.2 transmitter. Packets received on the SPI-4.2 interface are transferred to the corresponding VC FIFO buffer when space is available.
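As a rough illustration of this translation, the C sketch below maps a FIFO fill level to the three-level status; the threshold fields and the 2-bit encodings are assumptions for illustration, not values taken from the design.

```c
/* A minimal sketch, assuming hypothetical thresholds: translate a VC
 * FIFO fill level into the three-level SPI-4.2 status reported on the
 * RSTAT channel. */
#include <stdint.h>

enum fifo_status { STARVING = 0x0, HUNGRY = 0x1, SATISFIED = 0x2 };

typedef struct {
    uint32_t fill;              /* current fill level in bytes */
    uint32_t hungry_thresh;     /* below this level: starving */
    uint32_t satisfied_thresh;  /* at or above this level: satisfied */
} vc_fifo_t;

static enum fifo_status vc_fifo_status(const vc_fifo_t *f)
{
    if (f->fill >= f->satisfied_thresh)
        return SATISFIED;  /* peer transmitter should stop sending */
    if (f->fill >= f->hungry_thresh)
        return HUNGRY;     /* peer may send up to MaxBurst2 more */
    return STARVING;       /* peer may send up to MaxBurst1 more */
}
```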
The SPI-4.2 interface and the VC interface each support up to 16 channels (channels 0 through 15). A sample channel assignment from SPI-4.2 to VC (see the sketch after this list) is:
- SPI-4.2 channels 0 to 7 are mapped to eight bypassable virtual channels (BVCs)
- SPI-4.2 channels 8 to 11 are mapped to four ordered virtual channels (OVCs)
- SPI-4.2 channels 12 to 15 are mapped to four multicast virtual channels (MVCs)
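A minimal C sketch of such a mapping table, loaded with the sample assignment above; the table layout and VC identifiers are illustrative assumptions, since the actual core is programmed through its configuration registers.

```c
/* Illustrative SPI-4-to-VC mapping table, filled in with the sample
 * channel assignment above. The entry layout is an assumption. */
#include <stdint.h>

enum vc_type { BVC, OVC, MVC };

typedef struct {
    enum vc_type type;   /* bypassable, ordered, or multicast */
    uint8_t      index;  /* VC number within that type */
} vc_map_entry_t;

static vc_map_entry_t spi4_to_vc[16];

static void init_sample_mapping(void)
{
    for (int ch = 0; ch < 16; ch++) {
        if (ch <= 7)
            spi4_to_vc[ch] = (vc_map_entry_t){ BVC, (uint8_t)ch };        /* 0-7   -> BVC 0-7 */
        else if (ch <= 11)
            spi4_to_vc[ch] = (vc_map_entry_t){ OVC, (uint8_t)(ch - 8) };  /* 8-11  -> OVC 0-3 */
        else
            spi4_to_vc[ch] = (vc_map_entry_t){ MVC, (uint8_t)(ch - 12) }; /* 12-15 -> MVC 0-3 */
    }
}
```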
ASI to SPI-4.2 egress packet flow
In the ASI to SPI-4.2 direction, the egress ASI TLPs from the switch fabric of a given VC and traffic class are mapped to one of the 16 SPI-4.2 channels using a programmable address mapping table (Fig. 2). The user programs the channel-mapping information for the VCs to the SPI-4.2 interface in the VC-to-SPI-4 table. The data-multiplex (MUX) calendar RAM (VCS4 calendar RAM) contains the schedule for reading data from the VC interface FIFO buffers to be transferred to the SPI-4.2 interface. The VCS4 calendar RAM has 16 locations.
The VCS4 data-MUX and address-mapping block reads data from the VC FIFO channels according to the order specified in the VCS4 calendar RAM. The SPI-4.2 source block dequeues and reassembles the packets if necessary, adds the SPI-4.2 payload control words, and transmits them on the SPI-4.2 interface to the NPU. The SPI-4.2 source block also performs credit management and scheduling according to the flow-control information received from the peer SPI-4.2 receiver.
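The calendar-driven read can be pictured with a short C sketch; the dequeue stub stands in for the data-MUX hardware and is purely hypothetical.

```c
/* Sketch of one pass through the 16-entry VCS4 calendar RAM: each slot
 * names the VC FIFO to service next. */
#include <stdbool.h>
#include <stdint.h>

#define CALENDAR_LEN 16

static uint8_t vcs4_calendar[CALENDAR_LEN]; /* programmed by the user */

/* Hypothetical stand-in for the data-MUX: move one burst from the
 * given VC FIFO toward the SPI-4.2 source block. */
static bool vc_fifo_dequeue_burst(uint8_t vc)
{
    (void)vc;
    return false; /* stub: the real data path lives in hardware */
}

static void vcs4_service_pass(void)
{
    for (int slot = 0; slot < CALENDAR_LEN; slot++) {
        /* The SPI-4.2 source block still applies credit-based
         * scheduling before anything is transmitted. */
        (void)vc_fifo_dequeue_burst(vcs4_calendar[slot]);
    }
}
```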
ASI provides a number of protocol interfaces (PIs) that provide optional functionality or adapt different protocols to the ASI infrastructure.
Fig. 2: Pictured is a TLP including the ASI header, optional PI-0 and PI-1 headers, and a PI-2 header.
Protocol interface descriptions
PI-0 encapsulation is used for multicast routing. A secondary PI of 0 indicates a spanning tree packet. A non-zero secondary PI indicates multicast routing. Multicast group addressing is accomplished through the multicast group index field.
PI-1 conveys connection queue identification information to the downstream peer switch element or end point. In case of congestion, the downstream peer can send a PI-5 congestion management message identifying the offending connection queue in the upstream peer.
PI-2 provides segmentation and reassembly (SAR) services and encapsulation. The PI-2 header includes start-of-packet (SOP) and end-of-packet (EOP) information that facilitate packet delineation. In addition, PI-2 encapsulation provides for optional pre-pad (PPD) and end-pad (EPD) bytes that can align the payload data within the PI-2 container.
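The delineation information PI-2 carries per TLP can be summarized in a small C sketch; the field names follow the description above, but the widths and packing are assumptions, not the normative ASI bit layout.

```c
/* Illustrative summary of per-TLP PI-2 delineation information; the
 * real header packs these fields into defined bit positions. */
#include <stdint.h>

enum sar_code {
    SAR_INITIAL,           /* first TLP of a segmented packet (SOP) */
    SAR_INTERMEDIATE,      /* mid-packet TLP */
    SAR_TERMINAL,          /* last TLP, final word fully packed (EOP) */
    SAR_TERMINAL_WITH_EPD  /* last TLP with end-pad bytes (EOP) */
};

typedef struct {
    enum sar_code sar;  /* SOP/EOP position within the packet */
    uint8_t ppd_bytes;  /* optional pre-pad before the payload */
    uint8_t epd_bytes;  /* optional end pad after the payload */
} pi2_info_t;
```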
PI-2 encapsulation can simply be used to delineate packets and to map data flows to contexts if the SPI-4.2 burst size and the ASI TLP payload size are equal (Fig. 3). In this case, the received SPI-4.2 bursts are already segmented into a payload size that's supported on the ASI interface. Therefore, from a packet-delineation point of view, PI-2 only needs to indicate the SOP and EOP.
Fig. 3: Shown is an example of PI-2 encapsulation. The initial SPI-4.2 burst is transformed into an ASI TLP by removing the SPI-4.2 protocol control word (PCW) and adding the ASI header, along with the optional PI-0 and PI-1 headers, and the PI-2 header.
For the initial TLP, the PPD field isn't used if the data is packed in the ASI payload starting with the most significant byte of the first word. The PI-2 SAR code is set to indicate "initial."
For the intermediate burst, the PI-2 SAR code is "intermediate." Note that because a non-EOP SPI-4.2 burst must be a multiple of 16 bytes, mid-packet SPI-4.2 payloads will always be 32-bit aligned, matching the ASI payload.
For the terminal burst, the PI-2 SAR code is "terminal" if all bytes are valid in the last TLP word, or "terminal with end pad," which indicates the number of valid bytes in the last word.
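Continuing the pi2_info_t sketch above, a helper that picks the SAR code from a burst's position in the packet might look like the following; it assumes multi-TLP packets and the 32-bit ASI payload word.

```c
/* Continues the sar_code enum from the sketch above. last_word_bytes
 * counts valid bytes in the final 32-bit TLP word (4 = fully packed).
 * Packets that fit in a single TLP are not covered here. */
static enum sar_code pick_sar_code(int is_first, int is_last,
                                   unsigned last_word_bytes)
{
    if (is_first)
        return SAR_INITIAL;       /* PPD unused: data packed from the MSB */
    if (!is_last)
        return SAR_INTERMEDIATE;  /* non-EOP bursts are 16-byte multiples */
    return (last_word_bytes == 4) ? SAR_TERMINAL
                                  : SAR_TERMINAL_WITH_EPD;
}
```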
PI-2 SARing is used to segment and reassemble SPI-4.2 packets if the SPI-4.2 burst size is larger than the ASI TLP payload size. The received SPI-4.2 bursts are segmented within the bridge into a payload size that's supported on the ASI interface (Fig. 4).
Fig. 4: In an example of PI-2 segmentation, the SPI-4.2 packet is segmented into three ASI TLPs. The SPI-4.2 protocol control word is removed and an ASI header is added with the optional PI-0 and PI-1 headers, and the PI-2 header for each TLP.
As with encapsulation, the PI-2 SAR codes for the three TLPs are set to indicate "initial," "intermediate," and "terminal" or "terminal with end pad," respectively. For reassembly, ASI fragments from each context are reassembled into complete packets. Once a complete packet is available, it's mapped to a SPI-4.2 channel and output in bursts. Bursts from different SPI-4.2 channels can be interleaved.
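Putting the rules together, here is a sketch of the segmentation path, reusing the pi2_info_t type and pick_sar_code helper from the sketches above; max_payload and emit_tlp are hypothetical names.

```c
/* Segment one SPI-4.2 packet of pkt_len bytes into TLP payloads of at
 * most max_payload bytes, tagging each with its PI-2 SAR code. Uses
 * pi2_info_t and pick_sar_code from the sketches above. */
#include <stddef.h>
#include <stdint.h>

static void emit_tlp(const uint8_t *data, size_t len, pi2_info_t info)
{
    (void)data; (void)len; (void)info; /* stub: build and send the TLP */
}

static void segment_packet(const uint8_t *pkt, size_t pkt_len,
                           size_t max_payload)
{
    size_t off = 0;
    while (off < pkt_len) {
        size_t chunk = pkt_len - off;
        if (chunk > max_payload)
            chunk = max_payload;

        int is_first = (off == 0);
        int is_last  = (off + chunk == pkt_len);
        unsigned tail = (unsigned)(chunk % 4); /* bytes in last 32-bit word */

        pi2_info_t info = { 0, 0, 0 };
        info.sar = pick_sar_code(is_first, is_last, tail ? tail : 4);
        if (info.sar == SAR_TERMINAL_WITH_EPD)
            info.epd_bytes = (uint8_t)(4 - tail);

        emit_tlp(pkt + off, chunk, info);
        off += chunk;
    }
}
```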
Mapping traffic type, class and destination
A switch interface must transfer a number of essential attributes along with the data. These attributes include traffic type (unicast/multicast) and class, destination port, and congestion-management information. These parameters are supported natively in ASI. In SPI-4.2, however, this information must be mapped into the SPI-4.2 channel number or carried in a proprietary header within the SPI-4.2 payload.
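As a purely hypothetical illustration of the channel-number approach, a bridge could pack the traffic type and a 3-bit class into the 4-bit channel number of a 16-channel SPI-4.2 interface.

```c
/* Hypothetical encoding: 1 bit of traffic type plus 3 bits of class
 * packed into a 4-bit SPI-4.2 channel number. Real mappings are
 * design-specific. */
#include <stdint.h>

static uint8_t encode_channel(int is_multicast, uint8_t traffic_class)
{
    return (uint8_t)(((is_multicast & 0x1) << 3) | (traffic_class & 0x7));
}
```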
SPI-4.2 uses credit-based flow control with a three-level congestion indication (starving, hungry, and satisfied). Credits are replenished at the transmitter by preset amounts (MaxBurst1 and MaxBurst2), which correspond to the starving and hungry indications, respectively.
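A simplified sketch of the transmitter-side credit handling, reusing the fifo_status enum from the earlier status sketch; the treatment of "satisfied" is simplified here.

```c
/* On each status update for a channel, the transmitter resets its
 * credit to the preset amount. Credits are counted in 16-byte blocks.
 * Reuses the fifo_status enum from the earlier sketch. */
#include <stdint.h>

typedef struct {
    uint32_t credits;    /* 16-byte blocks the channel may still send */
    uint32_t maxburst1;  /* credit granted on "starving" */
    uint32_t maxburst2;  /* credit granted on "hungry" */
} tx_channel_t;

static void on_status_update(tx_channel_t *ch, enum fifo_status s)
{
    switch (s) {
    case STARVING:  ch->credits = ch->maxburst1; break;
    case HUNGRY:    ch->credits = ch->maxburst2; break;
    case SATISFIED: ch->credits = 0; break; /* simplification: in SPI-4.2
                       the transmitter may still finish the current burst */
    }
}
```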
ASI offers multiple flow-control options: VCs, which use credit-based flow control; token buckets, for source rate control; and per-class or per-flow queues, for status-based flow control.
Congestion management in a bridge is an integral part of the bridge's architecture and buffering scheme. Bridges can use two basic architectures: flow-through, with little or no buffering, or buffered, with single- or two-stage (per-interface) buffering.
In a flow-through architecture, flow-control information is generated and acted upon externally to the bridge. This approach simplifies the bridge but increases the latency between the source and the destination of the flow control, possibly requiring additional buffering resources.
In a buffered architecture, flow control information is acted upon by the bridge itself, hence the need for internal buffering. The internal bridge buffering can be shared by both interfaces (single-stage) or each interface can have its own associated buffer, known as two-stage buffering.
The ingress network processor's receive port is configured as SPI-4 for the physical-device interface, and the transmit port is configured as SPI-4.2 for the switch-fabric interface to the proprietary FIC (Fig. 5). The FIC supports a full-duplex SPI-4.2 interface and up to 24 full-duplex 2.5-Gbit/s PCI Express SERDES links. For a 10-Gbit/s full-duplex link port, four SERDES links are needed. The SERDES of any unused links can be powered down through device configuration register settings. In this 10-Gbit/s example, the NPU configures the EP1SGX40's internal configuration-and-status registers through the PCI local bus interface.
Fig. 5: Two network processors in a typical single-port 10-Gbit/s, full-duplex line card with a proprietary FIC.
Proprietary FIC reference design
A proprietary FIC reference design has been developed and verified using Intel's IXDP2401 Advanced Development Platform. The AdvancedTCA chassis interconnects two IXMB2401 network-processor carrier cards across the AdvancedTCA high-speed fabric interface. The carrier card is a PICMG3.x-compliant board designed with one IXP2400 processor. The card has a modular architecture and includes four mezzanine sites and an optional fabric-interface mezzanine site to connect to the zone-2 fabric-interface pins on the AdvancedTCA backplane. A proprietary FPGA-based fabric interface mezzanine card has been designed to fit onto the carrier card and to provide a reconfigurable FIC and optional traffic-management development board. The FIC interfaces the processor to the AdvancedTCA switch fabric. Using reprogrammable devices with multiple channels of PCI Express- and XAUI-compatible transceivers provides a scalable development platform to rapidly design and verify 2.5- to 10-Gbit/s AdvancedTCA FIC designs (Fig. 6).
Fig. 6: A functional block diagram of the FPGA-based FIC reference design.
Operational modes
The reference design's primary operational mode accepts 32-bit SPI-3 or 16-bit SPI-4.2 data from the processor's ingress port, transfers the traffic through the FPGA integrated transceiver to the AdvancedTCA backplane, and returns the backplane traffic through the 32-bit SPI-3 or 16-bit SPI-4.2 interface to the processor's egress port.
The integrated transceiver is configured through the processor's SlowPort egress interface. The reference design supports several other modes of operation, including SPI-4.2 interface loopback, ASI interface loopback, traffic management, and switch fabric packet generation and monitoring.
FPGA and structured-ASIC FICs
Proprietary multiple-FPGA and structured-ASIC technologies are available for developing scalable PCI Express and ASI bridges and endpoints. High-density, high-performance FPGAs with embedded PCI Express-compliant transceivers offer integrated solutions with scalable 2.5-Gbit/s links, dynamic phase alignment (DPA) for running interfaces at up to 1 Gbit/s per channel, and multiple package and density options with up to 40,000 logic elements.
Alternative FPGAs, combined with a discrete PCI Express-compliant SERDES such as the PMC-Sierra PM8358 QuadPHY 10GX device, provide low-cost, flexible solutions for 1X, 2X, and 4X applications where cost concerns outweigh the need for performance and extensive features. High-density, high-performance FPGAs combined with a discrete PCI Express-compliant SERDES device can be migrated to proprietary structured-ASIC technology for applications requiring the largest density, fastest performance, and highest volumes.
About the authors
David Ridgeway is the senior strategic marketing manager for Altera's Wireline Communications Group. He holds a BS-EET degree from Cogswell Polytechnic College. Altera can be reached at www.altera.com.
Anthony Dalleggio is a co-founder of Modelware. He holds an MSEE from Polytechnic University. Modelware can be reached at www.modelware.com.
Copyright © 2003 CMP Media, LLC