Digging into Advanced Switching Spec
By Kiran Puranik, Xilinx, CommsDesign.com
April 10, 2003 (9:40 a.m. EST)
URL: http://www.eetimes.com/story/OEG20030410S0020
Momentum around the PCI Express specification is building in the communication sector. From broadband to wireless to networking applications, designers across the board are looking to implement these interconnects in box designs. The advanced switching (AS) specification is one of the big reasons that PCI Express is garnering so much attention in the space. By supporting host/slave configurations as well as peer-to-peer connectivity, the advanced switching enhancement has made PCI Express a more viable option for the communication sector.
But before designers can implement AS in a system design, they have to understand the features and benefits this technology provides. Below you'll find a guide that will help.
Spec Overview
AS components are broadly classified into two categories: switches and end systems. Unlike legacy PCI platforms, which mandate a strong parent-child relationship between connected components, the AS architecture allows flexible interconnect topologies in which switches and end systems work together freely.
An AS platform can be thought of as a graph of connected switches and end systems. Switches constitute the internal nodes, providing interconnect with other switches and end systems. End systems, on the other hand, are the graph's edge nodes and represent data ingress and egress points. This ability to support varied topologies gives platform architects enormous flexibility in placing critical resources relative to each other.
The Switching Architecture
The specification does not mandate a particular switch interconnect topology, nor does it assume a particular switch implementation. The architecture emphasizes component interoperability in order to enable a multi-vendor ecosystem. Figure 1 shows the AS protocol stack at an end system.
As this figure shows, the AS transaction layer is the interface between upper-layer protocols (ULPs) and the data link layer, and serves as a tunnel for ULP encapsulation and extraction at end systems.
AS leverages the PCI Express base, physical, and data link layers. Communication and embedded computing enhancements are added at the transaction layer to address chip-to-chip, mezzanine, backplane, and inter-chassis data communication requirements.
An AS fabric is a non-blocking, multi-stage, lossless switching architecture capable of handling traffic loads found in a wide variety of applications. Platform sizes can range from one to several thousand switches. The maximum number of switch hops on a path between two communicating end systems can range from two to 15.
Protocol interfaces are carriers of encapsulated protocols that interact directly with ULPs. Fabric interfaces handle system discovery, configuration, event signaling, congestion management, and segmentation/reassembly.
The AS spec defines a standard set of protocol interfaces (PIs) [Table 1]. PIs 0 to 7 are reserved for various fabric management tasks, while PIs 8 to 254 are used to tunnel applications.
Table 1: AS Protocol Encapsulation Interfaces
Switches are required to route all application PIs (PIs 8 to 254) without any change. Switches must, however, support reception and generation of fabric management PIs (PIs less than 8).
An AS transaction layer packet (TLP) is shown in Figure 3. This packet contains a route header, followed by a payload section. The route header's PI field, together with the turn pool and turn pointer fields, determines the packet processing flow within an AS component. Fabric ingress end systems provide the route header and payload data. Switches query the PI, turn pool, and turn pointer fields to determine the egress port of the switch. Switches may be the destination of certain fabric management PIs. Egress end systems read the PI and turn pointer values to filter misdirected packets.
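The split between fabric management and application PIs can be expressed as a small classifier. This is a minimal sketch that uses only the ranges stated with Table 1 (PIs 0 to 7 for fabric management, PIs 8 to 254 for tunneled applications). The function names and the returned strings are hypothetical and are not taken from the specification.

```python
FABRIC_PIS = range(0, 8)          # PIs 0 to 7: fabric management
APPLICATION_PIS = range(8, 255)   # PIs 8 to 254: tunneled applications


def classify_pi(pi: int) -> str:
    """Return the category of a protocol interface number."""
    if pi in FABRIC_PIS:
        return "fabric-management"
    if pi in APPLICATION_PIS:
        return "application"
    raise ValueError(f"PI {pi} is outside the ranges described in the text")


def switch_handling(pi: int) -> str:
    """Describe how a switch treats a packet carrying this PI (illustrative)."""
    if classify_pi(pi) == "application":
        return "route unchanged"
    # Switches may be the destination of certain fabric management PIs.
    return "route, or terminate locally if addressed to this switch"
```

For instance, `switch_handling(4)` reports that a configuration access may terminate at the switch, while `switch_handling(42)` reports that the packet is routed unchanged.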
Figure 4 shows an entire AS transaction layer frame. The physical and link layer portions of the frame are added and removed at each hop between fabric components.
Routing Approaches
The AS specification calls out three approaches for routing packets in a network. These include unicast, multicast, and broadcast. Let's look at all three in more detail.
1. Unicast Routing
Unicasting is the most commonly used routing technique in AS fabrics. A unicast packet has a single destination, and the packet is routed based on the turn pool and turn pointer fields in the route header.
The 32-bit turn pool field represents the series of turns the packet is required to take at switches as it traverses the fabric. The most significant bit of the turn pool is the reverse bit. For all forward-routed packets, this bit must be 0.
The turn pointer field is a 5-bit value that always designates the current turn value in the turn pool. The initial turn pointer points to the most significant valid turn pool bit. As a packet enters a switch ingress port, the turn value slice is extracted from the turn pool. The turn pointer value is then updated in the packet, readying it for the next switch hop. The number of logical ports on the switch determines the size of this slice, and is governed by the following relationship:
Number of Switch Ports = 2^(Turn Value Width) + 1
For example, the turn value width is 3 bits for a switch with 9 logical ports (2^3 + 1 = 9). For a forward routed packet, on the ingress switch port:
Turn Value = Turn Pool [(Turn Pointer):(Turn Pointer - Turn Value Width)]
Turn Pointer = Turn Pointer - Turn Value Width
The egress port for the packet is:
Egress Port Number = (Ingress Port Number + (Turn Value + 1)) modulo (Number of Switch Ports)
The switch now forwards the packet to this egress port. Note: the turn pointer must be 0 as the packet exits the last switch on its path.
A packet's forward route turn pool also serves as the backward route to the packet's source end system from any point on the forward path. This is accomplished by using the turn pool from the forward routed packet and setting the reverse bit. For example, a completion packet generated in response to a read request is backward routed to the request's source end system, as are event notification messages regarding packets in transit, from any point on the packet's forward route.
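The forward routing step at one switch can be sketched in a few lines. This is an illustration under stated assumptions, not the specification's bit-exact procedure: the turn pointer is treated here as the count of turn pool bits not yet consumed, so that it reaches 0 after the last switch as the note above requires, and for port counts that are not exactly 2^w + 1 the smallest sufficient width is assumed. The function names are hypothetical.

```python
def turn_value_width(num_ports: int) -> int:
    """Smallest width w with 2**w + 1 >= num_ports (exact when ports = 2**w + 1)."""
    if num_ports < 2:
        raise ValueError("a switch needs at least two ports")
    return (num_ports - 2).bit_length()


def forward_hop(turn_pool: int, turn_pointer: int,
                ingress_port: int, num_ports: int) -> tuple[int, int]:
    """Return (egress_port, updated_turn_pointer) for a forward routed packet."""
    width = turn_value_width(num_ports)
    new_pointer = turn_pointer - width
    if new_pointer < 0:
        raise ValueError("turn pool exhausted before this switch")
    # Extract the slice of `width` bits that sits just below the old pointer.
    turn_value = (turn_pool >> new_pointer) & ((1 << width) - 1)
    egress_port = (ingress_port + turn_value + 1) % num_ports
    return egress_port, new_pointer


# Two hops through 9-port switches: turn values 5 (0b101) and 2 (0b010).
pool = 0b101_010
first = forward_hop(pool, 6, ingress_port=2, num_ports=9)   # (8, 3)
second = forward_hop(pool, 3, ingress_port=0, num_ports=9)  # (3, 0)
```

Because the offset `turn_value + 1` runs from 1 to 2^w, a switch with 2^w + 1 ports can reach every port except the ingress port, which is why the port count relationship has the extra "+ 1".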
For a backward routed packet, on the ingress switch port:
Turn Value = Turn Pool [(Turn Pointer + Turn Value Width):(Turn Pointer)]
Turn Pointer = Turn Pointer + Turn Value Width
The egress port for the packet is:
Egress Port Number = (Ingress Port Number - (Turn Value + 1)) modulo (Number of Switch Ports)
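A backward hop can be sketched as the mirror image of the forward hop. The sketch below assumes that the backward step subtracts (Turn Value + 1), the exact inverse of the forward egress formula, since that is what lets a reply retrace the forward path port for port. It also assumes the turn pointer counts consumed bits from the least significant end. All names are hypothetical.

```python
REVERSE_BIT = 1 << 31  # most significant bit of the 32-bit turn pool


def turn_value_width(num_ports: int) -> int:
    """Smallest width w with 2**w + 1 >= num_ports."""
    return (num_ports - 2).bit_length()


def make_backward(turn_pool: int) -> int:
    """Reuse a forward turn pool for the return trip by setting the reverse bit."""
    return turn_pool | REVERSE_BIT


def backward_hop(turn_pool: int, turn_pointer: int,
                 ingress_port: int, num_ports: int) -> tuple[int, int]:
    """Return (egress_port, updated_turn_pointer) for a backward routed packet."""
    width = turn_value_width(num_ports)
    if turn_pointer + width > 31:
        raise ValueError("slice would overlap the reverse bit")
    # Read the slice that starts at the pointer, then move the pointer up.
    turn_value = (turn_pool >> turn_pointer) & ((1 << width) - 1)
    egress_port = (ingress_port - (turn_value + 1)) % num_ports
    return egress_port, turn_pointer + width


# Retrace a forward path that went port 2 -> 8 (turn 5), then port 0 -> 3 (turn 2).
pool = make_backward(0b101_010)
last_switch = backward_hop(pool, 0, ingress_port=3, num_ports=9)   # (0, 3)
first_switch = backward_hop(pool, 3, ingress_port=8, num_ports=9)  # (2, 6)
```

The reply enters each switch on the port the request left from and leaves on the port the request entered, and the turn pointer climbs back to its initial value.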
2. Multicast Routing
The multicasting feature allows an end system to target a packet to multiple end systems. A multicast group index is carried in each multicast packet's route header. A multicast group uniquely identifies a set of switch egress ports for each switch hop on a multicast packet's path. A multicast group table in a switch is looked up using the packet's multicast group index. The packet is then replicated on each port contained in the multicast group. As it traverses the fabric, each multicast packet constructs a turn pool from the source by recording the turns taken within each switch. This provides a backward route to the multicast source end system for event notifications regarding this multicast packet.
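The table lookup and turn recording described above might look like the following. This is a sketch under several assumptions that the text does not confirm: the group table is modeled as a dictionary from group index to a set of ports, each recorded turn is shifted in at the least significant end of the turn pool, the recorded turn is the value that a backward hop would need to return to the ingress port, and a group member equal to the ingress port is skipped. All names are hypothetical.

```python
def turn_value_width(num_ports: int) -> int:
    """Smallest width w with 2**w + 1 >= num_ports."""
    return (num_ports - 2).bit_length()


def replicate_multicast(group_table: dict[int, set[int]], group_index: int,
                        ingress_port: int, num_ports: int,
                        turn_pool: int = 0) -> list[tuple[int, int]]:
    """Return one (egress_port, updated_turn_pool) pair per replicated copy."""
    width = turn_value_width(num_ports)
    copies = []
    for egress_port in sorted(group_table.get(group_index, ())):
        if egress_port == ingress_port:
            continue  # assumption: never send a copy back out the ingress port
        # Turn that leads from ingress to egress, recorded for the return path.
        turn_value = (egress_port - ingress_port - 1) % num_ports
        copies.append((egress_port, (turn_pool << width) | turn_value))
    return copies


# Group 5 on a 9-port switch fans out to ports 1, 4 and 7.
table = {5: {1, 4, 7}}
copies = replicate_multicast(table, 5, ingress_port=2, num_ports=9)
```

Each copy carries its own turn pool, so an event notification raised downstream of port 4 can find its way back independently of the copies sent on ports 1 and 7.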
3. Broadcast Routing
Switches respond to a broadcast route header on a packet by replicating the packet on all ports except its ingress port. Broadcast packets are generated during system discovery.
Multi-Protocol Support
As Figure 5 points out, AS supports tunneling of virtually any protocol. Protocol PI implementations mandate efficient encapsulation with no loss of the tunneled protocol's semantics, enabling effective extraction at the fabric's egress. A PI may be custom tailored for proprietary protocols or can tunnel standard protocols, supporting a broader range of applications.
PIs for industry standard protocols like PCI, PCI Express Base, SPI, ATM, TDM, Ethernet, and more will be defined as companion specifications, enabling interoperable solutions. This makes AS platforms modular, cost effective, and easy to deploy and support.
Platform Features Supported
The AS spec introduces several features useful for the communications space. Here are seven of the more important ones for comms designers.
1. End-to-End Data Integrity Support
The AS TLP offers end-to-end data integrity features that augment the data protection (CRC-32) provided by the AS data link layer. This detects any packet corruption within fabric components.
The route header (8 bytes) in the AS TLP is protected by a 7-bit CRC, as shown in Figure 3. The CRC is computed by the packet's source end system over only the invariant portions of the route header. AS components must check the CRC and, on a failure, discard the TLP and report the error.
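The idea of a header CRC that ignores fields rewritten in flight can be illustrated as follows. This is a sketch only: the text does not give the AS polynomial or the header bit layout, so the code borrows the well-known CRC-7 polynomial x^7 + x^3 + 1 (0x09, used by SD/MMC cards) as a stand-in, and the mask that marks the last header byte as variant is purely hypothetical.

```python
def crc7(data: bytes, poly: int = 0x09) -> int:
    """Bitwise CRC-7, most significant bit first, initial value 0."""
    crc = 0
    for byte in data:
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            top = (crc >> 6) & 1
            crc = (crc << 1) & 0x7F
            if top ^ bit:
                crc ^= poly
    return crc


def header_crc(header: bytes, invariant_mask: bytes) -> int:
    """CRC over the header with every variant bit forced to zero."""
    if len(header) != 8 or len(invariant_mask) != 8:
        raise ValueError("route header and mask must both be 8 bytes")
    masked = bytes(h & m for h, m in zip(header, invariant_mask))
    return crc7(masked)


# Hypothetical layout: the last byte holds fields that switches rewrite.
MASK = bytes([0xFF] * 7 + [0x00])
at_source = bytes([0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0x1F])
after_hop = at_source[:7] + bytes([0x0C])              # variant byte changed
corrupted = bytes([0x13]) + at_source[1:]              # invariant bit flipped
```

With this mask, `header_crc(after_hop, MASK)` equals `header_crc(at_source, MASK)`, so a switch can update the variant byte without recomputing the CRC, while `header_crc(corrupted, MASK)` differs and the TLP would be discarded.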
AS protocol PIs may include end-to-end CRC, covering the tunneled protocol data. Such PIs may check this CRC, and ask for retransmission of TLPs with bad CRC. CRC polynomials used in these PIs can be tuned to expected payload sizes and level of data integrity desired by the target applications.
2. Distributed Computing Support
AS enables distributed processing systems, resulting in multiple memory address domains within a platform. This is in sharp contrast to the single flat address domain found in legacy PCI platforms. AS platforms allow load-store and messaging protocol interaction between concurrent hardware and software processes on end systems. This facilitates the peer-to-peer applications commonly found in communications designs. Distributed processing offers better scalability and ultimately lower cost of operation.
3. Congestion and QoS
Bandwidth provisioning is an important aspect of AS fabric management, needed to guarantee quality-of-service (QoS) levels to applications. End systems must operate within prescribed bandwidth limits by metering the rate of data transfer. Congestion in an AS fabric can be caused by unexpected transient events, like component failure, accidental removal, or errant end system behavior. Congestion causes packets to suffer excessive latencies, resulting in loss of expected levels of service.
AS congestion management mitigates fabric congestion and maintains QoS levels. Congestion is detected at switches, and backward explicit congestion notification (BECN) messages are then sent upstream to the end systems contributing to the congestion. End systems respond to BECN messages by reducing the rate of data injection by a specified amount.
Congested packets may also be marked with a forward explicit congestion notification (FECN) bit. FECN notifies the downstream end system of congestion on a certain path.
The packet's source end system may set the discard bit in the AS route header. This enables downstream switches to alleviate local congestion by discarding these packets. In the absence of BECN messages, end systems restore normal traffic flow in specified additive increments. Fabric management may use congestion notifications to initiate corrective action and activate built-in fail-over (use of alternate paths).
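The end system's reaction to BECN messages can be modeled as a small rate controller. This is a minimal sketch assuming a subtractive decrease of a fixed amount per BECN and a fixed additive recovery step per interval without BECNs. The text says only "a specified amount" and "specified additive increments", so the class, its parameters, and the units are hypothetical.

```python
class InjectionRateController:
    """Tracks the rate at which an end system may inject data into the fabric."""

    def __init__(self, provisioned_rate: float, becn_decrease: float,
                 recovery_increment: float, minimum_rate: float = 0.0):
        self.provisioned_rate = provisioned_rate      # prescribed bandwidth limit
        self.becn_decrease = becn_decrease            # assumed fixed step down
        self.recovery_increment = recovery_increment  # additive step up
        self.minimum_rate = minimum_rate
        self.rate = provisioned_rate

    def on_becn(self) -> float:
        """A switch reported congestion on a path this end system feeds."""
        self.rate = max(self.minimum_rate, self.rate - self.becn_decrease)
        return self.rate

    def on_quiet_interval(self) -> float:
        """No BECN arrived during the last interval: recover additively."""
        self.rate = min(self.provisioned_rate,
                        self.rate + self.recovery_increment)
        return self.rate


controller = InjectionRateController(100.0, becn_decrease=30.0,
                                     recovery_increment=10.0)
after_two_becns = [controller.on_becn(), controller.on_becn()]  # [70.0, 40.0]
after_recovery = controller.on_quiet_interval()                 # 50.0
```

The rate never climbs above the provisioned limit, which reflects the requirement that end systems stay within their prescribed bandwidth even when the fabric is idle.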
4. Differentiated CoS
Class of Service (CoS) mechanisms reduce the complexity of QoS by mapping multiple traffic flows into a few service levels. AS fabric resources are allocated based on up to eight service levels, called traffic classes (TCs), and traffic flows are aggregated and forwarded by fabric components based on the TC of these packets.
Within a TC, the AS fabric preserves ordering of packets, end-to-end, with the exception of those marked as bypassable at the source. No ordering relationship is required to be maintained between TCs.
AS components map TCs to virtual channels (VCs) that correspond to hardware channels within the components. AS components must implement at least two VCs. All AS link partners must share the same TC-to-VC mapping, downshifting, if required, to the smallest common number of VCs supported between the two link partners.
Each AS VC contains two independent queues: a main queue and a bypass queue. AS link flow control manages flow credits for each queue independently. A packet marked bypassable must enter the bypass queue if it can cause VC head-of-line (HOL) blocking. The bypass queue must be serviced as soon as bypass queue credit becomes available. This capability is compatible with legacy bus protocols that require write transactions to pass HOL-blocked read transactions to avoid possible deadlocks.
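Downshifting to a common VC count can be sketched as follows. The text does not give the actual TC-to-VC mapping table, so the contiguous grouping used here is an assumption chosen only to show that both link partners derive the same mapping from the same negotiated VC count. The function names are hypothetical.

```python
NUM_TRAFFIC_CLASSES = 8  # the text allows up to eight traffic classes


def common_vc_count(local_vcs: int, partner_vcs: int) -> int:
    """Both partners fall back to the smaller of the two supported VC counts."""
    return min(local_vcs, partner_vcs)


def tc_to_vc(traffic_class: int, num_vcs: int) -> int:
    """Map a TC onto a VC by contiguous grouping (illustrative mapping only)."""
    if not 0 <= traffic_class < NUM_TRAFFIC_CLASSES:
        raise ValueError("traffic class must be in 0..7")
    if num_vcs < 1:
        raise ValueError("at least one VC is required")
    return traffic_class * num_vcs // NUM_TRAFFIC_CLASSES


# An 8-VC component linked to a 2-VC component downshifts to 2 VCs.
vcs = common_vc_count(8, 2)
mapping = [tc_to_vc(tc, vcs) for tc in range(NUM_TRAFFIC_CLASSES)]
```

With two VCs, this grouping places TCs 0 to 3 on VC 0 and TCs 4 to 7 on VC 1. Ordering within each TC is unaffected because all packets of a TC still share one VC.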
The AS VC arbitration mechanism allows a component to provide CoS by controlling scheduling priority between VCs. Scheduling mechanisms defined include strict priority, round robin, or weighted round robin. VC arbitration characteristics of a component port can be programmed remotely via the port configuration space.
The switch component ports also implement a port arbitration mechanism, which governs scheduling priority between packets of the same TC, coming into the switch from different ingress ports.
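Two of the VC scheduling policies named above, strict priority and weighted round robin, can be sketched with ordinary queues. This is an illustration only: it assumes that a higher VC number means higher priority and that a weight is the number of packets a VC may send per round, neither of which is stated in the text. The function names are hypothetical.

```python
from collections import deque


def strict_priority_pick(queues: dict[int, deque]):
    """Take the next packet from the highest-numbered non-empty VC, or None."""
    for vc in sorted(queues, reverse=True):
        if queues[vc]:
            return queues[vc].popleft()
    return None


def weighted_round_robin(queues: dict[int, deque],
                         weights: dict[int, int]) -> list:
    """Drain all queues, letting each VC send up to its weight per round."""
    if any(weights[vc] < 1 for vc in queues):
        raise ValueError("every VC needs a weight of at least 1")
    order = []
    while any(queues.values()):
        for vc in sorted(queues):
            for _ in range(weights[vc]):
                if not queues[vc]:
                    break
                order.append(queues[vc].popleft())
    return order


queues = {0: deque(["a1", "a2", "a3"]), 1: deque(["b1", "b2", "b3"])}
schedule = weighted_round_robin(queues, weights={0: 2, 1: 1})
# schedule == ["a1", "a2", "b1", "a3", "b2", "b3"]
```

Strict priority can starve low-priority VCs under sustained load, whereas weighted round robin guarantees every VC a share proportional to its weight.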
5. SAR Support
The maximum payload size (MPS) of an AS platform is the smallest MPS supported among all components in the platform. All PIs must restrict the AS TLP payload size to the platform MPS. End systems that need to encapsulate protocol packets larger than the MPS must split them into segments no larger than the MPS. This also means keeping track of the multiple segments of the original packet and reassembling them at the fabric egress. AS supports a standard mechanism to perform segmentation and reassembly (SAR) using fabric PI 3. Individual protocol PIs may choose other SAR implementations.
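Segmentation and reassembly can be sketched generically. This is not the PI 3 format, which the text does not describe. Each segment here is a hypothetical tuple of (sequence number, last-segment flag, payload bytes), used only to show the bookkeeping involved.

```python
def platform_mps(component_mps: list[int]) -> int:
    """The platform MPS is the smallest MPS among all components."""
    return min(component_mps)


def segment(packet: bytes, mps: int) -> list[tuple[int, bool, bytes]]:
    """Split a packet into (sequence, is_last, chunk) segments of at most mps bytes."""
    if mps <= 0:
        raise ValueError("MPS must be positive")
    chunks = [packet[i:i + mps] for i in range(0, len(packet), mps)] or [b""]
    last = len(chunks) - 1
    return [(index, index == last, chunk) for index, chunk in enumerate(chunks)]


def reassemble(segments: list[tuple[int, bool, bytes]]) -> bytes:
    """Rebuild the original packet, rejecting missing or unterminated sequences."""
    ordered = sorted(segments, key=lambda seg: seg[0])
    if not ordered or not ordered[-1][1]:
        raise ValueError("final segment missing")
    if [seg[0] for seg in ordered] != list(range(len(ordered))):
        raise ValueError("gap in segment sequence")
    return b"".join(seg[2] for seg in ordered)


mps = platform_mps([256, 128, 512])      # 128
original = bytes(range(256)) + b"tail"   # 260 bytes
pieces = segment(original, mps)          # 128 + 128 + 4 bytes
restored = reassemble(pieces)
```

The last-segment flag lets the egress end system know when reassembly is complete without having to learn the original packet length in advance.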
6. Fabric Discovery, Management
AS fabric discovery is the process of scanning the interconnected components to determine topology and component characteristics. All AS fabrics require at least one fabric management-capable end system.
AS fabric manager (FM) end systems initiate and control the fabric discovery process. FMs start the discovery process by injecting a blind broadcast packet (PI 0) into the fabric. The blind broadcast packet records the forward turn pool from the FM to all components it detects in the fabric. When a blind broadcast packet is received at an AS component, a unique identifier and turn pool corresponding to the source FM are stored in the component's configuration space.
All components must be capable of supporting a primary and a secondary FM. End systems use the saved turn pool to send operational event notifications (PI 5) to the FM as backward routed unicast packets.
Blind broadcast results in the formation of fabric spanning trees with FMs at their roots and subordinate end systems at the leaf level. These spanning trees can be used to generate a connectivity graph of the fabric, which can yield turn pools for paths between various end systems.
FMs usually run fabric management software. The FM controls fabric component behavior by accessing the component configuration space using PI 4 read and write packets. The FM sets up connections between communicating end systems and reacts to event notifications.
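The way a blind broadcast yields a spanning tree and per-component routes can be simulated on a small graph. This is a sketch under stated assumptions: the topology dictionary, the node names, and the breadth-first delivery order are hypothetical, a component keeps only the first copy that reaches it, and each recorded turn is derived from the forward egress formula given earlier.

```python
from collections import deque


def discover(topology: dict, num_ports: dict, fm: str) -> dict[str, list[int]]:
    """Flood from the fabric manager and return {component: turns from the FM}.

    topology maps node -> {local_port: (neighbor, neighbor_port)}.
    """
    routes = {fm: []}
    first_neighbor, first_port = topology[fm][0]
    # Each entry: (component reached, its ingress port, turns taken so far).
    pending = deque([(first_neighbor, first_port, [])])
    while pending:
        node, ingress, turns = pending.popleft()
        if node in routes:
            continue  # an earlier copy already reached this component
        routes[node] = turns
        for egress, (neighbor, neighbor_port) in topology[node].items():
            if egress == ingress:
                continue  # broadcast is replicated on all ports but the ingress
            turn = (egress - ingress - 1) % num_ports[node]
            pending.append((neighbor, neighbor_port, turns + [turn]))
    return routes


topology = {
    "FM":  {0: ("SW1", 0)},
    "SW1": {0: ("FM", 0), 1: ("A", 0), 2: ("SW2", 0)},
    "SW2": {0: ("SW1", 2), 1: ("B", 0), 2: ("C", 0)},
    "A":   {0: ("SW1", 1)},
    "B":   {0: ("SW2", 1)},
    "C":   {0: ("SW2", 2)},
}
ports = {"SW1": 3, "SW2": 3}
routes = discover(topology, ports, "FM")
```

In this example `routes["B"]` is `[1, 0]`: turn 1 at SW1 (port 0 to port 2), then turn 0 at SW2 (port 0 to port 1). Packing these turns into a turn pool gives the FM a forward route to B, and gives B a backward route to the FM.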
7. HA Features
AS fabrics support several high availability features. In case of a catastrophic failure in the switch fabric, an event notification (PI 5) is sent to the primary or secondary FM associated with the failed component. The notification pinpoints the problem area to the FM, and failover procedures are initiated to bypass failed paths. Hot attach and detach of components generate event notifications to the FM. The FM can then query newly attached components and configure them. FMs may use heartbeat event notifications to monitor each other.
Wrap Up
The AS spec addresses all the major architectural requirements for next-generation system-fabric solutions, such as low cost, scalability, expandability, modularity, high availability, and peer-to-peer capability with built-in QoS and CoS support. Through these features, developers can design almost any type of communication system, from high-end routers to next-generation storage servers and firewalls, based on AS fabrics.
The AS specification is currently in the final development phase within the Arapahoe Work Group (AWG) and will be released for general distribution by the end of the second quarter of 2003. Once it is available, chip and software vendors will begin rolling out solutions, making AS a viable option sooner rather than later in the comms space.
About the Author
Kiran Puranik is a staff design engineer at Xilinx Inc., San Jose, CA. He has over 12 years of experience in the semiconductor and software industries, and may be reached at kiran.puranik@xilinx.com.