RapidIO fabric is validated at system level

By Ian Dunn and John Moore, EE Times
November 7, 2003 (2:51 p.m. EST)
URL: http://www.eetimes.com/story/OEG20031107S0037

Results from one of the first tests of the RapidIO switch fabric show that in an initial implementation the interconnect has achieved 97 percent of its theoretical performance after accounting for 4 percent protocol overhead. Such results indicate the technology's maturity for use in signal and control-plane applications.

Our test system connected PowerPC microprocessors over a backplane using an 8-bit parallel implementation of the RapidIO architecture. The RapidIO switch fabric is an interconnect linking chips on a circuit board and circuit boards across a backplane in a manner designed for low latency as well as bandwidth, and reliability as well as scalability. RapidIO interfaces are small and efficient enough to have one or more interfaces on digital signal processors, field-programmable gate arrays, network processors and control processors. By having a single fabric that directly connects all of the chips in the system and reaches through the entire system, the RapidIO interconnect simplifies system design while improving system performance.

The RapidIO specification is partitioned into three layers: logical, transport and physical. The physical layer currently defines two physical interfaces, parallel and serial, which are both full-duplex. The serial interface runs at 2.5 or 3.125 Gbits/second, and four links can be grouped to form a 4x link with a data rate of 10 Gbits/s in each direction. The parallel interface is either 8 bits or 16 bits wide and can run at 250 MHz to 1 GHz with data clocked on both edges. This study focuses on the 8-bit parallel interface using memory-mapped I/O.
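As a quick check on the serial figures, the raw rate of a grouped link is simply the per-lane rate multiplied by the lane count. The sketch below is illustrative only: the function name is ours, and it deliberately ignores line-coding overhead, so it gives signalling rate rather than usable payload rate.

```python
def serial_link_gbits(lane_gbits: float, lanes: int) -> float:
    """Raw signalling rate of a grouped serial link, in Gbits/s per direction.

    Illustrative helper (not from the RapidIO specification); it does not
    model line-coding or packet overhead.
    """
    return lane_gbits * lanes


# Four 2.5-Gbit/s lanes grouped into a 4x link.
rate_4x = serial_link_gbits(2.5, 4)
print(f"4x link at 2.5 Gbits/s per lane: {rate_4x} Gbits/s per direction")
```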

The test system is the commercially available ImpactRT 3100 multicomputer, which runs 8-bit parallel RapidIO connections among processors on a board and between boards through a CompactPCI backplane. All RapidIO links in the test apparatus run with a 311-MHz clock, yielding a theoretical peak data rate of 1.2 Gbytes/s full duplex (622 Mbytes/s in each direction).
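The peak figure follows from the link parameters: an 8-bit interface moves one byte per transfer, and clocking data on both edges gives two transfers per clock cycle. A minimal sketch of that arithmetic, with function and variable names of our own choosing:

```python
def parallel_peak_mbytes(clock_mhz: float, width_bits: int,
                         transfers_per_clock: int = 2) -> float:
    """Peak one-way rate of a parallel link in Mbytes/s.

    transfers_per_clock=2 reflects data clocked on both clock edges.
    """
    return clock_mhz * transfers_per_clock * width_bits / 8


per_direction = parallel_peak_mbytes(311, 8)   # one direction of the link
full_duplex = 2 * per_direction                # both directions combined
print(f"per direction: {per_direction:.0f} Mbytes/s")
print(f"full duplex:   {full_duplex:.0f} Mbytes/s")
```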

Each board contains four processor nodes with RapidIO ports. Each processor end point includes a 1-GHz PowerPC 7455 with a 133-MHz front-side bus and double-data-rate SDRAM. A Mercury RapidIO end point ASIC acts as a combination memory controller and network interface to connect the processor and memory to each other and to the RapidIO fabric. The end points on a board are connected to a RapidIO crossbar switch, which also has two RapidIO connections to the backplane through the CompactPCI P4 connector.

The test system can contain as many as 20 such RapidIO boards. Connections between boards are made by switches on an active backplane overlaid with a dual-star topology. Passing data between processors on different boards involves routing data through as many as four switches: one on each board and as many as two in the backplane if the interprocessor communication spans more than five slots.

Mercury gathered RapidIO performance results on the communication between pairs of boards in the test system. Two types of noncoherent remote memory access transactions were used to characterize the performance of the RapidIO interconnect as a function of packet size: NWRITE_R and NWRITE. Both transactions are used by a source RapidIO end point to write to a specified location in a target end point's address space. In contrast to NWRITE, NWRITE_R forces the target to generate a response upon completion of the write transaction in the target's memory.

The unidirectional results asymptotically approach the theoretical performance of 594 Mbytes/s, which is derived from the link transfer capacity of 622 Mbytes/s for 268-byte packets with 12 bytes of header and 256 bytes of data payload. For example, a 64-kbyte NWRITE transaction achieves 577 Mbytes/s, or 97 percent of theoretical performance.
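The theoretical payload rate is the link capacity scaled by the fraction of each packet that carries data. The sketch below reproduces that calculation from the numbers quoted above; the constant and function names are ours.

```python
LINK_PEAK = 622.0      # link transfer capacity per direction, from the text
HEADER_BYTES = 12      # header bytes per packet
PAYLOAD_BYTES = 256    # data payload bytes per packet
MEASURED_64K = 577.0   # reported 64-kbyte NWRITE result, same units as LINK_PEAK


def payload_rate(link_peak: float, payload: int, header: int) -> float:
    """Link rate available to payload once header bytes are subtracted."""
    return link_peak * payload / (payload + header)


theoretical = payload_rate(LINK_PEAK, PAYLOAD_BYTES, HEADER_BYTES)
overhead = HEADER_BYTES / (HEADER_BYTES + PAYLOAD_BYTES)
efficiency = MEASURED_64K / theoretical

print(f"theoretical payload rate: {theoretical:.0f}")
print(f"protocol overhead:        {overhead:.1%}")
print(f"measured / theoretical:   {efficiency:.0%}")
```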

However, the bidirectional rates are not governed by the underlying RapidIO aggregate link capacity of 1,244 Mbytes/s; the memory subsystem and endpoint state machine limit the bidirectional capacity. At 133 MHz, the PowerPC local bus is limited to just over 1 Gbyte/s. The memory subsystem and the RapidIO interface must share this bandwidth, and, as a consequence, doubling the unidirectional rate of 590 Mbytes/s per direction is not possible. The actual rate achieved for NWRITE is close to 900 Mbytes/s and is limited by some PowerPC bus traffic associated with the application programming interface used for control and RapidIO interface management.
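A back-of-the-envelope check shows why the local bus, not the link, is the ceiling. The sketch assumes a 64-bit data bus with one transfer per clock; that width is our assumption for illustration and is not stated above.

```python
BUS_CLOCK_MHZ = 133
BUS_WIDTH_BITS = 64            # assumption: 64-bit data bus, one transfer per clock
UNIDIRECTIONAL_MBYTES = 590.0  # measured one-way rate quoted in the text

bus_peak = BUS_CLOCK_MHZ * BUS_WIDTH_BITS / 8      # Mbytes/s
doubled_demand = 2 * UNIDIRECTIONAL_MBYTES         # Mbytes/s if both directions ran flat out
bus_is_bottleneck = doubled_demand > bus_peak

print(f"local bus peak:       {bus_peak:.0f} Mbytes/s")
print(f"2 x unidirectional:   {doubled_demand:.0f} Mbytes/s")
print(f"bus limits the total: {bus_is_bottleneck}")
```

Under that assumption the bus tops out at 1,064 Mbytes/s, below the 1,180 Mbytes/s that two full-rate streams would require, before any control traffic is counted.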

The bidirectional NWRITE_R results achieve 735 Mbytes/s. The difference between the two bidirectional results can be attributed to a limitation in the input state machine whereby responses and inbound writes both pass through the same input queue. The overhead associated with servicing the responses limits the inbound capacity for data writes and accounts for the 165-Mbyte/s difference.

An update to the state machine is planned to address this limitation, but the more general limitation of the processor local bus can be addressed only with higher-capacity interfaces or by embedding the RapidIO endpoint into the processor. Limitations in the internal architecture of the device may nonetheless harm performance.

The throughput results can also be used to determine key latency parameters of a RapidIO interface, such as the overhead cost associated with launching unidirectional NWRITE transactions. If throughput is calculated as transfer size divided by latency, and latency is modeled as fixed overhead plus an incremental rate multiplied by the transfer size, then a least-squares fit of the data reveals an overhead cost of 1.8 microseconds per transaction and a link speed of 590 Mbytes/s for the data payload.
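The fitting procedure can be sketched as an ordinary linear regression of latency against transfer size: the intercept is the fixed overhead and the reciprocal of the slope is the payload rate. The example below is a minimal illustration, not the authors' analysis code. The data points are synthetic, generated from the model itself using the parameters quoted above, because the measured data set is not reproduced here; the transfer sizes and all names are our own choices.

```python
def fit_latency_model(sizes_bytes, latencies_us):
    """Least-squares fit of latency = overhead + size / rate.

    Returns (overhead in microseconds, rate in bytes per microsecond).
    One byte per microsecond equals 10**6 bytes per second.
    """
    n = len(sizes_bytes)
    mean_x = sum(sizes_bytes) / n
    mean_y = sum(latencies_us) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes_bytes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(sizes_bytes, latencies_us))
    slope = sxy / sxx                      # microseconds per byte
    overhead_us = mean_y - slope * mean_x  # intercept: fixed cost per transaction
    return overhead_us, 1.0 / slope


def model_throughput(size_bytes, overhead_us, rate):
    """Throughput predicted by the model: transfer size divided by latency."""
    return size_bytes / (overhead_us + size_bytes / rate)


# Synthetic latencies (NOT measurements), built from the quoted parameters.
TRUE_OVERHEAD_US = 1.8
TRUE_RATE = 590.0
sizes = [256, 1024, 4096, 16384, 65536]
latencies = [TRUE_OVERHEAD_US + s / TRUE_RATE for s in sizes]

overhead_us, rate = fit_latency_model(sizes, latencies)
print(f"fitted overhead: {overhead_us:.2f} microseconds")
print(f"fitted rate:     {rate:.0f} bytes per microsecond")
for s in sizes:
    print(f"{s:6d} bytes -> {model_throughput(s, overhead_us, rate):6.1f}")
```

With real measurements the fit would not be exact, and the residuals would show how well the two-parameter model describes the interface. The model also makes the shape of the throughput curve plain: small transfers are dominated by the fixed overhead, while large ones approach the payload rate.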

The RapidIO switch fabric was designed to meet the performance and transport demands of a broad range of embedded-computing applications. The test results for latency and sustained throughput show that even its initial implementation can fill the needs of many such applications.

Performance is only one factor in the success of a technology. Motorola's support and incorporation of RapidIO into its communications processors have solidified initial deployment of RapidIO technology in the control plane as a replacement for the processor local bus. The appropriate support among DSP vendors is also in place to bring RapidIO technology successfully into the signal plane.

Ian Dunn is technical director, OEM Communications Computing Segment, and John Moore is senior systems applications engineer at Mercury Computer Systems Inc. (Chelmsford, Mass.).

See related chart