90-nm FPGAs handle 10-Gbit traffic management tasks
By Kevin Cackovic, EE Times
March 1, 2004 (11:00 a.m. EST)
URL: http://www.eetimes.com/story/OEG20040301S0018
Today's service providers face a growing challenge, as multinetwork inefficiencies demand that voice, video and data be carried across the same network. Implementing the quality of service these converging networks require at 10 Gbits/second remains technically challenging. Through developments on the process front, FPGAs can offer a cost-effective, low-risk means of implementing the critical traffic-manager functions required to support 10-Gbit/s data transfer. These devices also provide the flexibility to easily expand functionality to support emerging services. Let's see how.

In a typical line-card implementation, the packet processor classifies the ingress data traffic (data traveling from the line side toward the switch) and determines which port the data should exit. The packet processor also modifies the data header, adding the appropriate class. The traffic manager uses this header to enforce the service-level agreement that defines the criteria that must be met for a specified customer or service. With egress traffic (data traveling from the switch to the line side), the traffic manager smooths large spikes in traffic, allowing the overall network to run more efficiently.

A traffic manager in the data path is considered to be in flow-through mode. This mode has the advantage of reducing the complexity of the packet processor by offloading the packet buffering. A traffic manager outside the data path is considered to be in "lookaside" mode. In this mode, the packet processor communicates with the traffic manager through a lookaside interface and receives scheduling information, but the packet processor is also the interface to the backplane transceiver. Here the packet processor buffers the packets, and the traffic manager is responsible for maintaining the descriptor tables.

Processing data at 10 Gbits/s has been a challenge for FPGAs, but it becomes manageable with the introduction of 90-nanometer FPGA architectures. The implementation of 10-Gbit/s traffic management in FPGAs can be explored by looking at three areas: interfaces, fabric and memory.

The interface between the packet processor and the traffic manager in flow-through mode is typically SPI-4.2, but it may also be CSIX or a proprietary interconnect. In the past, interfaces supporting 10-Gbit/s throughput limited the amount of logic that could be added to the FPGA because of the high percentage of logic the interfaces consumed. For example, a duplex SPI-4.2 interface consumes more than 30 percent of the logic in a large 130-nm FPGA. From a pure logic-density standpoint, a 90-nm FPGA can increase logic density over previous architectures by almost 125 percent. In addition, the improved performance associated with the process advance allows the internal data path to be narrowed much more easily, since the same throughput can be sustained at a higher clock rate. Together, these improvements leave much more of the FPGA's logic available for implementing the necessary traffic-manager functions.

In lookaside mode, the interface consumes far fewer resources, since the lookaside interface, LA-1, transmits only the necessary packet overhead rather than the full packet. In this case, the interface logic becomes a relatively insignificant portion of the FPGA's available logic.

An urgency-counter implementation demonstrates the effect of these architectural improvements on traffic-management functions. Urgency counters can be used to describe the weights of the queues to be scheduled. The scheduler must select the queue with the maximum urgency counter and update the urgency counters for each queue upon selection: the selected queue has its urgency counter decremented by the sum of the weights of all active queues, while each of the other active queues has its urgency counter incremented by its own assigned weight.
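As a rough illustration of this update rule, here is a minimal software sketch in C. The queue count, weights and active-queue bitmask are hypothetical, and a real FPGA design would evaluate the comparisons in parallel hardware rather than in a sequential loop.

#include <stdio.h>

#define NUM_QUEUES 8

/* Hypothetical urgency-counter scheduler sketch. Selects the active
   queue with the maximum urgency counter, then updates the counters:
   the winner is decremented by the sum of the active weights, and
   every other active queue is incremented by its own weight. */
static int schedule(int urgency[], const int weight[], unsigned active)
{
    int winner = -1;
    int weight_sum = 0;

    /* Find the active queue with the maximum urgency counter. */
    for (int i = 0; i < NUM_QUEUES; i++) {
        if (!(active & (1u << i)))
            continue;
        weight_sum += weight[i];        /* sum of all active weights */
        if (winner < 0 || urgency[i] > urgency[winner])
            winner = i;
    }
    if (winner < 0)
        return -1;                      /* no active queues */

    /* Update the counters for the next selection round. */
    for (int i = 0; i < NUM_QUEUES; i++) {
        if (!(active & (1u << i)))
            continue;
        if (i == winner)
            urgency[i] -= weight_sum;
        else
            urgency[i] += weight[i];
    }
    return winner;
}

int main(void)
{
    int urgency[NUM_QUEUES] = {0};
    const int weight[NUM_QUEUES] = {4, 2, 1, 1, 4, 2, 1, 1};
    const unsigned active = 0xFFu;      /* all eight queues active */

    for (int round = 0; round < 8; round++)
        printf("round %d: queue %d\n", round,
               schedule(urgency, weight, active));
    return 0;
}

In hardware, the sequential maximum search is replaced by the parallel bit-slice comparison described next, and the weight sum by a pipelined adder.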
There are several implementation possibilities for sorting these urgency counters, including heap, row and array architectures. With any of them, selecting the maximum urgency counter requires a large number of comparisons. The array architecture compares each bit slice of the urgency counters in parallel to obtain the maximum bit for that slice. When a bit of urgency counter "i" is less than the same bit of urgency counter "j," the comparators of urgency counter "i" are disabled and eliminated from further comparison. These comparators are essentially three-input functions.

Designers can configure the adaptive logic module in an FPGA, for example, to implement two lookup tables with the same or different numbers of inputs. When a function of three or fewer variables is implemented in a traditional four-input LUT structure, the unused portion of the LUT is wasted. With the adaptive logic module, that unused portion can instead be reused to implement a separate function with up to five independent inputs. This provides greater efficiency by allowing comparator functions to be combined with other functions within the same LUT.

The algorithm described also requires computing the sum of all the individual weights. This can be done with a pipelined arithmetic-addition scheme that uses the weights and the queue-activity status to calculate the sum of the weights.

Queue manager

A queue manager buffers the incoming data traffic from the packet processor and creates tables of pointers to the buffered data. These buffers are typically located off-chip in external memory, but with embedded memories, portions of the queue-manager buffers can be kept on-chip. The use of internal SRAM reduces pins, power, board space, cost and latency. Pointer memory, or queue memory, can be absorbed into internal memory to greatly reduce system latency. These pointers are used to keep track of the various flows being serviced.

In a potential implementation, the internal SRAM stores both the head and tail pointers for each flow. An external SDRAM contains the actual data, plus a pointer to the next piece of data within the flow, creating a linked list for each flow. When a new piece of data is added to a flow, the tail pointer for that flow is updated; when a piece of data is scheduled to be sent to the switch fabric, the head pointer for that flow is updated.
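To make the pointer structure concrete, here is a minimal C sketch of such a linked-list queue manager. The flow and buffer counts are hypothetical, and plain arrays stand in for the internal SRAM and external SDRAM; it illustrates the head/tail bookkeeping, not an actual hardware design.

#include <stdio.h>

#define NUM_FLOWS   4
#define NUM_BUFFERS 16
#define NIL         -1

/* Stand-in for external SDRAM: each buffer holds a unit of packet
   data plus a pointer (index) to the next buffer in the same flow. */
struct buffer {
    int data;
    int next;
};
static struct buffer sdram[NUM_BUFFERS];

/* Stand-in for internal SRAM: head and tail pointers per flow. */
static int head[NUM_FLOWS];
static int tail[NUM_FLOWS];

static void init(void)
{
    for (int f = 0; f < NUM_FLOWS; f++)
        head[f] = tail[f] = NIL;
}

/* Enqueue: link a buffer onto the flow and update the tail pointer. */
static void enqueue(int flow, int buf, int data)
{
    sdram[buf].data = data;
    sdram[buf].next = NIL;
    if (tail[flow] == NIL)
        head[flow] = buf;               /* first buffer in the flow */
    else
        sdram[tail[flow]].next = buf;
    tail[flow] = buf;
}

/* Dequeue: hand the head buffer to the switch fabric and advance
   the head pointer; returns the buffer index, or NIL if empty. */
static int dequeue(int flow)
{
    int buf = head[flow];
    if (buf == NIL)
        return NIL;
    head[flow] = sdram[buf].next;
    if (head[flow] == NIL)
        tail[flow] = NIL;               /* flow is now empty */
    return buf;
}

int main(void)
{
    init();
    enqueue(0, 3, 42);
    enqueue(0, 7, 43);
    printf("flow 0 sends buffer %d\n", dequeue(0));   /* buffer 3 */
    printf("flow 0 sends buffer %d\n", dequeue(0));   /* buffer 7 */
    return 0;
}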
There's no doubt that today's FPGAs have matured to the point where they are an effective solution for handling 10-Gbit traffic-management tasks. The advanced architecture of FPGAs, coupled with the advantages of migrating to 90-nm process technology, makes it possible for such devices to serve high-end traffic-manager requirements. The FPGA's enhanced fabric is optimized for the computationally intensive functions of traffic management, and its support for flexible high-speed memory allows memory management at today's highest rates, with support for future memory standards. Finally, the embedded memory structure of these devices allows pointer tables and statistics caches to be stored in the large embedded memory blocks, resulting in a complete solution for implementing high-speed traffic management.

Kevin Cackovic is a strategic-marketing manager at Altera Corp. (San Jose, Calif.).