Minding network queues with FPGAs and memory
One of the major requirements in communication systems is quality of service (QoS). It is needed to implement competitive service differentiation and is typically achieved through traffic management. Traditionally, communications systems have relied on ASICs to manage traffic, but under pressure to reduce development costs, shorten design cycles and implement more sophisticated systems to compete in a maturing marketplace, network equipment vendors are looking to field-programmable gate arrays (FPGAs) for traffic-management functions. FPGA technology facilitates higher levels of system performance and flexibility. Just as the performance of digital signal processors can be boosted by using FPGAs as coprocessors, similar techniques can be used to complement network processing units (NPUs) and increase overall system performance.

Traffic-management systems can be divided into three main categories: scheduling, queue management and policing. This article focuses on an FPGA-based queue manager implementation.

A queue manager can be efficiently implemented using an FPGA and external memory. The performance of the queue manager is limited in large part by how much memory bandwidth is available within the fabric of the FPGA and via external memory interfaces. Packet header information, payload data and traffic statistics all require storage, but tend to favor different sizes and types of memory structures. The programmability inherent in an FPGA eases the task of defining the number and depth of the queues for the queue manager and of interfacing to the external memory in which payload data is stored. Large blocks of memory in the FPGA fabric facilitate buffering payload data between network links and off-chip memory, as well as enabling large queue and memory map tables. The control logic and address decoding required to segment the external memory into a multiqueue, multiflow architecture are also implemented in the FPGA.

To implement the queue manager, the FPGA's internal memory blocks must be address-mapped to the external memory. Addresses can be mapped dynamically by creating a linked-list structure in hardware. Alternatively, memory can be statically allocated by dividing the external memory into fixed-size submemory blocks. This discussion details a static address-mapping approach, which avoids the overhead incurred by a linked-list structure and simplifies status signal handling.

Each queue has a single entry in the memory block that uses status flags, a head pointer (read) and a tail pointer (write) to describe the queue. The status flags contain the empty, full, almost-empty and almost-full flags for each queue. The head pointer stores the address location of the next read for the queue in the external memory; the tail pointer stores the address location of the next write. Depending on the queue depth required, the external memory can be segmented into submemory blocks, each controlled by a single entry in the memory block and acting as a FIFO buffer.

For example, an address width of 25 bits can easily be managed using a device such as Altera's Stratix II FPGA, which contains 512-kilobit memory blocks. These 512-kbit blocks can be configured in 72-bit-wide mode, providing 8k words. The 72 bits hold two 25-bit addresses plus the status flag bits. This configuration can manage up to 8k queues.
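To make the table layout concrete, the following C sketch models the per-queue entry and the static segmentation of external memory described above. Only the 25-bit address width, the head/tail pointers and the four status flags come from the text; the queue count, the names and the helper functions are illustrative assumptions, not part of any Altera reference design.

    #include <stdint.h>
    #include <stdbool.h>

    #define ADDR_BITS   25                                /* external-memory address width    */
    #define NUM_QUEUES  64                                /* queues selected by the scheduler */
    #define QUEUE_SPAN  ((1u << ADDR_BITS) / NUM_QUEUES)  /* words in each static sub-block   */

    /* One entry per queue, mirroring a 72-bit word in the FPGA memory block:
       two 25-bit pointers plus the status flag bits. The model keeps the
       pointers as free-running offsets from the queue's base address, so
       occupancy is simply tail - head.                                      */
    typedef struct {
        uint32_t head;                    /* offset of the next read  */
        uint32_t tail;                    /* offset of the next write */
        bool     empty, full, almost_empty, almost_full;
    } queue_entry_t;

    /* Static mapping: queue q owns addresses [q*QUEUE_SPAN, (q+1)*QUEUE_SPAN). */
    static inline uint32_t queue_base(uint32_t q) { return q * QUEUE_SPAN; }

    /* Performed at FPGA startup: every queue starts empty at its base.       */
    static void init_queue_table(queue_entry_t table[NUM_QUEUES])
    {
        for (uint32_t q = 0; q < NUM_QUEUES; q++) {
            table[q] = (queue_entry_t){ .head = 0, .tail = 0,
                                        .empty = true, .almost_empty = true,
                                        .full = false, .almost_full = false };
        }
    }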
Larger queue managers can be built by merging multiple memory blocks. The depth of the queues is determined by the size of the external memory.

Read and write operations

When a packet arrives at the queue manager, the policer-scheduler determines the queue (0 to 63) in which to store the packet. If, for example, a write to queue No. 3 is requested, the logic accesses the queue entry at address location No. 3 in the FPGA memory block. After the first clock cycle, the tail pointer is masked out and sent to the external memory controller along with the frame to be stored. The tail pointer is incremented by one frame size, and operations are performed on the head and tail pointers to update the status flags. The updated pointers and status flag bits are written back into the FPGA's internal memory on the next cycle. The same process occurs for a read. The updated status flag bits for queue No. 3 are also sent to the queue manager for processing. If the last write to the queue fills it, a read operation for that queue can be executed before servicing other queues. (A sketch of this write sequence appears below.)

The FPGA logic fabric is used to implement complex state machines that control all queue status signals, address pointers and external memory sequencing events. During FPGA startup, the internal memory structures are configured to optimally manage the address space for each separate queue. Because the FPGA can be reprogrammed, the designer has greater flexibility in adapting and optimizing the queue memory resources.

External memory

The amount of external memory required depends on the specific system application and is governed by the number of queues required, the depth of the queues and the throughput of the system. The choice of external memory is usually determined by system cost and performance requirements. SDRAMs are generally the best choice for this function because of the large amounts of memory required to buffer packets. On the other hand, the high latencies associated with SDRAM blocks generally lower the effective system throughput. Alternatives, such as RLDRAM II devices, can be used if higher system throughput is required. Altera's Stratix II FPGAs can support multiple wide, high-performance RAM interfaces such as RLDRAM II up to 300 megahertz, DDR II SDRAM up to 266 MHz and QDR II SRAM up to 250 MHz.

The performance of the system is determined by the effective throughput of the external memory device and the queue manager. Overall system performance is set by the external memory's read and write latencies, since external memory is generally slower than internal memory. The queue manager's throughput requirement is generally twice the line rate.

The large amount of internal memory available in advanced FPGAs, such as Altera's Stratix GX and Stratix II devices, makes them ideal candidates for performing queue-management functions for traffic managers. Using such devices, which include dedicated dynamic phase alignment circuitry, simplifies and reduces the cost of implementing the SPI-4.2 protocol often used to handle such traffic-manager functions, while improving traffic-manager system performance. Stratix II devices, for example, can be used to implement line-card functionality and deliver 50 percent faster performance while reducing the logic cost of the implementation by half.
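Continuing the model above, this sketch follows the write sequence just described: the tail pointer supplies the external write address, it is advanced by one frame, and the status flags are recomputed before the entry is written back. FRAME_WORDS and ALMOST_MARGIN are assumed values chosen for illustration; the article does not specify a frame size or flag thresholds.

    #define FRAME_WORDS    16                 /* assumed fixed frame size, in words    */
    #define ALMOST_MARGIN  (2 * FRAME_WORDS)  /* assumed threshold for almost-* flags  */

    /* Recompute the four status flags from the head and tail pointers.       */
    static void update_flags(queue_entry_t *e)
    {
        uint32_t used   = e->tail - e->head;             /* occupancy of this sub-block */
        e->empty        = (used == 0);
        e->full         = (used + FRAME_WORDS > QUEUE_SPAN);
        e->almost_empty = (used <= ALMOST_MARGIN);
        e->almost_full  = (used + ALMOST_MARGIN >= QUEUE_SPAN);
    }

    /* One enqueue: returns the external-memory address for the frame, or -1
       if the queue cannot accept another frame. The external memory
       controller performs the actual burst; a dequeue would advance the
       head pointer in the same way.                                          */
    static int32_t enqueue(queue_entry_t table[NUM_QUEUES], uint32_t q)
    {
        queue_entry_t *e = &table[q];
        if (e->full)
            return -1;

        /* Mask the tail offset into the queue's fixed external address range. */
        uint32_t write_addr = queue_base(q) + (e->tail % QUEUE_SPAN);

        e->tail += FRAME_WORDS;              /* advance the tail by one frame       */
        update_flags(e);                     /* flags go back to the queue manager  */
        return (int32_t)write_addr;
    }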
Clearly, using a combination of NPUs and FPGAs in networking applications can deliver the same kind of benefits wireless and communications system providers have been experiencing by using a combination of digital signal processors and FPGAs to boost overall system performance while reducing system cost.

Ali Burney is a product-planning engineer, co-author Michael Rather is an advanced product-planning engineer, and co-author Robert Blake is vice president of product planning; all are with Altera Corp. (San Jose, Calif.).