How to analyze processor features for network use
EEMBC benchmarks may help you analyze and decide which processor to use in your network application. This article explains the various benchmarks, shows how to interpret their results, and works through an example.

Selecting the best processor for the job requires balancing many factors, such as the chip's price, power, availability, and performance. Although performance is sometimes equated with speed, the demands put on processing subsystems differ widely depending on the task. In networking applications, processors need to move data packets and translate network addresses, among other things, and no device category is perfect at handling all of these tasks. Hardwired logic is often too inflexible to deal with many of the error conditions and interactions that can occur with Internet Protocol (IP) packets. On the other hand, flexible software-programmable processors may be too slow to support next-generation routers and switches.

To give designers a tool for comparing the performance of these processing subsystems as they carry data packets across a network, the Embedded Microprocessor Benchmark Consortium (EEMBC) developed the Networking Benchmark Suite Version 2.0. The suite is structured so that its algorithms apply maximum stress to candidate processors, exposing the strengths and weaknesses of individual devices on different types of code. This article discusses the components of that suite and presents the results of tests run on two high-end, PowerPC-based processors: the Freescale MPC7447A and the IBM 750GX.

No single benchmark can address processor performance in every application, and even within networking, the tasks faced by processors differ considerably depending on where the devices are used. For example, routers typically don't need to process information at the Transmission Control Protocol (TCP) layer, so TCP-related test results will skew the overall picture for OEMs selecting a processor for work at the lower IP layer. Some of EEMBC's networking benchmarks are therefore designed to reflect the performance of client and server systems, while others represent functions predominantly carried out in infrastructure equipment. Two consolidated score types, the TCPmark and the IPmark, aggregate the results for the benchmarks in each group. The IPmark is intended for developers of infrastructure equipment, while the TCPmark, which includes the TCP benchmark, focuses on client- and server-based network hardware.

Quick look at the processors

Both chips implement the PowerPC instruction set, but that is where the similarities end. Freescale's MPC7447A contains a 1.4-GHz superscalar core with a seven-stage pipeline capable of issuing four instructions per clock cycle (three instructions plus one branch) into 11 independent execution units. IBM's 750GX contains a 1-GHz core with a four-stage pipeline. The shorter pipeline gives the IBM chip an advantage on branch-intensive applications and benchmarks because of its shorter load latency and smaller branch-mispredict penalty. The MPC7447A counters with an integrated AltiVec engine, useful for vectorizable algorithms. The MPC7447A has a 512K, 8-way set-associative second-level (L2) cache; the 750GX has a 1M, 4-way L2 cache. The Freescale chip's greater set associativity (8-way versus 4-way) is an advantage in multitasking systems, often delivering performance comparable to a cache twice the size. The 750GX's memory bus, however, runs at up to 200 MHz, compared with the Freescale device's 133-MHz bus. These differences prove to be significant factors in analyzing the benchmark results.

Routing IP packets

Tested against the Packet Check benchmarks, the "out-of-the-box" scores for the Freescale and IBM processors are within a few percent of each other, as shown in Figure 1. Despite its lower 1-GHz operating frequency, the 750GX does well on these benchmarks thanks to its large L2 cache and 200-MHz system bus.

For the packet-forwarding function, the router must determine which other routers are available for forwarding and find the shortest path to each. Open Shortest Path First (OSPF) is the most popular Internet routing protocol for determining the correct route for packets within IP networks. Using Edsger Dijkstra's shortest-path-first algorithm, EEMBC's OSPF benchmark performs a series of calculations to determine the destination port for each given route. More than half of the instructions executed in the OSPF benchmark are some type of compare or branch instruction. Intuitively this should favor the 750GX with its shorter pipeline, but it is the MPC7447A that earns the higher OSPF score, thanks to its higher clock frequency, greater number of functional units, and caches large enough to hold all of the benchmark's code and data.
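To make the OSPF workload concrete, here is a minimal C sketch of Dijkstra's shortest-path-first algorithm over a small adjacency matrix. This is not EEMBC's benchmark code, which works on real route data; the six-router topology, the metric values, and the function name spf() are invented for illustration.

```c
#include <stdio.h>
#include <limits.h>

#define N   6        /* routers in this toy topology (illustrative) */
#define INF INT_MAX

/* Dijkstra's shortest-path-first algorithm over an adjacency matrix.
 * cost[i][j] is the link metric from router i to router j, or INF if
 * the routers are not directly connected.  On return, dist[i] holds
 * the cheapest total metric from src to router i. */
static void spf(int cost[N][N], int src, int dist[N])
{
    int done[N] = {0};

    for (int i = 0; i < N; i++)
        dist[i] = (i == src) ? 0 : INF;

    for (int round = 0; round < N; round++) {
        /* pick the cheapest router not yet settled */
        int u = -1;
        for (int i = 0; i < N; i++)
            if (!done[i] && (u < 0 || dist[i] < dist[u]))
                u = i;
        if (u < 0 || dist[u] == INF)
            break;                    /* remaining routers unreachable */
        done[u] = 1;

        /* relax every link that leaves u */
        for (int v = 0; v < N; v++)
            if (cost[u][v] != INF && dist[u] + cost[u][v] < dist[v])
                dist[v] = dist[u] + cost[u][v];
    }
}

int main(void)
{
    int cost[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            cost[i][j] = (i == j) ? 0 : INF;
    cost[0][1] = cost[1][0] = 2;      /* a small symmetric mesh */
    cost[1][2] = cost[2][1] = 3;
    cost[0][3] = cost[3][0] = 7;
    cost[2][3] = cost[3][2] = 1;
    cost[3][4] = cost[4][3] = 4;
    cost[4][5] = cost[5][4] = 2;

    int dist[N];
    spf(cost, 0, dist);
    for (int i = 0; i < N; i++)
        printf("router %d: metric %d\n", i, dist[i]);
    return 0;
}
```

Notice that almost every line in the two inner loops is a compare or a conditional branch, which is exactly the instruction mix the article ascribes to this benchmark.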
Once the route tables are built using protocols such as OSPF, efficient route lookups become fundamental to router performance. Based on information found in lookup tables, the Route Lookup benchmark receives and forwards IP packets using a mechanism commonly applied in commercial network routers. It employs a data structure known as the Patricia tree, a compact binary tree that allows fast, efficient searches on long or unbounded-length strings. The benchmark measures the processor's ability to check the tree for the presence of a valid route and to walk the tree to find the destination node to which each packet should be forwarded. The code and data for this benchmark fit into the L1 caches, favoring the MPC7447A because of its higher frequency and because it never needs its external memory bus. Although the benchmark consists mainly of compare and branch instructions, the 750GX's shorter pipeline doesn't appear to provide a significant benefit here.

Figure 1 shows the "out-of-the-box" Networking Version 2.0 benchmark scores for the MPC7447A and 750GX, and it demonstrates that operating frequency alone does not account for processor performance. A larger L2 cache, faster memory bus, and shorter pipeline contribute to the 750GX's strong showing; only on the computation-intensive OSPF benchmark does the advanced superscalar MPC7447A take the advantage. (The Y-axis is iterations/sec.)
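The following C sketch shows the idea behind the benchmark's lookup structure. For clarity it uses a plain binary trie over the destination-address bits rather than a true Patricia tree, which additionally stores a bit index in each node so chains of single-child nodes collapse; the routes and port numbers are made up for the example.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

/* A node in a binary trie over the bits of an IPv4 destination
 * address, most-significant bit first.  port holds the egress port if
 * a route's prefix ends at this node, or -1 otherwise. */
struct node {
    struct node *child[2];
    int          port;
};

static struct node *node_new(void)
{
    struct node *n = calloc(1, sizeof *n);
    n->port = -1;
    return n;
}

/* Install a route: the top plen bits of prefix map to port. */
static void route_add(struct node *root, uint32_t prefix, int plen, int port)
{
    struct node *n = root;
    for (int i = 0; i < plen; i++) {
        int bit = (prefix >> (31 - i)) & 1;
        if (!n->child[bit])
            n->child[bit] = node_new();
        n = n->child[bit];
    }
    n->port = port;
}

/* Longest-prefix match: walk the trie bit by bit, remembering the
 * deepest node that carried a valid route. */
static int route_lookup(const struct node *root, uint32_t dst)
{
    int best = -1;
    const struct node *n = root;
    for (int i = 0; i < 32 && n; i++) {
        if (n->port >= 0)
            best = n->port;              /* deepest match so far */
        n = n->child[(dst >> (31 - i)) & 1];
    }
    if (n && n->port >= 0)               /* full 32-bit host route */
        best = n->port;
    return best;                         /* -1: no route matched */
}

int main(void)
{
    struct node *root = node_new();
    route_add(root, 0x0A000000, 8, 1);    /* 10.0.0.0/8  -> port 1 */
    route_add(root, 0x0A010000, 16, 2);   /* 10.1.0.0/16 -> port 2 */

    printf("10.1.2.3  -> port %d\n", route_lookup(root, 0x0A010203));
    printf("10.9.9.9  -> port %d\n", route_lookup(root, 0x0A090909));
    printf("192.0.2.1 -> port %d\n", route_lookup(root, 0xC0000201));
    return 0;
}
```

Each step of the walk is a shift, a mask, a compare, and a branch, matching the compare/branch-heavy instruction mix described above.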
Processing packets at the network boundary

Network address translation (NAT) lets a router map the private addresses of internal hosts onto one or more public IP addresses. Dynamic NAT routing processes all outgoing packets and adds the complexity of port assignment on incoming and outgoing packets to preserve connections between clients and servers. EEMBC's NAT benchmark uses packets with various source addresses, destination addresses, and random packet sizes. Each packet is wrapped with IP header information, and the packets are assembled into a list for processing. The benchmark then works through the list, rewriting the IP addresses and port numbers of packets according to predefined NAT rules. Each rewritten packet receives a modified source IP address and a source port chosen from the available ports of each IP address available to the router.
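As a rough illustration of the per-packet work the NAT benchmark models, here is a hedged C sketch of dynamic source-address rewriting. The table layout, the public address (198.51.100.1, a documentation address), and the sequential port-allocation rule are all invented for the example; a real router would hash the flow key and age idle entries out.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical NAT entry: maps an internal (address, port) flow to a
 * public port allocated on the router. */
struct nat_entry {
    uint32_t priv_ip;    /* inside host address */
    uint16_t priv_port;  /* inside host port */
    uint16_t pub_port;   /* port allocated on the router */
};

#define PUB_IP     0xC6336401u   /* 198.51.100.1, the router's address */
#define FIRST_PORT 40000
#define MAX_FLOWS  1024

static struct nat_entry table[MAX_FLOWS];
static int nflows;

/* Rewrite an outgoing packet's source address and port in place,
 * reusing the mapping for a known flow or allocating the next free
 * public port for a new one. */
static void nat_rewrite(uint32_t *src_ip, uint16_t *src_port)
{
    for (int i = 0; i < nflows; i++) {
        if (table[i].priv_ip == *src_ip && table[i].priv_port == *src_port) {
            *src_ip   = PUB_IP;
            *src_port = table[i].pub_port;
            return;
        }
    }
    if (nflows < MAX_FLOWS) {            /* new flow: remember, then rewrite */
        table[nflows].priv_ip   = *src_ip;
        table[nflows].priv_port = *src_port;
        table[nflows].pub_port  = (uint16_t)(FIRST_PORT + nflows);
        *src_ip   = PUB_IP;
        *src_port = table[nflows].pub_port;
        nflows++;
    }
    /* table full: a real router would evict an idle entry here */
}

int main(void)
{
    uint32_t ip   = 0x0A000005;          /* 10.0.0.5 */
    uint16_t port = 5123;
    nat_rewrite(&ip, &port);
    printf("rewritten to %08lx:%u\n", (unsigned long)ip, (unsigned)port);
    return 0;
}
```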
As Internet traffic passes from one part of the network to another, the packets themselves may need to be altered. Each network technology has a maximum frame size that defines the maximum transmission unit (MTU), or maximum packet size, that can be carried over the network. When an IP packet is too large to fit within the MTU of the egress (outgoing) interface, it can no longer be transmitted as a single frame; instead, the packet must be split up and transmitted in multiple frames. EEMBC's IP Reassembly benchmark takes the asymmetric nature of fragmentation and reassembly into account, making extensive use of out-of-order delivery and random source-packet sizes to stress the processor's ability to perform reassembly.

Figure 2 shows the results of the NAT and IP Reassembly benchmark tests on the MPC7447A and 750GX. About 80% of the IP Reassembly benchmark's instructions are equally divided between load/store and compare/branch. The instruction mix for the NAT benchmark is similar, with the addition of a few multiply and divide instructions. The combination of the shorter pipeline and the 1M L2 cache significantly favors the 750GX on these benchmarks. Note also that the system configuration used to generate these scores paired the MPC7447A with the older Tundra Tsi107 system controller and its 133-MHz bus; the newer Discovery III system controller, which was not available at the time of this benchmark certification, supports a 167-MHz bus.

Improving quality of service

EEMBC's QoS benchmark simulates the processing undertaken by bandwidth-management software used to "shape" traffic flows to meet quality-of-service (QoS) requirements. Based on predefined rules, the system paces packet delivery to the desired speed. The benchmark begins by processing packets according to the rule set, which determines the routing and addressing needed to preserve the QoS for each packet stream. As the number of packets in the system increases, port diversions occur to maintain the QoS, and queues are established to wait for available pipes. The benchmark is very compute-intensive, with about 45% of its instructions doing loads and stores and about 40% doing compares and branches, an instruction mix that again plays to the strengths of the 750GX's architecture.

TCP for clients and servers

EEMBC's Networking Version 2.0 benchmarks include a TCP benchmark that accounts for the different behavior of TCP-based protocols by measuring the performance of a processor handling a workload derived from several application models. The benchmark has three components reflecting three different network scenarios. The first is a "Gigabit Ethernet" kernel involving large packet transfers to represent the likely workload of Internet backbone equipment. The second kernel assumes a standard Ethernet network for packet delivery and concentrates on large transfers using protocols such as FTP. The last kernel uses a standard Ethernet network model for the relay of mixed traffic types, including Telnet, FTP, and HTTP. The main part of the benchmark processes all of the packet queues through a server task, a network channel, and a client task, simulating the data transfers through the connection to give a realistic view of how the processor will cope with various forms of TCP-based traffic.

A large portion of the instructions in the TCP benchmarks is data loads and stores. Different upper-level protocols stress TCP-handling hardware in different ways. Telnet, for example, consists of short bursts of data in small packets, the product of a user typing commands and receiving results. FTP, by contrast, moves large amounts of data in large packets in one direction. HTTP falls somewhere in the middle, with bursts of files in one direction intermixed with control and handshaking traffic in both directions. Traffic type is therefore an essential consideration when analyzing the performance of a processor that will handle TCP-layer traffic.

Figure 3 shows the three sets of "out-of-the-box" scores for EEMBC's TCP benchmark using jumbo, bulk, and mixed data sets, representing Gigabit Ethernet, standard Ethernet, and a mixture of traffic types (Telnet, FTP, and HTTP), respectively. The MPC7447A performs well on this benchmark due to its integrated AltiVec unit. (The Y-axis is iterations/sec.)

The TCP scores on the MPC7447A are 50% to 65% better than those of the 750GX thanks to the former's AltiVec unit, which was applied to the key time-consuming functions in the TCP protocol. Specifically, the memory-copy function (memcpy) was accelerated in these benchmarks solely by linking in the libmotovec AltiVec library available at www.freescale.com/AltiVec. The checksum and memcpy_and_checksum functions could also be accelerated with AltiVec, but not without changing the benchmark source code or the function calls in the libmotovec library, two optimizations that are not allowed under EEMBC rules.
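The functions named above are simple enough to sketch in portable C. Below is a scalar version of the Internet (ones'-complement) checksum that TCP uses, per RFC 1071, plus a fused copy-and-checksum in the spirit of the memcpy_and_checksum function the article mentions; this is an illustrative sketch, not the benchmark's or libmotovec's implementation. Loops like these, which stream bytes through simple arithmetic, are exactly what AltiVec's 128-bit vector operations accelerate.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Internet checksum (RFC 1071): sum the data as 16-bit words in
 * network byte order, fold the carries back in, and complement. */
static uint16_t cksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += (uint32_t)p[0] << 8 | p[1];
        p += 2;
        len -= 2;
    }
    if (len)                              /* trailing odd byte */
        sum += (uint32_t)p[0] << 8;

    while (sum >> 16)                     /* fold carries */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Fused copy-and-checksum: touching each byte once instead of twice
 * is the idea behind a memcpy_and_checksum-style routine. */
static uint16_t copy_cksum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        sum += (uint32_t)src[i] << 8 | src[i + 1];
    }
    if (len & 1) {
        dst[len - 1] = src[len - 1];
        sum += (uint32_t)src[len - 1] << 8;
    }
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

int main(void)
{
    uint8_t src[5] = { 0x45, 0x00, 0x00, 0x1C, 0x80 }, dst[5];
    /* both calls should print the same value */
    printf("cksum      = 0x%04x\n", cksum(src, sizeof src));
    printf("copy_cksum = 0x%04x\n", copy_cksum(dst, src, sizeof src));
    return 0;
}
```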
More than clocks

As these results show, clock frequency alone does not determine networking performance. The 750GX's larger L2 cache, faster memory bus, and shorter pipeline carry some benchmarks, while the MPC7447A's higher frequency, additional execution units, and AltiVec engine carry others. Selecting a processor for networking equipment therefore means weighing the benchmark components, and the consolidated IPmark and TCPmark scores, against the workload the equipment will actually run.

Markus Levy is founder and president of EEMBC. He has worked for EDN Magazine and In-Stat/MDR and is coauthor of Designing with Flash Memory. He also worked for Intel as a senior applications engineer and customer training specialist for the company's microprocessor and flash memory products. You can reach him at markus@eembc.org.

Copyright 2005 © CMP Media LLC