Adapting Ethernet Controllers to Meet Embedded Networking Needs
Charlie Forni and Paul Brant, SMSC
Feb 03, 2005 (5:16 AM)
Today, embedded systems supporting consumer electronics and other markets require a higher level of network connectivity. Non-PCI Ethernet controllers offer the system designer the best mix of performance and architectural options for embedded connectivity solutions. As a result, non-PCI Ethernet controllers have become the standard for peripheral networking solutions in these embedded designs.
Traditionally, designers have been faced with multiple challenges when adding Ethernet connectivity to non-PCI embedded systems. These common challenges include adapting non-standard bus interfaces to available Ethernet controllers, meeting the CPU bandwidth requirements of TCP/IP packet processing, and minimizing the degradation of a system's performance inherent in combined peripheral and memory interfaces.
To better overcome these challenges, one must look at making changes to traditional Ethernet controllers. The goal is to create an embedded system solution that is capable of improved performance while using existing embedded CPUs. Let's see how this goal can be achieved.
Controller Design Basics
Figure 1 below describes a typical embedded system incorporating a non-PCI embedded CPU and its associated memory (SRAM, SDRAM and Flash) and peripherals (video, USB and IDE controllers) connected via a local bus. Examples of embedded CPUs and operating systems include the Intel XScale, Renesas (SHx), Panasonic, and STMicro devices running Linux, Windows CE, VxWorks, and other RTOSes.
From an OS or software perspective, understanding how data is moved from the embedded system application to the Ethernet network is important. Figure 2 below shows an implementation of an embedded system's software structure. The software structure is divided into the application, the OS, including the TCP/IP stack, and the Ethernet controller device driver. The non-PCI Ethernet controller interfaces to its software driver via the local bus.
When the application sends data or control information over the network, it works with the OS to build a linked list of descriptors that point to various buffers, which avoids copying data. The buffers hold the data that will be placed into an Ethernet packet, and each buffer represents a different portion of the packet that the device driver will assemble for transmission. The data buffers are non-contiguous, meaning they are scattered in memory rather than occupying one continuous range of addresses.
Figure 1: Typical embedded system with non-PCI Ethernet connectivity.
As further demonstrated in Figure 2, the OS passes the "Tx Data Ptr" variable, or software pointer, to the Ethernet controller software driver. "Tx Data Ptr" holds the address of descriptor 1, which points to descriptor 2, and so on; each descriptor in turn points to a data buffer. The Ethernet controller driver then moves each data buffer over the local bus to the Ethernet controller.
Figure 2: Diagram showing how application data is sent out over an Ethernet network.
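The descriptor chain can be sketched in a few lines of Python. This is only an illustration of the idea, not any particular driver: the names `Descriptor`, `walk`, and `transmit` are hypothetical, the fragment sizes are made up, and a `bytearray` stands in for the controller's transmit buffer.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional


@dataclass
class Descriptor:
    """One node of the transmit list: a buffer plus a link to the next node."""
    buffer: bytes                         # one fragment of the packet
    next: Optional["Descriptor"] = None   # next descriptor, None at the end


def walk(tx_data_ptr: Optional[Descriptor]) -> Iterator[bytes]:
    """Yield each buffer in list order, as a driver would visit them."""
    node = tx_data_ptr
    while node is not None:
        yield node.buffer
        node = node.next


def transmit(tx_data_ptr: Optional[Descriptor],
             write_to_controller: Callable[[bytes], None]) -> int:
    """Move every fragment to the controller and return the bytes moved."""
    total = 0
    for fragment in walk(tx_data_ptr):
        write_to_controller(fragment)     # one buffer move over the local bus
        total += len(fragment)
    return total


# Three scattered fragments (sizes are illustrative only).
payload = Descriptor(b"P" * 100)
headers = Descriptor(b"H" * 40, payload)
eth_header = Descriptor(b"E" * 14, headers)

controller_buffer = bytearray()           # stand-in for the controller
moved = transmit(eth_header, controller_buffer.extend)
```

No fragment is copied into an intermediate packet buffer here; the list is walked and each buffer is handed over as-is, which is the point of the descriptor scheme.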
The receive operation is performed in reverse order. This process is very software intensive and can reduce performance if not handled properly. Just as important, the various data buffers may not be naturally aligned in memory, and how that misalignment is handled is critical to overall network system performance.
Limited Options
Currently, consumer electronics devices that employ Ethernet connectivity are limited to three options:
- External PCI Ethernet Controller — Most embedded processors do not support a PCI bus, and as a result, selecting a PCI Ethernet controller is typically not an option. Table 1 below indicates, based on market research, that none of the 8-bit or 16-bit CPUs, and only 16 percent of all 32-bit CPUs support PCI, while most embedded processors support a local/memory bus. Embedded CPU manufacturers have not embraced PCI for several reasons. One reason is cost. The additional I/O pins and circuitry required to implement PCI will inflate the price of the embedded processor. Embedded CPUs require a memory bus and in many cases, this bus is shared with peripherals. Therefore, adding another wide parallel bus to support PCI peripheral devices is not practical. In addition, advanced PCI features, such as plug-and-play, are usually not required in these embedded systems.
- Integrated Ethernet — Table 1 below shows the percentage of embedded CPUs that incorporate an internal Ethernet controller. It is clear that the majority of all embedded processors do not support an integrated Ethernet controller.
- External Non-PCI Ethernet Controllers — In reference to Table 1, the majority of all embedded processors support a local bus without PCI, and do not include support for an internal Ethernet controller.
Table 1: Bus Types and Integrated Ethernet Controller Support
Design-in Challenges
The challenges of adding high-performance Ethernet connectivity to non-PCI embedded CPUs are often ignored by traditional non-PCI Ethernet controllers. However, by considering certain architectural enhancements, non-PCI Ethernet controllers can deliver better performance while also addressing the issues of cost and reliability. The three major challenges are buffer alignment, bus architecture, and flow control. Let's look at each in more detail.
1. Buffer Alignment
The issue of buffer alignment begins with the Ethernet frame as it exists in the embedded CPU's system memory. Ethernet frame data can be fragmented and spread across multiple buffers in memory. Each buffer fragment can begin and end on an arbitrary byte alignment and can be of arbitrary length.
Unaligned data is less than ideal for traditional Ethernet controllers that require the transmit data to be presented to the controller in a 32-bit aligned fashion. Since the data can arrive at the driver in unaligned fragments, the driver must use the CPU to gather the scattered fragments and realign the data before writing it to the Ethernet controller. This process is highly inefficient because the data must be read from the system's memory, realigned, and then written to the Ethernet controller. This is at least a three-step process, compared to reading data from a buffer in the system's memory and writing directly to the Ethernet controller.
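The extra work can be sketched as follows. This is a rough model, assuming a controller that accepts only whole 32-bit words in little-endian order; the function name `gather_to_words` and the word order are assumptions made for the illustration.

```python
import struct
from typing import Iterable, List, Tuple


def gather_to_words(fragments: Iterable[bytes]) -> Tuple[List[int], int]:
    """Copy arbitrarily aligned fragments into a run of 32-bit words.

    Returns the words and the true packet length in bytes.
    """
    staging = bytearray()
    for frag in fragments:
        staging += frag                   # CPU copy: read memory, realign
    length = len(staging)
    staging += bytes(-length % 4)         # zero-pad up to a 32-bit boundary
    count = len(staging) // 4
    # Little-endian word order is an assumption of this sketch.
    words = list(struct.unpack("<%dI" % count, staging))
    return words, length


# Three fragments of 3, 2, and 1 bytes, none of them word sized.
words, length = gather_to_words([b"\x01\x02\x03", b"\x04\x05", b"\x06"])
```

Every byte is touched by the CPU once to build the staging copy and again when the words are written out, which is the overhead the text describes.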
One of the most disruptive side effects is incompatibility with direct memory access (DMA) controllers. Traditionally, DMA controllers in embedded CPUs are not capable of performing the realignment of the data, rendering them useless for moving Ethernet data in the system. The responsibility for data movement and realignment then falls to the embedded CPU, consuming MIPS that are better used elsewhere in the application.
The ideal Ethernet controller would automatically handle the realignment of data. Data would then be passed in its native alignment, and the Ethernet controller would understand packet data boundaries. The Ethernet controller would then transparently realign the data internally before it was transmitted. This transparent realignment would relieve the embedded CPU of realigning data via buffer copies. The system would now have the option of using a simple DMA controller to move data.
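One way such a scheme might work is for the driver to describe each buffer by an aligned start address, a byte offset, and a length, so that a simple DMA engine moves whole words and the controller discards the bytes that do not belong to the packet. The sketch below assumes a 32-bit bus; the function names and the three-value description are hypothetical and do not represent any specific controller's interface.

```python
from typing import Tuple


def aligned_transfer(addr: int, length: int, word: int = 4) -> Tuple[int, int, int]:
    """Return (aligned_start, start_offset, word_count) covering the buffer."""
    start_offset = addr % word            # bytes to skip in the first word
    aligned_start = addr - start_offset   # where the DMA engine starts
    word_count = (start_offset + length + word - 1) // word
    return aligned_start, start_offset, word_count


def controller_extract(memory: bytes, aligned_start: int, start_offset: int,
                       word_count: int, length: int, word: int = 4) -> bytes:
    """Model the controller: accept whole words, keep only the packet bytes."""
    burst = memory[aligned_start:aligned_start + word_count * word]
    return burst[start_offset:start_offset + length]


# A 9-byte fragment that starts at the odd address 5.
memory = bytes(range(32))
start, offset, count = aligned_transfer(addr=5, length=9)
fragment = controller_extract(memory, start, offset, count, length=9)
```

The host side never copies or shifts the data; it only computes three small numbers per buffer.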
2. Bus Architecture
In order for an embedded CPU to transmit Ethernet packets, the data needs to be written into a buffer within the Ethernet controller. Conversely, for receiving Ethernet packets, the embedded processor must read the incoming data from the Ethernet controller's internal buffers. Data is moved by the embedded CPU using programmed I/O (PIO) cycles or DMA cycles.
Regardless of the operation — transmitting or receiving, using PIO or DMA — the data is moving over the embedded CPU's external local bus. Each transaction takes time, and the more time it takes, the more impact it has on the overall system performance. The key issue is that access times to the Ethernet controller be kept low in order to speed up the overall system.
Traditionally, low-performance Ethernet controllers force the use of CPU wait-states during read and write accesses, equating to longer read and write cycles. Adding more wait-states on the local bus means less time for the CPU to perform other tasks and less bandwidth for both internal and external peripherals.
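A back-of-the-envelope calculation shows how quickly wait-states add up. The sketch below estimates the fraction of local-bus time spent moving Ethernet data; the bus width, clock rate, and cycle counts in the example are assumed figures chosen for illustration, not measurements of any device.

```python
def bus_load(data_rate_bps: float, bus_width_bits: int, bus_clock_hz: float,
             base_cycles: int, wait_states: int) -> float:
    """Fraction of local-bus time needed to move data at the given rate."""
    accesses_per_second = data_rate_bps / bus_width_bits
    cycles_per_second = accesses_per_second * (base_cycles + wait_states)
    return cycles_per_second / bus_clock_hz


# Assumed system: 100 Mbps of traffic, 16-bit bus at 50 MHz, 2-cycle access.
no_waits = bus_load(100e6, 16, 50e6, base_cycles=2, wait_states=0)
four_waits = bus_load(100e6, 16, 50e6, base_cycles=2, wait_states=4)
```

Under these assumptions the controller occupies a quarter of the bus with no wait-states and three quarters with four, leaving far less bandwidth for memory and other peripherals.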
Other mechanisms, which are not as obvious, can result in additional wait-states. For example, many traditional Ethernet controllers require long data and address setup times. This can require the addition of glue logic and latches. In such a system, it may become necessary to lengthen the bus cycle times for every device in the system, including memory.
Some embedded CPUs employ other mechanisms to reduce transaction time on the external local bus. One example of such a mechanism is the burst-mode read transaction. This mode of operation is typically associated with DMA controllers and provides for a reduced bus cycle. During a burst transaction, the control signals remain asserted, and only the address changes for each read operation. The de-assertion time between read cycles, normally associated with PIO reads, is eliminated. Traditional Ethernet controllers do not support burst-mode reads.
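The saving can be estimated with a simplified cycle count. In this model, which is an assumption of the sketch rather than the timing of any real bus, a PIO read pays the de-assertion time after every word, while a burst pays it once at the end of the transfer.

```python
def read_cycles(words: int, access_cycles: int, deassert_cycles: int,
                burst: bool) -> int:
    """Bus cycles needed to read `words` words from the controller."""
    if words == 0:
        return 0
    if burst:
        # Control signals stay asserted; de-assert once after the last word.
        return words * access_cycles + deassert_cycles
    # PIO: every read is followed by its own de-assertion time.
    return words * (access_cycles + deassert_cycles)


# A 1514-byte frame read over a 32-bit bus (379 words, last one partial).
words = (1514 + 3) // 4
pio = read_cycles(words, access_cycles=2, deassert_cycles=1, burst=False)
burst = read_cycles(words, access_cycles=2, deassert_cycles=1, burst=True)
```

With these assumed timings, the burst read finishes in roughly two thirds of the cycles the PIO read needs.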
Most embedded processors natively support an SRAM-type local bus interface. It would make sense then that the optimal Ethernet controller should closely model an SRAM memory interface. The advantages are obvious. Not only would this Ethernet controller provide a glue-less interface to most embedded processors, but it would also offer all of the characteristics already discussed: fast overall bus cycle times, minimal address and data setup times, and support for burst-mode reads.
3. Flow Control
Another way to increase a non-PCI Ethernet controller's performance is to optimize the Ethernet traffic conditions. This can be done through support for Ethernet flow control.
Ethernet flow control allows the receiving Ethernet device to hold off its partner transmitter and can prevent receive buffer overflow. Receive buffer overflow is a situation that can occur when the embedded CPU cannot keep up with Ethernet data reception due to interrupt latency or other factors. In the case of overflow, receive data is lost, leading to severe degradation in system performance.
Many Ethernet devices support full-duplex flow control using a "pause control" frame. The pause operation inhibits transmission of data frames for a specified period of time. A pause operation consists of a frame containing the globally assigned multicast address, the PAUSE opcode, and a parameter indicating duration to inhibit data transmissions. Upon receiving a frame with the reserved multicast address and PAUSE opcode, the Ethernet device inhibits data transmissions for the length of time indicated.
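A pause frame is small enough to build by hand. The sketch below uses the values defined for MAC Control PAUSE frames in IEEE 802.3 (reserved multicast address 01-80-C2-00-00-01, EtherType 0x8808, opcode 0x0001, and a 16-bit pause time counted in units of 512 bit times). The function names are hypothetical, and the frame check sequence is left out on the assumption that the MAC hardware appends it.

```python
import struct

PAUSE_DEST = bytes.fromhex("0180c2000001")   # reserved multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
MIN_FRAME_NO_FCS = 60                        # 64-byte minimum less 4-byte FCS


def build_pause_frame(src_mac: bytes, pause_quanta: int) -> bytes:
    """Return a PAUSE frame (without FCS) asking the partner to hold off."""
    if len(src_mac) != 6:
        raise ValueError("source MAC address must be 6 bytes")
    if not 0 <= pause_quanta <= 0xFFFF:
        raise ValueError("pause time must fit in 16 bits")
    fields = struct.pack("!HHH", MAC_CONTROL_ETHERTYPE, PAUSE_OPCODE,
                         pause_quanta)
    frame = PAUSE_DEST + src_mac + fields
    return frame.ljust(MIN_FRAME_NO_FCS, b"\x00")   # pad to minimum size


def pause_seconds(pause_quanta: int, link_bps: float) -> float:
    """Convert a pause time in quanta (512 bit times each) to seconds."""
    return pause_quanta * 512 / link_bps


frame = build_pause_frame(bytes.fromhex("020000000001"), 0xFFFF)
longest_pause = pause_seconds(0xFFFF, 100e6)  # about a third of a second
```

The source address used here is a made-up, locally administered address for the example only.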
In half-duplex mode, backpressure is used for flow control. The Ethernet controller regulates data reception by "jamming" receive data and purposely creating collisions. After sensing the collision, the remote station will back off its transmission.
The ideal Ethernet controller needs to be able to monitor its internal buffer space, and then automatically send a pause frame or jam without processor intervention. Furthermore, the device should be capable of sending a "zero time" pause frame to reinitiate data transmission when space is available in its internal buffers. Automatic flow control, in turn, improves the system's overall performance by reducing the number of processor interrupts and overhead. Properly implemented flow control will also eliminate receive buffer overflows on both ends of the network.
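The full-duplex half of this behavior can be modeled as a pair of watermarks on the receive buffer. The class below is a toy model, not a description of any shipping controller: the name `RxFlowControl`, the watermark values, and the `send_pause` callback are all assumptions made for the illustration.

```python
from typing import Callable


class RxFlowControl:
    """Toy receive buffer that pauses its link partner without CPU help."""

    def __init__(self, capacity: int, high_water: int, low_water: int,
                 send_pause: Callable[[int], None]) -> None:
        if not 0 <= low_water < high_water <= capacity:
            raise ValueError("need 0 <= low_water < high_water <= capacity")
        self.capacity = capacity
        self.high_water = high_water
        self.low_water = low_water
        self.send_pause = send_pause      # takes a pause time in quanta
        self.used = 0
        self.paused = False
        self.dropped = 0

    def on_frame_received(self, nbytes: int) -> None:
        if self.used + nbytes > self.capacity:
            self.dropped += 1             # overflow: the frame is lost
            return
        self.used += nbytes
        if not self.paused and self.used >= self.high_water:
            self.send_pause(0xFFFF)       # ask the partner to stop sending
            self.paused = True

    def on_host_read(self, nbytes: int) -> None:
        self.used = max(0, self.used - nbytes)
        if self.paused and self.used <= self.low_water:
            self.send_pause(0)            # "zero time" pause: resume now
            self.paused = False


sent = []                                 # pause times the model sent
fc = RxFlowControl(capacity=16384, high_water=12288, low_water=4096,
                   send_pause=sent.append)
for _ in range(9):
    fc.on_frame_received(1514)            # ninth frame crosses high water
fc.on_host_read(10000)                    # host drains below low water
```

In this run the model sends one full-length pause when the buffer fills past the high watermark and one zero-time pause once the host has drained it, with no frames dropped and no interrupt needed to make either decision.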
Why Non-PCI Ethernet? Why Now?
Consumer electronics, entertainment A/V, and traditional home network devices (for example, PCs and printers) are all converging to one network, and Ethernet is clearly becoming the network of choice for connectivity in the home. In many cases, the system designer has a limited number of options as to what embedded CPU can be selected for a CE or A/V design. Therefore, the only way to achieve the performance levels required, without migrating to higher-cost CPU solutions, is by optimizing the current non-PCI Ethernet controller. By improving the architecture of a non-PCI Ethernet controller, the overall system performance in demanding applications is improved.
About the Authors
Charlie Forni is the director of engineering at SMSC. Charlie can be reached at charles.forni@smsc.com.
Paul Brant is a senior principal systems architect at SMSC. Paul can be reached at paul.brant@smsc.com.
Copyright © 2003 CMP Media, LLC