Making SPI-4.2 Implementations More Efficient: Part 2
The SPI-4.2 interface has quickly achieved industry-wide recognition and is widely accepted as a standard high-speed interface in the networking chip space. However, creating an efficient SPI-4.2 interface presents many challenges to a system design, such as buffer overflow and underflow. The solutions to these concerns often conflict, and users need to make the right trade-offs to use SPI-4.2 effectively. In this two-part series, we look at the steps designers need to take in order to develop an efficient SPI-4.2 interface. In Part 1 of this series, we provided an overview of the SPI-4.2 interface spec and then looked at the data transfer mechanism, latency, and potential buffer issues. Now, in Part 2, we'll look at the issues involved in improving bandwidth utilization on a SPI-4.2 bus. We'll also look at techniques for effectively scheduling training on a SPI-4.2 link.

Bandwidth Utilization
The importance of configuring and using the SPI-4.2 interface as well as possible, so that the system achieves high bandwidth utilization, cannot be overstated. As we work to utilize the SPI-4.2 bandwidth, it is also important to ensure that the bandwidth of the interface is shared as desired across the ports in order to achieve the goals of the system. Below, we'll analyze the main configuration choices (maxburst values, the calendar sequence, and the status path) and examine their impact on the performance achievable on a SPI-4.2 interface.

Choosing Maxburst Values
Choosing smaller maxburst values keeps high-bandwidth ports from hogging the interface for long periods. With small bursts, however, the data fetch latency (the time to access the data from the source) quickly affects the achievable bandwidth: the delays encountered in fetching small chunks of data from the source appear as inter-burst gaps on the SPI-4.2 link. Choosing small burst values also implies that arbitration will happen more often.
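To see how quickly small bursts erode throughput, consider a simple back-of-the-envelope model (a hypothetical sketch, not part of the specification) in which every burst is followed by a fixed fetch/arbitration gap; the `utilization` helper and the 8-cycle gap are illustrative assumptions:

```python
def utilization(burst_bytes, gap_cycles, bytes_per_clock=4):
    """Fraction of raw SPI-4.2 bandwidth carrying payload when every
    burst of `burst_bytes` is followed by an idle gap of `gap_cycles`
    data clocks.  The 16-bit DDR data channel moves 2 bytes per edge,
    i.e. 4 bytes per full clock (hence bytes_per_clock=4)."""
    burst_cycles = burst_bytes / bytes_per_clock
    return burst_cycles / (burst_cycles + gap_cycles)

# A fixed 8-cycle fetch/arbitration gap hurts short bursts far more:
for burst in (16, 64, 256):
    print(burst, round(utilization(burst, gap_cycles=8), 2))
```

With this model a 16-byte burst wastes two-thirds of the link on the gap, while a 256-byte burst loses only about a tenth, which is the trade-off the text describes.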
While an arbitration scheme is typically implementation specific, a fair arbitration scheme will require evaluating the activity status of all the ports. For a large number of ports, the arbitration process may take a few clock cycles. The delays introduced by arbitration are reflected as inter-burst gaps and thus affect the achievable bandwidth. To minimize the effect of data fetch latency and arbitration latency, a typical SPI-4.2 implementation can schedule data for a few bursts (typically two to four) in its internal data buffer to absorb these latencies and hence reduce the inter-burst gap. However, large data buffers in the design increase the data scheduling latency. A port that indicates starving or hungry after a long idle period may win any fair arbitration scheme immediately, but the actual data for that port will be transferred only after the currently scheduled data has been completely sent. The situation is further aggravated if the port does not win arbitration for some time, which may be caused by large maxburst values. Increasing the buffer size improves bandwidth utilization but adversely affects the FIFO depth requirements: deeper buffers raise the low watermark needed to avoid underflow, and they directly add to the data scheduling latency.

Ideally, arbitration should use the latest available status and data-availability information. If a transmitter arbitrates early in order to keep its pipeline full, its responsiveness to changes in port status and data availability is reduced. Suppose a port changes from a hungry status to a starving status. An arbiter should account for this change and reflect it in the SPI-4.2 burst transfers for that port. This can be done by scheduling a longer burst of data for this port, or the arbiter may even decide to prioritize the starving port over others, depending on the implementation.
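One such status-aware policy can be sketched as follows (a hypothetical illustration; the `pick_next_port` helper and the numeric status codes are illustrative, not the on-wire SPI-4.2 status encoding). Starving ports are serviced ahead of hungry ones, with a round-robin rotation for fairness:

```python
# Illustrative FIFO-status levels (not the SPI-4.2 wire encoding).
STARVING, HUNGRY, SATISFIED = 2, 1, 0

def pick_next_port(statuses, last_served):
    """Choose the next port to schedule: any starving port wins first
    (round-robin among starving ports), then hungry ports; satisfied
    ports are skipped.  `statuses` maps port number -> status level."""
    n = len(statuses)
    for wanted in (STARVING, HUNGRY):
        for offset in range(1, n + 1):   # rotate from last_served for fairness
            port = (last_served + offset) % n
            if statuses[port] == wanted:
                return port
    return None  # no port has data demand

statuses = {0: SATISFIED, 1: HUNGRY, 2: STARVING, 3: HUNGRY}
print(pick_next_port(statuses, last_served=1))  # the starving port, 2
```

A real implementation would also weigh the remaining maxburst credit and the depth of the already-scheduled pipeline, which is exactly where the responsiveness problem described above arises.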
A port cannot respond to such a change very quickly if it has arbitrated earlier in time and already has a pipeline full of data scheduled to send on the SPI-4.2 channel.

Calendar Sequences and Maxburst Value
The length of the status (calendar) sequence is CAL_LEN*CAL_M + 2. A typical LVTTL status channel operates at a quarter of the data clock rate, so the number of data clocks elapsed in one complete status sequence is (CAL_LEN*CAL_M + 2)*4. The amount of data transferred over the data channel in one complete status sequence in the LVTTL case is (CAL_LEN*CAL_M + 2)*4*2*2 bytes = (CAL_LEN*CAL_M + 2)*16 bytes, since 2 bytes are transferred every clock and the data channel is clocked on both edges. The maxburst values should be high enough to grant an accumulated credit of (CAL_LEN*CAL_M + 2)*16 bytes across all currently active ports most of the time. The LVDS status channel operates at the data clock rate, so the granted accumulated credit is (CAL_LEN*CAL_M + 2)*4 bytes. If the average data rate of each port is matched, then maxburst values can be configured to relatively small numbers, allowing all ports to utilize the bandwidth evenly. For an N-port design, the minimum suitable maxburst value is given by the equation:

LVTTL: minimum maxburst = (CAL_LEN*CAL_M + 2)/N

If the ports have widely different data rates, then the maxburst value must be configured to a higher number, such that the smallest set of ports active at any given time can consume all the bandwidth. To be pessimistic, let's assume that the least number of active ports can be one. The maximum suitable maxburst value is then given by the equation:

LVTTL: maximum maxburst = CAL_LEN*CAL_M + 2

This is the maximum suitable value for maxburst, since larger values do not allow for any additional data traffic. Choosing larger maxburst values causes arbitration to happen infrequently, and ports with low data rates may starve for data longer than acceptable limits.
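These bounds can be checked numerically. The sketch below (the `maxburst_bounds` helper is illustrative) assumes maxburst is expressed in 16-byte blocks, as in the specification, and that the LVDS credit per status sequence is a quarter of the LVTTL credit, as derived above:

```python
def maxburst_bounds(cal_len, cal_m, num_ports, lvds=False):
    """Suitable (minimum, maximum) maxburst values, in 16-byte blocks,
    for one complete status sequence of CAL_LEN*CAL_M + 2 entries.
    LVTTL grants (CAL_LEN*CAL_M + 2)*16 bytes of credit per sequence;
    LVDS grants a quarter of that, (CAL_LEN*CAL_M + 2)*4 bytes."""
    seq = cal_len * cal_m + 2
    credit_bytes = seq * (4 if lvds else 16)
    blocks = credit_bytes // 16            # total credit in 16-byte blocks
    return max(1, blocks // num_ports), blocks

# 16 ports, CAL_LEN=16, CAL_M=4 -> status sequence of 66 entries
print(maxburst_bounds(16, 4, 16))            # LVTTL bounds
print(maxburst_bounds(16, 4, 16, lvds=True)) # LVDS bounds (a quarter)
```

With these (assumed) parameters the LVTTL range is 4 to 66 blocks per port, while LVDS needs only 1 to 16, which is the reduced buffering requirement discussed in the status-path section below.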
The low data rate ports may not be able to transfer data even when they are starving or hungry, because active ports with long data bursts can occupy the data channel for long periods. This makes the arbitration scheme appear highly unfair to low-bandwidth ports. Widely different data rates across the ports can be supported by ingeniously initializing the calendar sequence: a high-bandwidth port can occupy many slots in the status sequence, while low-bandwidth ports occupy only a few (one or two) slots. As mentioned earlier, short burst lengths can quickly hurt the achievable SPI-4.2 bandwidth, so a reasonable burst size per port must be selected and the calendar sequence should be programmed to reflect each port's data rate. A reasonable burst size grants enough credit to the corresponding port to remain active on SPI-4.2, if it has data to transfer, until its next slot in the status sequence.

Status Path (LVDS or LVTTL)
The minimum maxburst value for the LVDS status path is a quarter of that for the LVTTL path for full utilization of the SPI-4.2 link. The LVDS status path provides faster status updates and requires a smaller minimum maxburst value per port, thus reducing the buffering requirements. In general, the tendency is to choose the LVDS status path for designs with more than 16 ports. This may not always be the best solution, since STATUS_PATH_LATENCY for LVDS is slightly higher due to the additional functionality of training sequences and alignment, and this increase in STATUS_PATH_LATENCY tends to increase the buffering requirements. The additional functionality (alignment and training) also leads to a larger gate count in the LVDS status path design. Furthermore, the LVDS status path typically uses a serializer/deserializer (serdes), which adds to the cost of the system.

Choosing CAL_M, CAL_LEN and CAL[i]
CAL_LEN = LCM(N, P)*Q.
The calendar entries are programmed to satisfy the required bandwidth for each port: a port with a higher bandwidth requirement occurs more frequently in the calendar sequence than a port with a smaller bandwidth requirement. Consecutive entries for the same port should be avoided; the repeated occurrences of the higher-bandwidth ports should be distributed evenly throughout the calendar sequence. Scattering the repeated entries throughout the sequence provides regular credit updates to the port. The CAL_M feature is used to repeat the calendar sequence a number of times before the DIP-2 and framing pattern are inserted. This reduces the overhead of parity and framing on the status path; for small CAL_LEN*CAL_M values, the framing and DIP-2 overhead becomes clearly visible and quickly affects performance. In a typical LVTTL system, where DIP-2 errors are not expected very often, CAL_M is usually programmed to a high value. For an LVDS system, a judicious balance must be struck between the error response capability of the system (status path signaling errors are detected at the DIP-2 entry) and the framing and DIP-2 overhead on the status path.

Scheduling Training
The number of repetitions of the training sequence is decided by the synchronizing capability of the designs; some designs and physical layers (PHYs) require more than one training pattern to synchronize. Users should use the regular training feature of the specification to schedule training at regular intervals and avoid SPI-4.2 loss of synchronization. Recovering from a loss-of-synchronization condition can take more cycles than sending regular training patterns, so regularly scheduled training sequences help avoid loss of synchronization. Thus, scheduling regular training effectively increases SPI-4.2 link utilization.

Editor's Note: To view Part 1 of this article, click here.
About the Author
Prakash Bare is the vice president of engineering in GDA Technologies' IP division. He can be reached at prakash@gdatech.com.