Network DRAMs Shine in Datapath Designs

Choosing the right memory device can be a daunting task when building today's networking architectures. Engineers must choose among ternary content-addressable memories (TCAMs), multiple DRAM flavors, quad-data-rate (QDR) SRAMs, SDRAMs, and more. And the task doesn't get any easier going forward: with network speeds moving into the 10-Gbit/s range and beyond, choosing the right memory architecture will become even more challenging.

Traditionally, designers have relied on TCAM/SRAM combinations to meet the storage needs of networking architectures. While this combination still thrives today, many have questioned whether it can meet the performance demands of 10-Gbit/s networking boxes and beyond. Seeing a potential opportunity, DRAM vendors have attacked the problem and created two flavors of devices: the Reduced-Latency DRAM (RLDRAM) and the Network DRAM (also known as the Fast-Cycle RAM). While both architectures have their merits, this article provides performance data that shows why the Network DRAM thrives in next-generation networking designs.

Some Basics

Building on the capabilities delivered in the first generation, a second generation of Network DRAM devices has been developed. These parts deliver a 288-Mbit density, ECC-friendly widths of x9, x18, and x36, on-die termination, a 20-ns tRC, and a simpler unidirectional pair of data strobes (DS, QS) for ease of high-speed design. The second generation also includes a 576-Mbit device that delivers 8 banks, a target row cycle time (tRC) of 18 ns, and a better-than-800-Mbit/s data rate. The Network DRAM is similar to a DDR or DDR2 part in that it has a 4-bit prefetch architecture.
To achieve its greatly reduced latency over a standard DDR part, the Network DRAM has shorter bit lines (resulting in fewer column addresses) and a faster sense amplifier than a standard DDR or DDR2 part. Table 1 shows how the Network DRAM compares to the standard DRAMs that designers are familiar with.

Performance Characteristics
ASIC, ASSP, microprocessor, and network-processor designers making memory choices today should consider that by late 2004 or early 2005, when silicon now in design comes to market, DDR2 should be on the verge of mainstream usage in PCs. As most people familiar with the DRAM industry know, a particular DRAM type achieves its lowest cost-per-bit when it becomes mainstream in the PC industry. To that end, we will take a short look at the performance characteristics of three DRAM types expected to be common then: DDR2, Network DRAM II, and RLDRAM II. When considering a choice among these parts, it is useful to remember that any ECC width requirement will necessitate an additional DDR2 part to fill out a 72-/144-/288-bit bus, while the Network DRAM II and RLDRAM II support those widths by design. An additional point is that with a BL=4 part, any bus that is 18 bits wide automatically supports ECC, since each minimum transfer is 72 bits. As another data point, many network systems currently in design are planned to operate at 250 MHz. The timing chart in Figure 1 compares how the Network DRAM II and the RLDRAM II perform in that situation.
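The ECC-width arithmetic above can be checked in a few lines of Python. The bus width and burst length come from the article; the script itself is only an illustration:

```python
# With a burst length of 4, an 18-bit-wide bus moves 72 bits per
# minimum transfer -- 64 data bits plus 8 ECC bits -- so ECC fits
# without adding a device to widen the bus.
bus_width = 18                  # x18 Network DRAM II part
burst_length = 4                # BL=4 (4-bit prefetch)
transfer_bits = bus_width * burst_length
data_bits = 64
ecc_bits = transfer_bits - data_bits
print(transfer_bits, ecc_bits)  # 72 8
```

The same scaling gives the 144- and 288-bit ECC buses from x36 parts, with no filler device required.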
As Figure 1 points out, at 250 MHz the Network DRAM II has a higher effective bandwidth than the RLDRAM II. (Information on the RLDRAM II is derived from Micron's 288-Mbit datasheet dated 5/3/2003.) Table 3 shows a bandwidth comparison under some worst-case scenarios for DDR2, Network DRAM II, and RLDRAM II. As designers can see, in the worst-case situation of repeated reads and writes to the same bank, the Network DRAM has slightly higher performance than the other parts in a couple of situations. And in every case, the Network DRAM delivers 30 to 100 percent higher performance than DDR2. With this information, we can begin to evaluate the performance of the Network DRAM in various applications.
Switch and Router Implementation

At a 500-Mbit/s data rate and a 20-ns tRC, the bit-striped bandwidth of a Network DRAM is 444 Mbit/s per pin. To write a minimum-size 40-byte packet with ECC, a memory device must write 360 bits in 32 ns, equating to a bandwidth of 11.25 Gbit/s. This is more than met by a 36-bit bus and the lowest-speed Network DRAM II. Standard DDR and DDR2 memory types can also meet this bandwidth requirement if the memory bus width is increased, but the memory controller design can be simplified if the tRC is shorter than the minimum packet time. Figure 2 indicates what DRAM bandwidth, frequency, and bus width are needed for various line speeds. This graph takes an idealized bandwidth and then increases the required bandwidth by a factor of 4 to account for write-to-read turnaround and other latency hits. If the worst-case scenario is considered (a read to the same bank is in progress when the packet is being written), a 200-Mbit/s-per-pin device can achieve this with a 144-bit bus.
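As a quick sanity check on the numbers above, here is a minimal Python sketch. The packet size, ECC overhead, and per-pin rate are taken from the article; the variable names are mine:

```python
# A minimum 40-byte packet plus 1 ECC bit per data byte must be
# written within the 32-ns packet-arrival window at OC-192c.
packet_bits = 40 * 8                 # 320 data bits
ecc_bits = packet_bits // 8          # 40 ECC bits -> 360 bits total
window_ns = 32.0                     # minimum packet time at ~10 Gbit/s
required_gbps = (packet_bits + ecc_bits) / window_ns
print(required_gbps)                 # 11.25 Gbit/s

# A 36-bit bus at the 444-Mbit/s-per-pin striped rate comfortably
# exceeds that requirement.
bus_gbps = 36 * 0.444                # ~16 Gbit/s
print(bus_gbps > required_gbps)      # True
```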
Designers must remember that the bus bandwidth in the case shown in Figure 2 is 100 Mbit/s of read and 100 Mbit/s of write. Thus, an 11.25-Gbit/s read is more than met by the Network DRAM's 144-bit bus, which is a fairly common memory bus width in switches and routers.

IPv6 and Network DRAM

For IPv6 over Ethernet, the minimum packet size is 64 bytes. For IPv6 over POS, the minimum packet size is 60 bytes (the 40-byte IPv4 minimum plus the 20 additional bytes of IPv6 header). With this in mind, let's perform the same minimum-packet-size-at-line-speed analysis for IPv6 that has historically been done for IPv4 memory-performance analysis. A stream of 64-byte packets over OC-192c would arrive every 51.2 ns. Using a 144-bit bus and a burst-of-four write, the entire 64-byte packet plus ECC can be written into the packet buffer in 20 ns at even the slowest Network DRAM II frequency. Or, with a narrower 72-bit bus, two bursts of four writes can be done even to the same bank in 40 ns, which is well under the 51.2-ns target. Within the 51.2-ns window per smallest IPv6 packet, two independent Network DRAM accesses can occur; this allows much more processing of both header and payload in an IPv6-based switch or router. For a POS IPv6 packet of 60 bytes, the critical time is 48 ns. Even with the smallest IPv6 packet, two memory accesses can occur within the critical packet time. Here is an area where the low latency of the Network DRAM allows performance a step beyond anything achievable with even a very wide bus and standard DDR or DDR2 DRAMs.

Dealing with Multi-Threading and Multi-Cores

A common rule of thumb for cache sizing is that as one goes from Level 1 to Level 2 to Level 3, each cache should be at least 8x larger than the level below. For a processor with 2 to 4 Mbytes of L2 cache, that means an L3 should be in the range of 16 to 32 Mbytes.
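The IPv6 timing argument can be replayed numerically. This sketch uses the article's figures (OC-192c line rate, 144-bit bus, 20-ns tRC); nothing here is device-specific beyond those quoted numbers:

```python
# Minimum IPv6-over-Ethernet packets arrive every 51.2 ns at OC-192c.
line_rate_gbps = 10.0
packet_bits = 64 * 8
window_ns = packet_bits / line_rate_gbps  # 51.2 ns between packets

# One burst-of-four access on a 144-bit bus moves 576 bits:
# the full 512-bit packet plus 64 bits of ECC, in a single tRC.
bus_bits, burst = 144, 4
bits_per_access = bus_bits * burst
t_rc_ns = 20
accesses_per_window = int(window_ns // t_rc_ns)
print(window_ns, accesses_per_window)     # 51.2 2
```

Two same-bank accesses fit in every packet window, which is the headroom the article credits to the Network DRAM's short tRC.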
That size of cache is very expensive if built in SRAM, but more reasonably priced if an appropriate DRAM solution is available. The servers that these newer processors target are also power-sensitive, and at the same frequencies SRAMs are quite power-hungry compared to DRAMs, so DRAMs offer an attractive solution here. Due to its low latency, the Network DRAM II is the most compelling DRAM solution for L3 caches in these multi-threaded, multi-core processors. To show how Network DRAM II devices thrive in multi-threading and multi-core implementations, two interesting metrics can be evaluated: power per Mbyte per Mbit/s per pin, and power per Mbyte per nanosecond of latency. The Network DRAM II has a power/Mbyte/Mbit/s of (2.5 V * 180 mA)/36 Mbytes/800 Mbit/s = 0.015. In comparison, a high-speed cache-type DDR SRAM delivers a power/Mbyte/Mbit/s of (2.5 V * 850 mA)/2.25 Mbytes/800 Mbit/s = 1.18. Thus, the Network DRAM II device is roughly two orders of magnitude better than a DDR SRAM in this metric. Looking at the second metric, the Network DRAM delivers a power/Mbyte/latency of (2.5 V * 180 mA)/36 Mbytes/20 ns = 0.625. For the SRAM, this works out to (2.5 V * 750 mA)/2.25 Mbytes/4 ns = 208, again showing a significant advantage for the Network DRAM device. The Network DRAM creates a compelling advantage, however, only in very large servers that feature clock speeds on the order of 2 GHz. In these designs, off-chip latency is roughly 40 cycles for the Network DRAM II (a 0.5-ns cycle time and a 20-ns read latency yield 40 cycles). This is much better than the roughly 80 cycles designers would get with a DDR2 DRAM as an L3, based on its roughly 40-ns first access. The penetration of the Network DRAM into this area will ultimately depend upon the success of multi-threaded, multi-core processors. A few large challenges loom for the architects of these processors.
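The two power metrics work out as follows. The voltage, current, density, and speed figures come from the article; the helper function is my own framing:

```python
def power_metric(volts, current_ma, mbytes, divisor):
    """mW per Mbyte per unit (Mbit/s per pin, or ns of latency)."""
    return (volts * current_ma) / mbytes / divisor

# power/Mbyte/(Mbit/s per pin)
dram_bw = power_metric(2.5, 180, 36, 800)     # ~0.016 (article rounds to 0.015)
sram_bw = power_metric(2.5, 850, 2.25, 800)   # ~1.18

# power/Mbyte/(ns of latency)
dram_lat = power_metric(2.5, 180, 36, 20)     # 0.625
sram_lat = power_metric(2.5, 750, 2.25, 4)    # ~208
```

The absolute values are arbitrary figures of merit; what matters is the ratio, which favors the DRAM by well over an order of magnitude on both metrics.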
First, they must prove in silicon that multi-threaded, multi-core designs actually achieve improved results over heavily pipelined single-threaded processors. Processor vendors must also show that these architectures can meet the cost targets of the price-sensitive communications sector. Once these challenges get sorted out, designers can begin tapping Network DRAMs for other multi-thread/multi-core processor architectures.

Working with Offload Engines

The memory requirements for TCP offload engines (TOEs) are no different from the switch and router requirements outlined above. Therefore, as shown above, Network DRAM I and II devices will thrive in these applications.

Wrap Up

Since the Network DRAM was conceived as a "seed product" in the development of low-latency DRAM, manufacturers will continue to enhance the product. Some targets include a latency of less than 10 ns and the adoption of a multi-data clock such as QDR/ODR for higher bandwidth.
All material on this site Copyright © 2017 Design And Reuse S.A. All rights reserved.