Multi-port memories evolve to meet SoC demands

By Bill Beane, Strategic Marketing Manager, IDT Multi-port Products Division, San Jose, Calif., EE Times
April 28, 2003 (4:33 p.m. EST)
URL: http://www.eetimes.com/story/OEG20030428S0091

For designers charged with integrating a wide range of functions into system-on-chip (SoC) solutions, the cost and complexity of incorporating digital logic, processing, memory, and analog functions often prove prohibitive.

One of the methods of controlling these issues is to rely on external memory, simplifying the SoC solution without penalizing performance or cost. As SoC designers review their external memory options, they must grapple with a variety of conflicting demands. On one hand, they need a memory architecture capable of delivering bandwidth well beyond traditional solutions in order to fully exploit the escalating capabilities of today's high-performance processor cores.

On the other hand, designers need a memory solution that offers maximum flexibility to meet the needs of a wide variety of system architectures, including those that incorporate multiple processors. Designers also need a memory architecture that reduces the complexity of their design and in the process drives down cost and shortens the design cycle.

Finding a single shared-memory solution capable of meeting all these criteria has not been a simple task. One popular option is a traditional muxed SRAM. Built around a standard, off-the-shelf memory manufactured in high volume, a muxed SRAM offers a very attractive cost structure. However, this advantage is often deceptive. Although a standard muxed SRAM costs less than a specialized dual-port memory on a per-bit basis, this advantage can quickly evaporate when one calculates the total cost from a systems perspective.

Design complexity is a key factor driving the calculation of total cost. Any shared memory architecture built around a standard memory device will require additional external registers and buffers to facilitate two devices sharing a common block of RAM. This will require additional design resources and will likely extend the development cycle. Moreover, these additional logic elements will occupy more physical space than an integrated dual-port memory and require a more complex layout.

Perhaps more important are the performance implications. With its single port, a muxed SRAM suffers from a severe performance disadvantage relative to a multi-port alternative. Given the inefficiencies associated with switching a single port from one device to another, each device accessing the muxed SRAM will be limited to less than 50 percent of the maximum theoretical bandwidth of the SRAM.

Conversely, a multi-port memory capable of supporting simultaneous access, sometimes across different bus widths and voltages, imposes no delay on either port during a read or write operation. Accordingly, its maximum performance will exceed the traditional muxed SRAM by a factor of at least two.
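The comparison above can be checked with a little arithmetic. The sketch below is illustrative only: the peak rate and the 10 percent switching overhead are hypothetical numbers chosen for the example, not figures from any datasheet.

```python
def muxed_per_device_bandwidth(peak_mbps, switch_overhead=0.10, devices=2):
    """Bandwidth left to each device sharing one SRAM port.

    switch_overhead is a hypothetical fraction of cycles lost while the
    single port is handed from one device to the other.
    """
    usable = peak_mbps * (1.0 - switch_overhead)
    return usable / devices


def dual_port_per_device_bandwidth(peak_mbps):
    """Each port of a dual-port memory can run at the full port rate."""
    return peak_mbps


peak = 1000.0  # hypothetical peak port bandwidth, in Mbps
muxed = muxed_per_device_bandwidth(peak)
dual = dual_port_per_device_bandwidth(peak)

print(f"muxed SRAM, per device: {muxed:.0f} Mbps")
print(f"dual-port, per port:    {dual:.0f} Mbps")
print(f"advantage: {dual / muxed:.2f}x")
```

With any nonzero switching overhead, each device on the muxed SRAM gets less than half the peak rate, so the dual-port advantage comes out above two.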

While cost and performance are obviously critical considerations, over the past few years the rapid evolution of new functions built into dual-port memories has radically altered the equation between specialized memory architectures and those built around traditional single-port alternatives. By addressing port-to-port communication and quality of service issues, these capabilities can help SoC designers meet escalating performance requirements while minimizing on-chip arbitration logic.

No arbitration logic

In the asynchronous realm, the latest dual-port memories continue to provide mailbox interrupts and busy logic functionality to address some of these concerns. The busy logic function provides a hardware flag whenever both ports of the dual-port memory attempt to access the same location at the same time. It allows one of the accesses to proceed and signals to the other port that the memory is "busy." The designer can then take advantage of built-in interrupt functions to stall the access until the operation on the other port is completed. Importantly, this entire function can be performed without the need for any additional arbitration logic.

This functionality would prove highly useful in an SoC that needed to communicate with a discrete DSP. Typically, the SoC passes blocks of data via shared memory to the DSP for manipulation. Once the DSP has finished its calculations, it posts the results back into memory for the processor to use. If the processor attempts to hand off a new block of data to the DSP and a particular memory address is busy, it will receive a busy signal as an output from the dual-port memory. If the processor asserts a write and immediately sees the busy logic pin go low (logic state of 0), it knows the address it is trying to write to is being used by the DSP.

At this point the processor can hold the write signal until the opposite port has moved to another address and the busy signal indicates it is clear (a logic state of 1), or it can simply post the information to another address, which the busy signal indicates is inactive.
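The processor's two options, waiting for the busy signal to clear or posting to a different address, can be sketched in software. This is a toy behavioral model, not a real device interface: the class `BusyDualPort`, the port names "L" and "R", and the helper `write_with_fallback` are all hypothetical. The busy level follows the article's convention of 0 for busy and 1 for clear.

```python
class BusyDualPort:
    """Toy model of asynchronous dual-port busy logic (hypothetical)."""

    def __init__(self, size):
        self.mem = [0] * size
        self.active = {"L": None, "R": None}  # address each port is holding

    def begin_access(self, port, addr):
        """Start an access and return the busy pin level (0 busy, 1 clear)."""
        other = "R" if port == "L" else "L"
        if self.active[other] == addr:
            return 0  # the other port owns this address, so we are held off
        self.active[port] = addr
        return 1

    def end_access(self, port):
        """Release whatever address this port was holding."""
        self.active[port] = None

    def write(self, port, addr, value):
        """Attempt a write; the data is stored only if the pin reads clear."""
        busy_n = self.begin_access(port, addr)
        if busy_n == 1:
            self.mem[addr] = value
            self.end_access(port)
        return busy_n


def write_with_fallback(ram, port, candidates, value):
    """Try each candidate address in order; return the one written, or None."""
    for addr in candidates:
        if ram.write(port, addr, value) == 1:
            return addr
    return None


ram = BusyDualPort(16)
ram.begin_access("R", 4)                              # DSP is working on address 4
first_try = ram.write("L", 4, 0xAB)                   # processor sees busy (0)
used = write_with_fallback(ram, "L", [4, 5], 0xAB)    # falls back to address 5
ram.end_access("R")                                   # DSP moves on
retry = ram.write("L", 4, 0xCD)                       # address 4 is now clear
print(first_try, used, retry)
```

In hardware the comparison happens inside the memory itself, which is why no external arbitration logic is needed; the model only mirrors the decision the processor makes after sampling the pin.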

Semaphores are another port-to-port coordination function now available on some dual-port memories. The semaphore logic is a set of eight latches, which are independent of the memory array. The SoC designer can use these latches to pass a "token" from one port to the other to indicate when a shared resource (such as a system component or a predefined block of shared memory) is in use.

Semaphores are particularly useful in any dual-processor SoC design where a high-speed application cannot tolerate a wait state. In this environment, one processor can use a semaphore to maximize performance by reserving a specific token, thereby inhibiting the second processor from accessing a portion of the dual-port memory or any other shared resource. The availability of this functionality within the dual-port memory relieves the SoC designer from creating additional logic to serve this purpose.
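A minimal sketch of token passing through the eight latches follows. The request and release methods are a simplified assumption made for illustration, and the class name `SemaphoreLatches` is hypothetical; a real device defines its own read and write sequence for claiming a latch.

```python
class SemaphoreLatches:
    """Toy model of eight semaphore latches shared by two ports."""

    COUNT = 8  # the article describes a set of eight latches

    def __init__(self):
        # None means the token is free; otherwise the owning port's name.
        self.owner = [None] * self.COUNT

    def request(self, port, index):
        """Try to take the token; return True if this port now holds it."""
        if self.owner[index] is None:
            self.owner[index] = port
        return self.owner[index] == port

    def release(self, port, index):
        """Give the token back; a port that is not the owner is ignored."""
        if self.owner[index] == port:
            self.owner[index] = None


sem = SemaphoreLatches()
BLOCK_A = 0  # hypothetical: latch 0 guards one predefined block of memory

left_has_it = sem.request("L", BLOCK_A)    # first processor reserves the block
right_has_it = sem.request("R", BLOCK_A)   # second processor is refused
sem.release("L", BLOCK_A)                  # first processor is finished
right_retry = sem.request("R", BLOCK_A)    # second processor now succeeds
print(left_has_it, right_has_it, right_retry)
```

Because the latches are separate from the memory array, checking a token never touches the data the token protects, which is how a processor can avoid wait states on the guarded block.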

For higher bandwidth applications such as switching or routing, some new synchronous dual-port memories now support speeds up to 200 MHz and bandwidth as high as 14 Gbps. These memories are running so fast they cannot allow a busy arbitration circuit to run a cross-chip comparison and block an access without jeopardizing the integrity of the data. The latest generation of synchronous dual-port memories adds collision detection functionality to address this limitation.
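The article does not state the bus width behind the 14 Gbps figure. As a rough sanity check only, the sketch below assumes a hypothetical 36-bit data bus on each of two ports and one transfer per port per clock, which works out to 14.4 Gbps at 200 MHz.

```python
def aggregate_bandwidth_gbps(clock_mhz, bus_width_bits, ports=2):
    """Peak bandwidth assuming one transfer per port per clock cycle."""
    return clock_mhz * 1e6 * bus_width_bits * ports / 1e9


# 36 bits per port is an assumption for illustration, not a quoted spec.
total = aggregate_bandwidth_gbps(clock_mhz=200, bus_width_bits=36)
print(f"{total:.1f} Gbps")
```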

Rather than impose a hardware decision on which access is allowed to occur, collision detection provides a non-intrusive monitoring capability. The memory device defines a collision as an overlap in access between the two ports that could result in the reading or writing of incorrect data at a specific address. This function is intelligent enough to recognize that if both ports are reading, no data has been corrupted or lost, and therefore no collision flag is output. If one port is writing and the other is reading, it considers the write valid and, since the reading port might capture data in a state of transition, outputs a collision flag on the reading port. If both ports are writing and there is a risk that the data stored in memory from either port will not be valid, it outputs a flag for both ports.
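The three cases above amount to a small truth table. The sketch below encodes it, under the simplifying assumption that an "overlap" means both ports hit the same address in the same cycle; the function name `collision_flags` and the "read"/"write" strings are hypothetical.

```python
def collision_flags(left_op, right_op, left_addr, right_addr):
    """Return (left_flag, right_flag); True means a collision is flagged.

    Ops are the strings "read" or "write". Accesses to different
    addresses never collide in this simplified model.
    """
    if left_addr != right_addr:
        return (False, False)
    if left_op == "read" and right_op == "read":
        return (False, False)   # nothing corrupted, so no flag
    if left_op == "write" and right_op == "write":
        return (True, True)     # either port's stored data may be invalid
    # One writer, one reader: the write is valid, the reader is flagged.
    return (left_op == "read", right_op == "read")


for ops in [("read", "read"), ("write", "read"),
            ("read", "write"), ("write", "write")]:
    print(ops, collision_flags(ops[0], ops[1], 0x10, 0x10))
```

Note that neither access is blocked in any of the cases. The flags only report what happened, and the system decides what to do about it.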

Collision detection plays a crucial role in ensuring quality of service by highlighting potential problems. Designers can use this information to modify operation of the system depending upon the application. For example, a low rate of collision in a voice application that does not impact the quality of the signal may not warrant action. Alternatively, the same level of collision in a mission-critical SAN device may require the designer to ensure each port is operating in different segments of the memory array.

In the coming years, multi-port memory designers will continue to drive down power consumption by reducing core voltage to 1.8 volts and below. At the same time, the migration to newer semiconductor process technologies with smaller internal geometries will allow architects to aggressively increase multi-port memory densities to support rising data traffic rates and escalating buffering requirements. The most significant advances in performance will likely come from the continual development and refinement of tailored functionalities implemented in on-chip logic. By reducing the complexity of the SoC designer's task, these new capabilities will play a key role in meeting performance requirements and facilitating the transfer of information.