Optimized Memory Accessing Through Coupling of Byte Enable Signals
Rohit Goyal, Amit Garg (Freescale Semiconductor)
Conventional byte-enabled memories and register spaces are read or written according to the state of a read/write signal: hwrite on an AHB interface, ips_rwb on an IPS interface, or R/W on a memory. Byte enable signals then decide which bytes are read or written. The read/write signal must therefore be routed to every register and memory block, which can cause routing congestion and in turn tightens timing, area and power constraints. The Optimized Memory Access Algorithm removes the read/write signal and uses the state of the byte enable signals alone to determine whether an access is a read or a write. This reduces routing congestion, because fewer routing paths are needed between the memory and the rest of the device, and it simplifies the read/write logic path, which helps relax timing, area and power constraints.
All modern SOCs are timing, area and power critical, so present techniques may fail when there is insufficient routing bandwidth to accommodate the necessary additional signals. The following sections describe an optimized algorithm for accessing byte-enabled memories and register spaces that helps relax timing, area and power constraints by eliminating one of the top-level interface signals.
When accessing byte-enabled memories or register spaces, the read/write signal can be eliminated by using the optimized memory access algorithm, because the algorithm uses the byte enable signals themselves to determine whether the operation is a read or a write. For a write, at least one byte enable is set high, indicating a write to the corresponding byte at the specified address. If the byte enables are all 0s, the operation is a read. The algorithm therefore allows 8, 16 and 32-bit write accesses, while every read access is 32 bits wide. Byte-wise access brings no additional advantage for reads, because a read engages the complete read bus anyway. Since the read/write operation is independent of the read/write signal, the algorithm not only simplifies the read/write path logic but also removes the overhead of routing the read/write signal across the SOC and reduces the complexity of the top-level memory interface, which relaxes the timing, area and power constraints.
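The decode rule described above can be sketched as a small behavioral model. This is only an illustration under stated assumptions, not the authors' RTL: the function names are hypothetical, the bus is assumed to have four byte lanes, and bit i of the mask is assumed to enable byte lane i.

```python
def decode_access(byte_enables):
    """Classify an access from a 4-bit byte-enable mask.

    Assumption for this sketch: bit i of the mask enables byte lane i.
    """
    if not 0 <= byte_enables <= 0xF:
        raise ValueError("byte_enables must be a 4-bit value")
    # Any enable set means write; all enables clear means a 32-bit read.
    return "write" if byte_enables != 0 else "read"


def written_lanes(byte_enables):
    """Return the byte lanes that a write with this mask would modify."""
    return [lane for lane in range(4) if (byte_enables >> lane) & 1]
```

With this rule a mask of 0b0011 describes a 16-bit write to the two lower lanes, while a mask of 0b0000 always describes a full 32-bit read.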
A comparison between the conventional memory access algorithm and the optimized algorithm for three different interfaces (Memory, AHB and IPS) is given below.
1. Memory Interface
Conventional Algorithm for 32-bit Byte Enabled Memory :
A Memory Controller implemented using the conventional algorithm is shown below in Fig.1. It first samples the csb (chip select) signal to check whether the memory block is selected. It then checks the state of the rwb (read/write) signal to detect whether the operation is a read or a write. Once the direction is known, the byte enable signals are examined to determine which bytes are to be read or written. The address to be accessed is given by the Addr (Address) signal.
Optimized Algorithm For 32-Bit Byte Enabled Memory :
A Memory Controller implemented using the optimized algorithm is shown in Fig.2. As in the conventional case, it first samples the csb (chip select) signal to check whether the memory block is selected. Because the optimized implementation uses the byte enable signals to detect whether the operation is a read or a write, the rwb (read/write) signal is no longer required. If any of the byte enable signals is high, the corresponding byte is written. If the byte enable signals are all 0s, the operation is a read. The address to be accessed is given by the Addr signal.
Fig.1. Conventional Memory Controller
Fig.2. Optimized Memory Controller
The optimized algorithm thus simplifies the memory access logic by removing the stage that samples the read/write signal. It not only provides gains in timing, area and power but also relaxes routing congestion on the SOC, since the rwb signal no longer has to be routed all the way from the core to the Memory Block.
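A minimal behavioral sketch of the optimized memory controller follows. It is an illustration only, not the controller in Fig.2. The class and parameter names are hypothetical. The chip select is modeled as a plain `selected` flag because the polarity of csb is not stated here, the address is treated as a word index, and lane 0 is assumed to be the least significant byte.

```python
class ByteEnabledMemory:
    """32-bit memory whose access direction is taken from the byte enables."""

    def __init__(self, num_words):
        self.words = [0] * num_words

    def access(self, selected, addr, byte_enables, wdata=0):
        """Perform one access; return the 32-bit word for a read, else None."""
        if not selected:
            return None  # block not selected: no read, no write
        if byte_enables == 0:
            return self.words[addr]  # all enables low: full 32-bit read
        word = self.words[addr]
        for lane in range(4):
            if (byte_enables >> lane) & 1:
                mask = 0xFF << (8 * lane)
                # Replace only the enabled byte lane with the write data.
                word = (word & ~mask) | (wdata & mask)
        self.words[addr] = word & 0xFFFFFFFF
        return None
```

Note that the model has no read/write input at all: the same `byte_enables` argument both selects the direction and, for writes, the lanes to update.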
2. AHB Interface
Conventional Interface with HWRITE signal along with HBSTRB signal :
The conventional AHB interface is shown below in Fig.3, with a Core (AHB Master) connected to one of the master ports and a Memory Block/Register Space connected to one of the slave ports of the AHB bus. The core accesses the slave memory/register space by first selecting the slave through the HSELx (module select) signal. The HWRITE signal then indicates whether the operation is a read or a write, and the HBSTRB (byte strobe) signal decides which bytes are accessed or modified. The address of the memory/register to be accessed is provided through the HADDR signal.
Optimized AHB Interface without HWRITE signal with just HBSTRB signal :
The optimized AHB interface is shown below in Fig.4, with a Core (AHB Master) connected to one of the master ports and a Memory Block/Register Space connected to one of the slave ports of the AHB bus. As in the conventional case, the core accesses the slave memory/register space by first selecting the slave through the HSELx (module select) signal. In the proposed implementation the HBSTRB signal is sufficient on its own to detect whether the operation is a read or a write, so the HWRITE (read/write) signal is no longer required. If a non-zero value is driven on HBSTRB, the operation is a write. If HBSTRB is 0, the operation is a read. The address to be accessed is given by the HADDR signal.
Fig.3. Conventional AHB Interface
Fig.4. Optimized AHB Interface
The optimized algorithm thus simplifies the memory/register access logic by removing the stage that samples the read/write signal. It not only provides gains in timing, area and power but also relaxes routing congestion on the SOC, since the HWRITE signal no longer has to be routed all the way from the core to the Memory Block/Register Space.
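One consequence of this scheme is that the strobe value must be all zeros whenever the core performs a read. The sketch below illustrates one way a master-side encoder could guarantee that. It is an assumption made for illustration, not something the interface in Fig.4 is stated to contain, and the function names are hypothetical.

```python
def encode_strobe(is_write, lane_mask):
    """Produce the 4-bit strobe value to drive for one transfer.

    Hypothetical master-side helper: reads always drive 0 so that the
    slave can recover the direction from the strobe alone.
    """
    if not 0 <= lane_mask <= 0xF:
        raise ValueError("lane_mask must be a 4-bit value")
    if is_write:
        if lane_mask == 0:
            # A write with no lanes enabled cannot be told apart from a read.
            raise ValueError("a write must enable at least one byte lane")
        return lane_mask
    return 0


def slave_is_write(strobe):
    """Slave-side decode: any non-zero strobe means write."""
    return strobe != 0
```

For every legal transfer, `slave_is_write(encode_strobe(is_write, lane_mask))` returns the original direction, so no information carried by a separate read/write signal is lost.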
3. IPS Interface
Conventional register access algorithm with ips_rwb signal :
The conventional IPS mapped register access algorithm is shown below in Fig.5, with a Memory Block/Register Space connected to one of the ports of the AIPS decoder. The core accesses the IPS mapped memory/register space by first selecting the module through the ips_module_en (module select) signal. The ips_rwb signal then indicates whether the operation is a read or a write, and the ips_byte_enablex signals decide which bytes are read or written. The address of the memory/register to be accessed is specified through the ips_addr signal.
Optimized register access algorithm without ips_rwb signal :
The optimized IPS mapped register access is shown below in Fig.6, with a Memory Block/Register Space connected to one of the ports of the AIPS decoder. As in the conventional case, the core accesses the IPS mapped memory/register space by first selecting the module through the ips_module_en (module select) signal. In the proposed implementation the ips_byte_enablex signals are sufficient on their own to detect whether the operation is a read or a write, so the ips_rwb (read/write) signal is no longer required. If any of the ips_byte_enablex signals is high, the corresponding byte is written. If the ips_byte_enablex signals are all 0s, the operation is a read. The address to be accessed is given by the ips_addr signal.
Fig.5. Conventional IPS Interface
Fig.6. Optimized IPS Interface
The optimized algorithm thus simplifies the memory/register access logic by removing the stage that samples the read/write signal. It not only provides gains in timing, area and power but also relaxes routing congestion on the SOC, since the ips_rwb signal no longer has to be routed all the way from the core to the Memory Block/Register Space.
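To illustrate that dropping the read/write signal does not change register behavior, the following sketch compares a conventional and an optimized model of a single 32-bit register over all byte-lane combinations. It is a simplified check under stated assumptions, not the authors' verification flow. The function names are hypothetical, the direction is modeled as a boolean `is_write` because the polarity of ips_rwb is not stated here, and reads are assumed to return the full 32-bit word in both models.

```python
def merge_bytes(old, new, byte_enables):
    """Replace the enabled byte lanes of a 32-bit value (lane 0 = LSB)."""
    for lane in range(4):
        if (byte_enables >> lane) & 1:
            mask = 0xFF << (8 * lane)
            old = (old & ~mask) | (new & mask)
    return old & 0xFFFFFFFF


def conventional_gpr(reg, is_write, byte_enables, wdata):
    """Direction from a separate flag; returns (new_reg, read_data)."""
    if is_write:
        return merge_bytes(reg, wdata, byte_enables), None
    return reg, reg


def optimized_gpr(reg, byte_enables, wdata):
    """Direction from the byte enables; returns (new_reg, read_data)."""
    if byte_enables != 0:
        return merge_bytes(reg, wdata, byte_enables), None
    return reg, reg


def models_agree(initial=0x01234567, wdata=0x89ABCDEF):
    """Compare both models for every direction and non-empty lane mask."""
    for is_write in (False, True):
        for lanes in range(1, 16):
            enables = lanes if is_write else 0  # reads drive all-zero enables
            expected = conventional_gpr(initial, is_write, lanes, wdata)
            if expected != optimized_gpr(initial, enables, wdata):
                return False
    return True
```

Calling `models_agree()` checks that the two models produce the same register contents and read data for each case, provided reads are issued with all byte enables low.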
Comparison and Results :
IPS Interface with optimized algorithm is compared against the conventional IPS Interface in Table.1 below:
Table 1
*(All these results are for a single General Purpose Register (GPR) only. The gain will be even more significant for an SOC, since an SOC has a large number of GPRs.)
As shown above in Table.1, the optimized implementation provides a gain of 3.71% in area. In terms of timing, it provides a marginal improvement of 11ps on the read path and a significant gain of 141ps (4.26%) on the write path. In terms of power, it provides a 5.76% gain in leakage power and a 7.74% gain in dynamic power.
As already mentioned, the gain in timing, area and power from the optimized implementation becomes even more significant when analyzed for an SOC, since an SOC has a large number of GPRs. The projected gain for an SOC with 1500 GPRs is shown below in Table.2.
Table 2
As shown in Table.2 above, the optimized memory accessing approach not only relaxes routing congestion on the SOC by eliminating the need to route the read/write signal all the way from the core to the Memory Block or Register Space, but also provides significant gains in timing (~3,168ps), area (29,640 um sq.) and power (~5.6 mW).
Conclusion:
The Optimized Memory Accessing algorithm
- Significantly relaxes routing congestion on the SOC, as there is no longer a need to route the read/write signal all the way from the core to the Memory Block.
- Reduces die area, as it significantly improves area requirements.
- Provides significant gain in both leakage and dynamic power.
- Relaxes the timing closure constraint on the worst path (the write path).
- Can be used in every SOC, as it is backward compatible and allows byte-wise write access to the Memory/Register Space.
- Provides even larger gains for bigger SOCs with a large number of GPRs.