DPCI: An Efficient Scalable System-on-chip Communication Architecture
By Nan Wang and Magdy A. Bayoumi, University of Louisiana at Lafayette
Lafayette, USA

Abstract: Modern system-on-chip (SOC) designs integrate numerous heterogeneous components onto a single chip (embedded CPUs, dedicated hardware, FPGAs, embedded memories, etc.). On-chip communication is becoming the bottleneck for these SOC designs, most of which employ a shared-bus based communication architecture. This paper presents an efficient, scalable communication architecture, the Data Pre-fetch Core Interface (DPCI), for shared-bus based SOC systems. It supports scalable, pipelined communication between the IP blocks, the shared memory and the bus so as to improve system performance and increase system bandwidth and flexibility. The proposed architecture combines hardware simplicity with improved system performance. Experiments show that it reduces both bus idle time and communication overhead, and improves system performance significantly.

1. INTRODUCTION

As technology scales into the deep submicron regime, the on-chip communication architecture is becoming a critical determinant of system-level metrics such as performance and power consumption, which depend more on efficient communication among master cores and on a balanced distribution of computation among them than on pure CPU speed [2]. The most widely adopted interconnect architecture for SOC IP blocks is still bus-based. Semiconductor vendors have developed several on-chip bus architectures [3-5] for embedded system designs, which employ numerous communication architectures [6-12]. However, this approach has several shortcomings that will limit its use in future SOCs, such as non-scalability, unpredictable wire delay and large power consumption.

In this paper, a scalable communication architecture for shared-bus based SOC systems, the Data Pre-fetch Core Interface (DPCI), is presented. The architecture of the proposed design is shown in Fig. 1.
A dedicated DPCI inserted between each master core and the shared bus not only supports regular scalable communication between the masters and the shared resources, but also serves as an Open Core Protocol (OCP) interface that allows third-party IP cores to be plugged into the system in plug-and-play fashion; it also supports data pre-fetch operations for the master cores. Bus utilization and system performance increase significantly when the DPCI architectures are employed.

This paper is organized as follows: Section 2 introduces the shared-bus based SOC communication architecture. Section 3 details the new architecture design. The test results are presented in Section 4, and Section 5 concludes the paper.

To address the problem and offer an efficient solution, we now present our proposed DPCI architecture.

A. Data Pre-fetch Core Interface

Generally, master cores and the shared bus operate at different speeds. To avoid metastability, timing and data corruption problems, we insert the DPCI architectures between the masters and the shared bus as shown in Fig. 2. First, they serve as traditional buffers that handle clock-domain crossing and alleviate communication contention between the master cores and the shared bus. Second, the DPCI architectures support data pre-fetching for the masters so as to further increase system efficiency. Finally, by configuring the configuration unit of the DPCI architecture, IP cores can be plugged into our system in plug-and-play fashion. The architecture and signal integrities of DPCI are shown in Fig. 3.
The functions of the DPCIs are described as follows:
Figure 4. Pipelining communication
do {
    if (write request is granted) then
        if (write buffer is empty) then
            write data from master to write buffer only;
        else if (write buffer is full) then
            write data from write buffer to memory through bus only;
        else
            write data from write buffer to memory through bus and
            write data from master to write buffer;
    else if (write request is rejected) then
        if (write buffer is not full) then
            write data from master to write buffer;
        else
            wait for the bus idle time to write the data from write buffer to memory before the master issues another request;
} while (task is not finished)
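The write-side rules above can be modelled in a few lines of software. The following is a minimal sketch, not the authors' hardware design: the class name `WriteBuffer`, the `capacity` parameter, the one-word-per-step granularity and the boolean "accepted" return value are all assumptions introduced for illustration.

```python
from collections import deque


class WriteBuffer:
    """Toy software model of the DPCI write buffer (hypothetical, for illustration)."""

    def __init__(self, capacity=8):
        self.capacity = capacity  # assumed buffer depth in words
        self.fifo = deque()       # words waiting to go out on the bus
        self.memory = []          # stands in for the shared memory behind the bus

    def _drain_one(self):
        # Move the oldest buffered word to memory through the bus.
        self.memory.append(self.fifo.popleft())

    def step(self, word, bus_granted):
        """Handle one master write request; return True if the word was accepted."""
        empty = len(self.fifo) == 0
        full = len(self.fifo) >= self.capacity
        if bus_granted:
            if empty:
                self.fifo.append(word)   # master -> buffer only
                return True
            if full:
                self._drain_one()        # buffer -> memory only; master must retry
                return False
            self._drain_one()            # buffer -> memory and
            self.fifo.append(word)       # master -> buffer in the same step
            return True
        if not full:
            self.fifo.append(word)       # request rejected: the buffer absorbs the write
            return True
        return False                     # buffer full: master waits for bus idle time
```

With `capacity=2` and the bus busy, the first two writes are absorbed by the buffer and the third is refused; once the bus is granted, the buffer first drains a word and then resumes accepting data while draining, which is the overlap the pseudocode describes.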
do {
    if (read request is granted) then
        if (read buffer has been updated by correct data) then
            read data from read buffer to master;
            pre-fetch next set of data from memory to read buffer;
        else
            read data from memory to read buffer;
    else if (read request is rejected) then
        if (read buffer has been updated by correct data) then
            read data from read buffer to master only;
        else
            wait for the bus idle time to update the buffer with pre-fetch data before the master issues another read request;
} while (task is not finished)

In summary, the proposed DPCI architecture overlaps communication time with the other useful operation time of the master cores, which reduces communication overhead and improves overall system performance. The data pre-fetch scheme improves system performance further.

4. TEST SYSTEM AND RESULTS

A. Design Complexity and Speed

We mapped the DPCI architecture onto a Xilinx Virtex-II Pro FPGA. The target device is xc2vp2, package fg456. The estimated gate count of the communication architecture is 2156 and its maximum delay is 3.3 ns, so it can operate in systems clocked at up to 300 MHz.

B. Efficiency of the DPCI Architecture

The Increment Priority Bus (IPB), Weighted Round Robin (WRR), TDMA bus, LotteryBus, SFCB and DFCB were implemented on the generalized shared-bus architecture shown in Fig. 2 to perform 8 * 8 matrix multiplications. Every master core calculates two rows of the result matrix, which involves 16 * 8 reads and 8 writes, 16 * 8 word multiplications and 16 * 7 double-word additions. The master cores were kept busy computing and communicating with the data memory through the shared bus until the job was completed. The assignments were:

(1) IPB (increments): 1, 2, 3 and 4
(2) WRR (weights): 1, 2, 3 and 4
(3) TDMA (slots): 1: 2: 3: 4
(4) LotteryBus (tickets): 1: 1: 4: 6
(5)(6) SFCB and DFCB: 8%: 8%: 32%: 52%

for master cores number 1 to 4, respectively.
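Two of the figures quoted in this section can be checked with simple arithmetic. The sketch below counts the multiplications and additions one master performs for two rows of an 8 * 8 result matrix, and converts the 3.3 ns maximum delay into a clock frequency. The function names are hypothetical, and the code only counts operations; it does not model the bus or the memory traffic.

```python
N = 8                # matrix dimension, from the text
ROWS_PER_MASTER = 2  # each master computes two rows of the result matrix


def count_ops(n=N, rows=ROWS_PER_MASTER):
    """Count multiplications and additions for `rows` rows of an n x n product."""
    mults = 0
    adds = 0
    for _ in range(rows):
        for _ in range(n):   # one result element per column
            mults += n       # n products per dot product
            adds += n - 1    # n - 1 additions to sum them
    return mults, adds


def max_clock_mhz(delay_ns):
    """Upper bound on clock frequency implied by a critical-path delay in ns."""
    return 1000.0 / delay_ns
```

Here `count_ops()` returns `(128, 112)`, matching the 16 * 8 multiplications and 16 * 7 additions stated above, and `max_clock_mhz(3.3)` gives roughly 303 MHz, consistent with the quoted 300 MHz operating speed.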
Read/write ratio and burst size were set to 8:1 and 8 words, respectively. The matrix multiplication was carried out twice on the shared-bus architecture, with and without the DPCIs embedded, to test the efficiency of the proposed DPCI architecture. The test results for average execution speed (demanded execution time), bus utilization (bus busy time / total execution time) and throughput (number of processed matrices per second) from the two executions are shown in Table 1.
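The two ratio metrics defined above are straightforward to compute from raw timing measurements. The following sketch uses hypothetical function names and assumes consistent time units; it reproduces no values from Table 1.

```python
def bus_utilization(bus_busy_time, total_execution_time):
    """Fraction of the total execution time during which the bus was busy."""
    if total_execution_time <= 0:
        raise ValueError("total_execution_time must be positive")
    return bus_busy_time / total_execution_time


def throughput(matrices_processed, total_execution_time_s):
    """Number of processed matrices per second."""
    if total_execution_time_s <= 0:
        raise ValueError("total_execution_time_s must be positive")
    return matrices_processed / total_execution_time_s
```

For example, with made-up inputs, a bus that is busy for 3 time units out of 4 has a utilization of 0.75, and 2 matrices processed in 0.5 s correspond to a throughput of 4 matrices per second.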
Our proposed DPCI architecture serves as a core interface that allows IP cores to be plugged into the system easily, and it overlaps communication time with the other useful operation time of the master cores, which reduces communication overhead and improves overall system performance. The data pre-fetch scheme improves system performance further.

A scalable communication architecture for shared-bus based SOC systems is presented in this paper. The test results demonstrate that the proposed communication architecture helps to reduce bus idle time and communication overhead and improves system performance by providing scalable, pipelined communication, while adding only a reasonable extra cost to the system.

REFERENCES

[1] Kanishka Lahiri, Sujit Dey and Raghunathan, “On-chip Communication: System-level Architectures and Design Methodologies”, http://esdat.ucs.edu/projects/codesign/right-frame.html, 2001.
[2] Francesco Poletti, Davide Bertozzi, Luca Benini, and Alessandro Bogliolo, “Performance Analysis of Arbitration Policies for SOC Communication Architecture”, in Proc. Design Automation for Embedded System, 2003, pp. 189-210.
[3] “IBM On-chip CoreConnect Bus Architecture”
[4] D. Flynn, “AMBA: Enabling Reusable On-chip Designs”, IEEE Micro, vol. 17, no. 4, 1997, pp. 20-27.
[6] “Peripheral Interconnect Bus Architecture”
[7] “Sonics Integration Architecture, Sonics INC.”
[8] OMI 324 PI Bus, Rev 0.3d. OMJ Standards Drafts, 1994.
[9] “Round Robins and challenges”
[10] K. Lahiri, A. Raghunathan, G. Lakshminarayana, “LotteryBus: A New High-Performance Communication Architecture for System-on-Chip Designs”, 38th Conference on Design Automation (DAC'01), 2001, pp. 15-20.
[11] S. Lee, C. Lee and H-Jae Lee, “A New Multi-channel On-chip-bus Architecture for System-on-chips”, IEEE International SOC Conference 2004, pp. 305-308, Sep 2004.
[12] Nan Wang and Magdy A. Bayoumi, “Dynamic Fraction Control Bus: New SOC On-chip Communication Architecture Design”, in Proc. IEEE Intl. SOCC Conf., Sep 2005, pp. 199-202.