A NoC-based Communication Framework For Seamless IP Integration in Complex Systems
Fabien Clermidy, Didier Varreau, Didier Lattard, CEA/LETI
Grenoble, France

Abstract:
In this paper, we present a NoC-based communication framework that is used to develop complex chips including a large number of heterogeneous IPs. Synchronization and reconfiguration schemes are proposed to handle the complexity and efficiently decouple SoC communications from computation. Finally, we describe an 8 Mgates NoC-based chip dedicated to telecommunication applications that has been developed to prove the concept.

IP integration in this kind of System on Chip (SoC) leads to some classical issues, such as functional and performance validation, test and debug facilities, and place and route, but also to new problems, such as multi-application programming and platform reconfiguration. This framework has been used for the development of a multi-application telecommunication platform named FAUST (Flexible Architecture of Unified System for Telecom), aimed at handling multiple OFDM-based systems.

Currently, the network interface (NI) is able to translate between the IP protocol and the NoC protocol [3]. A QoS mechanism is added in [4] that splits the available bandwidth among the different IPs in a configurable way. Nevertheless, the communication flow itself is still managed inside each IP by a higher-level protocol. Moreover, once an IP is integrated into the system, there is no guarantee that its communications match those of the other IPs.
The architecture described in this paper proposes a high-level network interface that combines a QoS policy, a synchronization scheme that both secures the integration of the IP and eases application programming of the final SoC, and a general dynamic reconfiguration scheme for the IP.

B. Overview of the FAUST NoC

The NoC transfers packets in wormhole fashion from an emitter to a receiver. Packets are composed of flits. The first flit, named the header, contains the routing path (coded in the emitter IP). Each node decodes this path to determine the next direction (i.e. the output of the node) to follow. It then arbitrates between the different requests for this direction using a first-come, first-served algorithm. Two virtual channels are available: the first is for best-effort traffic, and the second is dedicated to real-time traffic (guaranteed-latency packets).

C. Synchronization scheme

Data flow control and synchronization are handled by a "pull data" scheme. Two NI blocks, the Input Communication Controller (ICC) and the Output Communication Controller (OCC), are used for this operation. The ICC is a programmable credit generator. It is associated with a FIFO and distributes the available places of the FIFO to other IPs (the data producers) in a sequential and programmable manner. To manage simultaneous input flows, two or more FIFOs can be used. On the output side, and depending on the configuration selected by the input data flow, the OCC sends the data to the corresponding consumer(s) according to the available credits. The OCC then forms a packet addressed to the consumer(s).

For example, Fig. 3 shows an OFDM input flow control that appears in a classical transmission process (framing operation). In a first phase (Fig. 3a), "full pilots" have to be transmitted from a RAM unit to the OFDM modulator.

In a second phase (Fig. 3b), data have to be transmitted from the mapping block while, simultaneously, continuous pilots come from a RAM unit.
Fig. 3: Example of framing flows in OFDM

To conclude, this technique requires only a small hardware overhead to merge two or more sequential or simultaneous flows and/or to split one flow into two or more sequential or simultaneous flows.

D. Reconfiguration scheme

The Read Write Decoder (RWD) and the Configuration Manager (CFM) are the two blocks dedicated to managing the IP configurations. The RWD decodes the configurations needed for both the NI itself and the IP core. As only the NI part is generic, a tool allows designers to add the IP-specific address mapping used to decode the configuration. Depending on the application requirements, the RWD is able to decode and store several configurations.

For a given IP, the whole data flow is split into data blocks, i.e. sets of data that use the same computation configuration. The first network packet of a data block contains a specific command (INIT_WRITE) and the identifier of the configuration to be used for the computation of these data. The following network packets contain classical WRITE commands. When an INIT_WRITE command occurs, the CFM wakes up. It decodes the configuration identifier and checks whether it is valid. If it is not, an interrupt is sent through the IT manager and the data flow is stopped. Otherwise, the CFM waits until the end of the previous configuration (all corresponding data must be processed and moved out of the IP). Then the new configuration is loaded and the computation is launched.

E. Conclusion

III. INTEGRATION ENVIRONMENT

A. Introduction

This environment eases the design and integration of IP. It also gives a general framework for the verification and programming of the IP and of the whole architecture. Moreover, all the programming done at the design level can be reused directly on the real demonstrator.

B. HDL Lego core

C. SystemC/TLM platform

The Transaction Level Modeling (TLM) platform includes a global architecture development framework and the associated programming facilities. It allows both the verification of the global integration and the development of real applications. The global architecture framework contains the two main components needed to build a NoC-based structure: the nodes (described by an asynchronous event-based model) and the NI.

To address the programming issues, we strictly separate the configuration part of each IP from its computation core. Thus, each IP is split into a configuration class and a computation class. The configuration class contains the core configurations and the address map of the different fields, and exposes two methods: a decode method that is equivalent to the RWD, and a code method that converts the configurations into packets. This last method is very useful from a programming point of view: once the configuration class is written for an IP, the IP becomes easily programmable with a simple call to this method, without requiring a deep understanding of the IP. The IP designer can also add a method to program or configure the IP. In that case, the IP with its associated NI can be a black box for the integration team.

The IP core itself can be written in C, C++, SystemC or an HDL. In the last case, TLM-to-HDL and HDL-to-TLM translators are available for co-simulation, and only the SystemC configuration class has to be developed.
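The split between a configuration class and a computation class can be illustrated with a minimal sketch. It is written in Python for brevity rather than in SystemC, and it is not the FAUST implementation: the packet layout `(configuration id, address, value)`, the class name `ConfigurationClass`, and the field names `fft_size` and `guard_len` are all assumptions made for illustration. Only the roles of the `code` and `decode` methods come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical packet layout: (configuration id, field address, value).
Packet = Tuple[int, int, int]


@dataclass
class ConfigurationClass:
    """Configuration side of an IP, kept separate from its computation class."""

    # Hypothetical address map: field name -> address inside the IP.
    address_map: Dict[str, int]
    # Decoded configurations, indexed by configuration id (RWD-like store).
    stored: Dict[int, Dict[str, int]] = field(default_factory=dict)

    def code(self, config_id: int, config: Dict[str, int]) -> List[Packet]:
        """Convert one configuration into a list of packets."""
        unknown = set(config) - set(self.address_map)
        if unknown:
            raise KeyError(f"fields not in address map: {sorted(unknown)}")
        return [(config_id, self.address_map[name], value)
                for name, value in config.items()]

    def decode(self, packets: List[Packet]) -> None:
        """Decode packets back into named fields and store them (RWD role)."""
        names = {addr: name for name, addr in self.address_map.items()}
        for config_id, addr, value in packets:
            if addr not in names:
                raise KeyError(f"address {addr:#x} not in address map")
            self.stored.setdefault(config_id, {})[names[addr]] = value


# Hypothetical OFDM-modulator fields, for illustration only.
ofdm_cfg = ConfigurationClass(address_map={"fft_size": 0x00, "guard_len": 0x04})
packets = ofdm_cfg.code(1, {"fft_size": 1024, "guard_len": 256})
ofdm_cfg.decode(packets)
```

In this sketch the integrator only calls `code` with named fields and never handles addresses directly, which is the black-box property described above. Storing several configurations under distinct identifiers mirrors the RWD's ability to hold more than one configuration.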
IV. DEMONSTRATOR DESCRIPTION

In order to validate the concepts previously described, we have designed a first NoC-based prototype ASIC dedicated to 4G telecom applications. This architecture contains 23 IPs connected to a 20-node network (Fig. 4 and 5) for a total complexity of 8 Mgates (0.13 µm CMOS technology from STMicroelectronics). Five of the IPs come from different partners:
The integration of these IPs proves the ability of our framework to handle different kinds of IP, in particular programmable (CPU with standard protocol) and reconfigurable structures. For example, the convolutional decoder from ITE/Mitsubishi was integrated in our NoC environment in only one week. Test vectors available before the integration were replayed through the NoC with complete success.
A typical NI, developed for the demonstrator with both synchronization and reconfiguration capabilities, corresponds to about 10 kGates, without the FIFO. The NI has been synthesized, placed and routed in different units at frequencies of up to 250 MHz (using a 0.13 µm technology). The NI adds only 2 latency cycles at both the input and output levels. For the AHB subsystem, the cost of the wrapper block is about 20 % of the ARM946 core itself, and less when the whole subsystem is taken into account.

We developed two compatible versions of the nodes: a synchronous one and an asynchronous one that is well suited to a Globally Asynchronous Locally Synchronous (GALS) implementation [8]. With a 5-input × 5-output node connected to an IP, the total cost of the integration is about 18 kGates for the synchronous version (node + NI) and about 45 kGates for the asynchronous version (node + Async/Sync interfaces + NI). It is possible to build a mixed asynchronous/synchronous NoC architecture according to the complexity of the IPs and subsystems: a 100 kGates IP with synchronous integration and a 350 kGates IP or subsystem with asynchronous integration correspond to a 15 % area overhead for managing QoS communications, flow synchronization, reconfiguration, IT management, and test and debug aspects.

VI. CONCLUSION

To succeed in designing very complex multi-application SoCs, new high-performance communication structures coupled with efficient design and programming methodologies must be set up. In this paper, we present a solution that eases IP integration: both the architecture environment and the integration process are described. The proposed architecture is organized around a NoC structure that supports communication; this modular backbone brings scalability at the architecture level and flexibility at the application level. A complete development framework based on the TLM methodology was used to help with IP integration and verification, and also to program multiple applications. A first step toward a fully dynamically reconfigurable platform has been taken thanks to the synchronization and reconfiguration mechanisms. Finally, a complete chip prototype has been designed to prove the efficiency of the proposed approach; it leads to a prototyping platform for multiple telecommunication applications.
VII. ACKNOWLEDGMENTS

VIII. REFERENCES

[3] J. Henkel, W. Wolf, S. Chakradhar, "On-chip networks: a scalable, communication-centric embedded system design paradigm", in 17th International Conference on VLSI Design, 2004, pages 845-851.
[4] E. Rijpkema, K. Goossens, A. Radulescu, J. Dielissen, J. van Meerbergen, P. Wielage, and E. Waterlander, "Trade offs in the design of a router with both guaranteed and best-effort services for networks on chip", in DATE, 2003, pages 350-355.
[5] R. Lemaire, F. Clermidy, Y. Durand, D. Lattard and A. Jerraya, "Performance Evaluation of a NoC-Based Design for MC-CDMA Telecommunications using NS-2", in RSP'05 Intl Conference, 2005.
[6] "SystemC 2.0.1 Language Reference Manual", Open SystemC Initiative, http://www.systemc.org.
[7] A. Clouard et al., "Using Transaction-Level Models in a SoC Design Flow", in "SystemC: Methodologies and Applications", edited by W. Muller, W. Rosenstiel, J. Ruf, Kluwer Academic Publishers, 2003, pp. 29-63.
[8] E. Beigne, F. Clermidy, P. Vivet, M. Renaudin, A. Clouard, "An Asynchronous NOC Architecture Providing Low Latency Service and its Multi-level Design Framework", in ASYNC'05 Int'l Conference, 2005.