Networks on Chip - Challenges and Solutions

Partha Pratim Pande, Cristian Grecu, André Ivanov, Res Saleh
SOC Research Lab -- Department of Electrical and Computer Engineering
University of British Columbia
2356 Main Mall Vancouver, BC, V6T 1Z4 Canada
Email: {parthap, grecuc, ivanov, res}@ece.ubc.ca

The University of British Columbia’s System-on-a-Chip (SoC) Research Lab has established itself as a world-class research centre for the design, verification and testing of high-speed, mixed-signal systems on chip. The Network on Chip project is one of the most ambitious projects undertaken by this group.

The network-on-chip (NoC) design paradigm is viewed as an enabling solution for integrating an exceedingly high number of computational and storage blocks on a single chip. Its practical implementation and adoption, however, face various unsolved issues related to design methodologies, test strategies, and dedicated CAD tools. As the degree of integration increases, several research groups are striving to develop efficient on-chip communication infrastructures. Many SoC designs today contain multiple processors, in applications such as set-top boxes, wireless base stations, HDTV, mobile handsets, and image processing. New trends in the design of communication architectures for multi-core SoCs have recently appeared in the research literature. In particular, researchers suggest that multi-core SoCs can be built around regular interconnect structures originating from parallel computing architectures. Custom-built, application-specific interconnect architectures are another promising solution. Such communication-centric interconnect fabrics are characterized by different trade-offs with regard to latency, throughput, reliability, energy dissipation, and silicon area. The nature of the application will dictate the selection of a specific template for the communication medium. The figure below shows a representative set of interconnect templates proposed by different research groups.
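To make the trade-off concrete for two of the regular templates borrowed from parallel computing, the following sketch compares a k x k 2D mesh with a k x k 2D torus in terms of link count (a rough proxy for silicon area) and average hop count (a rough proxy for latency and energy per packet). It is only an illustration under stated assumptions: uniform traffic between all distinct core pairs, minimal routing, k >= 3, and no account of physical wire length, which the long wrap-around links of a torus make significant. The function name `grid_stats` is hypothetical.

```python
from itertools import product


def grid_stats(k, wrap=False):
    """Return (bidirectional links, average hop count) for a k x k grid NoC.

    wrap=False models a 2D mesh; wrap=True models a 2D torus in which every
    row and column is closed into a ring. Assumes k >= 3 and minimal routing
    under uniform traffic between all distinct source/destination pairs.
    """
    if k < 3:
        raise ValueError("k must be at least 3")

    def dist_1d(a, b):
        d = abs(a - b)
        return min(d, k - d) if wrap else d  # a torus may go "around" the ring

    nodes = list(product(range(k), repeat=2))
    hops = [
        dist_1d(r1, r2) + dist_1d(c1, c2)
        for (r1, c1) in nodes
        for (r2, c2) in nodes
        if (r1, c1) != (r2, c2)
    ]
    # Mesh: k-1 links per row and per column; the torus adds one wrap link to each.
    links = 2 * k * k if wrap else 2 * k * (k - 1)
    return links, sum(hops) / len(hops)


for name, wrap in (("mesh", False), ("torus", True)):
    links, avg = grid_stats(4, wrap)
    print(f"{name}: {links} links, {avg:.2f} average hops")
```

For k = 4 this gives 24 links and about 2.67 average hops for the mesh, against 32 links and about 2.13 average hops for the torus: the extra links buy shorter paths.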

A complex SoC can be viewed as a micronetwork of multiple blocks; hence, models and techniques from networking and parallel processing can be borrowed and applied to an SoC design methodology. The micronetwork must meet quality-of-service requirements (such as reliability and guaranteed bandwidth/latency) and energy-efficiency targets despite intrinsically unreliable signal transmission media. This unreliability stems from the increased likelihood of timing and data errors, the variability of process parameters, crosstalk, and environmental factors such as electromagnetic interference (EMI) and soft errors.

Current simulation methods and tools can be ported to networked SoCs to validate functionality and performance at various abstraction levels, from the electrical to the transaction level. NoC libraries, including switches/routers, links, and interfaces, will provide designers with flexible components to complement processor/storage cores. Nevertheless, the usefulness of such libraries will depend heavily on the maturity of the corresponding synthesis/optimization tools and flows. In other words, micronetwork synthesis will enable NoC/SoC design in the same way that logic synthesis made efficient semicustom design possible in the eighties.

Although the design of NoC-based systems borrows some aspects from the parallel computing domain, it is driven by a significantly different set of constraints. From a performance perspective, high throughput and low latency are desirable characteristics of multiprocessor SoC (MP-SoC) platforms. From a VLSI design perspective, however, the energy dissipation profile of the interconnect architecture is of prime importance, as the interconnect can account for a significant portion of the overall energy budget. The silicon area overhead of the interconnect fabric matters too. The common characteristic of these architectures is that the processor/storage cores communicate with each other through high-performance links and intelligent switches, and that the communication design can be represented at a high level of abstraction.

Exchanging data among processor/storage cores becomes increasingly difficult as system size grows and global wire delay fails to scale. To cope with these issues, the end-to-end communication medium needs to be divided into multiple pipelined stages, so that the delay of each stage fits within the clock-cycle budget. In NoC architectures, the inter-switch wire segments together with the switch blocks constitute a highly pipelined communication medium characterized by link pipelining, deeply pipelined switches, and latency-insensitive component design.
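The stage-count reasoning can be sketched as a small calculation. The helper functions are hypothetical and the delay figures in the example are made-up round numbers, not measurements: a wire whose delay exceeds the usable cycle budget is split into ceil(delay / budget) register-separated segments, and, assuming each hop consists of one switch traversal plus one inter-switch link, the zero-load latency of a packet's head flit grows linearly with the hop count.

```python
import math


def link_pipeline_stages(wire_delay_ps, clock_period_ps, register_overhead_ps=0.0):
    """Minimum number of pipeline stages so that every stage fits in one cycle.

    register_overhead_ps models the setup/clock-to-Q time that each pipeline
    register subtracts from the usable cycle budget.
    """
    budget = clock_period_ps - register_overhead_ps
    if budget <= 0:
        raise ValueError("register overhead exceeds the clock period")
    return max(1, math.ceil(wire_delay_ps / budget))


def zero_load_latency_cycles(hops, switch_stages, link_stages):
    """Head-flit latency in cycles, assuming each hop = one switch plus one link."""
    return hops * (switch_stages + link_stages)


# Hypothetical numbers: a 1250 ps inter-switch wire and a 500 ps clock period.
stages = link_pipeline_stages(1250, 500)
print(stages, zero_load_latency_cycles(hops=4, switch_stages=3, link_stages=stages))
```

With these numbers the link needs 3 stages and a 4-hop path costs 24 cycles at zero load; shortening the wire or deepening the switch pipeline shifts that balance.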

Any new design methodology can be widely adopted only if it is complemented by efficient test mechanisms and methodologies. Developing test infrastructures and techniques that support the Network on Chip design paradigm is a challenging problem. Specifically, the design of specialized Test Access Mechanisms (TAMs) for distributing test vectors and of novel Design for Testability (DFT) schemes is of major importance. Moreover, in a communication-centric design environment such as that provided by NoCs, fault tolerance and reliability of the data transmission medium are two significant requirements in safety-critical applications.

The test strategy for NoC-based systems must address three problems: (i) testing the functional/storage blocks and their corresponding network interfaces; (ii) testing the interconnect infrastructure itself; and (iii) testing the integrated system. Testing the functional/storage blocks and their network interfaces requires a Test Access Mechanism (TAM) to transport the test data. The TAM carries test stimuli on chip from a test pattern source to the core under test, and carries test responses from the core under test to a test pattern sink. The principal advantages of using the NoC as the TAM are the resulting "reuse" of existing resources and the availability of several parallel paths for delivering test data to each core. System test time can therefore be reduced through extensive test parallelization: the more test paths are available, the more functional blocks can be tested in parallel.
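One way to see how extra test paths shorten system test time is to treat each parallel NoC path as a machine and each core test as a job. The sketch uses the classic longest-processing-time-first (LPT) greedy heuristic, a swapped-in textbook scheduling technique rather than a method from this work; it assumes each core's test time is known and fixed, and it ignores shared-link bandwidth and test power limits. The function name and the test times in the example are hypothetical.

```python
import heapq


def lpt_test_schedule(test_times, num_paths):
    """Assign core tests to parallel NoC test paths (hypothetical model).

    Cores are taken in order of decreasing test time and each one is assigned
    to the path with the smallest accumulated load. Returns (makespan, assignment),
    where assignment maps a path id to the list of core indices it tests.
    """
    if num_paths < 1:
        raise ValueError("need at least one test path")
    heap = [(0, p) for p in range(num_paths)]  # (current load, path id); sorted, so a valid heap
    assignment = {p: [] for p in range(num_paths)}
    order = sorted(range(len(test_times)), key=lambda i: test_times[i], reverse=True)
    for core in order:
        load, path = heapq.heappop(heap)  # least-loaded path
        assignment[path].append(core)
        heapq.heappush(heap, (load + test_times[core], path))
    makespan = max(load for load, _ in heap)
    return makespan, assignment


times = [4, 3, 3, 2, 2]  # hypothetical per-core test times
for paths in (1, 2, 5):
    print(paths, lpt_test_schedule(times, paths)[0])
```

With these times, one path needs 14 time units, two paths need 8, and five paths need 4, the length of the longest single test. LPT is not always optimal (a 7-unit split exists for two paths here), but it is simple and stays within a factor of 4/3 of the optimum.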

The controllability/observability of NoC interconnects is relatively low, because they are deeply embedded and spread across the chip. Pin-count limitations restrict the use of I/O pins dedicated to testing the different components of the data-transport medium; therefore, the NoC infrastructure should be used progressively to test its own components in a recursive manner, i.e., good, already tested NoC components should be used to transport test patterns to the untested elements. This test strategy minimizes the need for additional mechanisms to transport test data to the NoC elements under test, while reducing test time through parallel test paths and test data multicast.

Testing the functional/storage blocks and the interconnect infrastructure separately is not sufficient to ensure adequate test quality. The interaction between the functional/storage cores and the communication fabric must also undergo extensive functional testing, which should cover the I/O functions of each processing element as well as the data routing functions.

Many SoCs are used in embedded systems, where reliability is an important figure of merit. At the same time, in deep submicron technologies beyond the 65 nm node, transistor and wire failures become more likely due to a variety of effects, such as soft errors (e.g., from cosmic radiation), crosstalk, process variations, electromigration, and material aging. In general, failures can be classified as transient or permanent, and the design of reliable SoCs must encompass techniques that address both. From a reliability point of view, one advantage of packetized communication is that error control information can be incorporated into the transmitted data stream. Effective error detection and correction methods borrowed from the fault-tolerant computing and communications engineering domains can be applied to cope with uncertainty in on-chip data transmission; such methods need to be evaluated and optimized in terms of area, delay, and power trade-offs. Permanent failures may be due to material aging (e.g., oxide degradation), electromigration, and mechanical/thermal stress. Such failures can incapacitate a processing/storage core and/or a communication link. Various fault-tolerant multiprocessor architectures and routing algorithms have been proposed in the parallel processing domain. Some of these can be adapted to the NoC domain, but their effectiveness needs to be evaluated in terms of defect/error coverage versus throughput, delay, energy dissipation, and silicon area overhead.
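As an example of error control information carried in the data stream, the sketch encodes 4-bit data chunks with a Hamming(7,4) single-error-correcting code. The choice of code and the function names are assumptions made for illustration, not a scheme prescribed here; the example also shows the kind of overhead that must be traded off, since three check bits are added for every four data bits.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits as a 7-bit Hamming codeword (list of bits).

    Bit positions 1..7 hold p1 p2 d0 p4 d1 d2 d3, where each parity bit pk
    covers the positions whose index has the bit of weight k set.
    """
    if not 0 <= nibble < 16:
        raise ValueError("expected a 4-bit value")
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # covers positions 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers positions 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # covers positions 5, 6, 7
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]


def hamming74_decode(code):
    """Correct up to one flipped bit; return (nibble, error position or 0)."""
    code = list(code)
    syndrome = 0
    for pos, bit in enumerate(code, start=1):
        if bit:
            syndrome ^= pos  # XOR of the positions of all 1 bits
    if syndrome:
        code[syndrome - 1] ^= 1  # the syndrome points at the flipped bit
    d0, d1, d2, d3 = code[2], code[4], code[5], code[6]
    return d0 | (d1 << 1) | (d2 << 2) | (d3 << 3), syndrome


word = hamming74_encode(0b1011)
word[4] ^= 1  # simulate a transient error on the link (position 5)
print(hamming74_decode(word))  # → (11, 5)
```

A double error would be miscorrected by this code, so links exposed to burst errors would need a stronger code or retransmission, at a further cost in area, delay, and power.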

Network interfacing: The success of the NoC design paradigm relies greatly on standardizing the interfaces between IP cores and the interconnection fabric. Using a standard interface should not affect the methodologies for IP core development. In fact, IP cores wrapped with a standard interface exhibit higher reusability and greatly simplify system integration. The Open Core Protocol (OCP) is a plug-and-play interface standard that has gained wide industrial and academic acceptance. As shown in the figure below, for a core having both master and slave interfaces, the OCP-compliant signals of the functional IP blocks are packetized by a second interface. The network interface has two functions:

injecting the flits leaving, and absorbing the flits arriving at, the functional/storage blocks;
packetizing/depacketizing the signals coming from/going to OCP-compatible cores in the form of messages/flits.

Interfacing of IP cores with the network fabric
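The packetizing/depacketizing function of the network interface can be sketched as follows. The flit format and all names are hypothetical: a head flit carrying the destination, then one flit with the target address, then one flit per data word, with the last flit marked as the tail. A real interface would likely also carry source, command, and flow-control information, and would map actual OCP request fields rather than plain integers.

```python
from dataclasses import dataclass

FLIT_BITS = 32  # hypothetical flit payload width


@dataclass
class Flit:
    kind: str  # "head", "body" or "tail"
    payload: int


def packetize(dest, addr, words):
    """Turn a write request (address + data words) into a list of flits."""
    payloads = [addr, *words]
    if any(not 0 <= p < 2 ** FLIT_BITS for p in payloads):
        raise ValueError("payload does not fit in one flit")
    flits = [Flit("head", dest)]  # routing information for the switches
    flits += [Flit("body", p) for p in payloads[:-1]]
    flits.append(Flit("tail", payloads[-1]))  # closes the packet
    return flits


def depacketize(flits):
    """Rebuild (dest, addr, words) from the flits of one packet."""
    if len(flits) < 2 or flits[0].kind != "head" or flits[-1].kind != "tail":
        raise ValueError("malformed packet")
    dest = flits[0].payload
    payloads = [f.payload for f in flits[1:]]
    return dest, payloads[0], payloads[1:]


pkt = packetize(dest=5, addr=0x1000, words=[1, 2, 3])
print([f.kind for f in pkt])  # ['head', 'body', 'body', 'body', 'tail']
print(depacketize(pkt))  # (5, 4096, [1, 2, 3])
```

Keeping the routing information in a dedicated head flit lets switches make forwarding decisions without inspecting the rest of the packet, which fits the deeply pipelined switches discussed earlier.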

All OCP signals are unidirectional and synchronous, simplifying core implementation, integration and timing analysis. The OCP defines a point-to-point interface between two communicating entities, such as the IP core and the communication medium. One entity acts as the master of the OCP instance, and the other as the slave. OCP unifies all inter-core communications, including dataflow, sideband control and test-specific signals.

The state of the art has reached the point where commercial designs readily integrate 10-100 embedded functional/storage blocks in a single SoC, and this number is expected to increase significantly in the near future. As a result of this enormous degree of integration, several industrial and academic research groups are striving to develop efficient communication architectures, in some cases optimized for particular applications. The research community is converging on the view that Networks on Chip constitute an enabling solution for this level of integration.