A Novel Mesh Architecture for On-Chip Networks

Midia Reshadi¹, Ahmad Khademzadeh², Akram Reza¹, and Maryam Bahmani¹

¹ Islamic Azad University, Science and Research Branch, Tehran, Iran.
² Iran Telecommunication Research Center, Tehran, Iran.

Abstract:

2D Mesh is a very popular topology in Network on Chip due to its facilitated implementation, simplicity of the XY routing strategy and the network scalability. On the other hand, 2D Mesh has some disadvantages such as long network diameter as well as energy inefficiency because of the extra hops. Due to this, in this work, we propose a novel NoC topology called Diametrical 2D Mesh and related shortest path routing algorithm called Extended XY. Proposed architecture reduces the network diameter 50 percent compared to 2D mesh. This ends in less network average latency and power consumption considerably, which are demonstrated by performance and cost simulations.

I. INTRODUCTION

Chip multiprocessors are becoming more widely utilized to efficiently use the increasing number of transistors available in a modern VLSI technology. As the number of cores increases in such architectures, an on-chip network is used to connect the cores to provide a communication substrate. These on-chip networks should provide both low latency and high bandwidth communication fabrics that efficiently support a diverse variety of workloads. On-chip interconnection networks have recently attracted considerable research attention [2], much of which has focused on microarchitecture improvements [1, 3] and routing algorithms [4]. However, selecting an appropriate topology is one of the most critical decisions in the design of an interconnection network because it bounds critical performance metrics such as the network’s zero-load latency and its capacity [10], and directly influences implementation costs, both in terms of the on-chip resources and implementation complexity. Most on-chip networks that have been proposed, mostly utilize a 2D Mesh such as the networks found in the RAW processor [6], the TRIPS processor [7], the 80-node Intel’s Teraflops research chip [8], and the 64-node chip multiprocessor from Tilera [9]. Although 2D Mesh networks provide a very simple network and lead to very short wires in the architecture, these networks have several disadvantages including long network diameter as well as energy inefficiency because of the extra hops.

Reducing the diameter of 2D mesh, not only improves the network performance but also reduces the power consumption. We transform the conventional 2D Mesh by utilizing the bypass channels which cause the 50 percent reduction in network diameter. In addition, the proposed topology has been evaluated in both performance and cost point of views. The results illustrate that the network average latency and power consumption are significantly decreased, because diametrical links not only reduce the distance between a source node and a destination node but also alleviate traffic congestion in the network so that the network performance and cost is improved. This paper is organized as follows. We describe a general description of Diametrical 2D Mesh in the section 2. In addition, Diametrical 2D Mesh addressing is presented in section 3, then, proposed routing algorithm is illustrated in section 4. Furthermore, simulation results are displayed in section 5. Finally, we conclude our paper in section 6.

II. DIAMETRICAL 2D MESH

In this section, Diametrical 2D Mesh is presented. This network is a square 2D Mesh which has 8 extra bypass channels (Fig. 1(a) and Fig. 1(b)). Fundamentally, “the NoC topologies can be described by a graph G(N,C) where N is the set of routers, and C is the set of channels between routers” [10-13]. Like 2D Mesh, in Diametrical 2D Mesh every router is attached to one IP core. N in Diametrical 2D Mesh can be considered as the number of nodes which is calculated as “(1)”:

N_Diametrical=i²; i=2,3,4 (1)

Where (i) is the number of nodes in each side of the network. Diametrical 2D Mesh network can be divided into (i-1)2 sub-networks that one of them is shown in Fig.1.a. For example 4×4=42 Diametrical 2D Mesh can be divided into 3×3=32 sub-networks. As it can be seen in Fig.1.a and Fig.1.b, we choose four edge sub-networks to connect to each other with 8 extra links. In order to this, not only we reduce the hop count and, consequently, average latency, but we also increase throughput of Diametrical 2D Mesh in comparison to 2D Mesh. Therefore, C is increased as “(2)”:

Fig. 1(a), Diametrical 2D mesh’s topology with 16 IP cores (b), Diametrical 2D mesh’s topology with 25 IP cores

Diametrical 2D Mesh is a performance efficient topology, because its network diameter is reduced considerably in comparison to 2D Mesh. Although, in this network the area is increased because of 8 extra links, its power consumption is decreased due to average hop count reduction. Number of extra links in Diametrical 2D Mesh do not grow by the growth of IP cores. On the other hand, by growth of IP cores, number of extra links constantly and statistically equals 8. This link redundancy is decreased by the growth of IP cores. For example, the link redundancy in 16 IP cores 2D Mesh is 1/3 and in 25 IP cores 2D Mesh, this ratio is 1/5 . Furthermore, the network diameter is decreased 50% with any number of IP cores in comparison to 2D Mesh.

All in all, the main purpose of adding 8 extra links to 2D Mesh topology is the reduction of diameter in 2D Mesh when 2D Mesh is expanded by large number of IP cores. Moreover, we try to minimize the area and power consumption redundancy by defining constant 8 links that decrease diameter and connect four edge sub-networks.

Fig. 2(a), The XY addressing scheme of the network with 16 IP cores (b), The XY addressing scheme of the network with 25 IP cores

III. DIAMETRICAL 2D MESH ADDRESSING

Diametrical 2D Mesh uses addresses which are composed of two parts; X and Y, which X shows the number of row and Y indicates the number of column that the node is located in (see Fig.2). The number of bits in X and Y parts are determined by the number of rows and columns in Diametrical 2D Mesh. For example, for 16 IP cores, Diametrical 2D Mesh has 4 nodes in each rows and 4 ones in each column, due to this fact both X and Y parts have to use 2 bits. In other words, 2 bits show the number of row, X, and 2 bits indicate the number of column, Y. Also, for 25 IP cores, Diametrical 2D Mesh has 5 nodes in each row and column. Therefore, both X part bits and Y part bits must be three. This means that 6 bits addresses are used to label the Diametrical 2D Mesh with 25 IP cores. Both 16 and 25 Diametrical 2D Mesh addressing are demonstrated in Fig.2.a and Fig.2.b.

IV. DIAMETRICAL 2D MESH ROUTING

“The routing protocol deals with resolution of the routing decision made at every router” [11]. Routing method affects the cost (area and power consumption) and the performance (average latency and throughput) issues in the NoC design.

We propose extended XY routing for Diametrical 2D mesh. Fundamentally, this routing algorithm is the shortest path and inherited from the well-known 2D Mesh XY routing. Similar to XY routing, Extended XY is very simple due to simple addressing scheme and structural topology. Fig.3 shows the details of the routing pseudo code. As can be seen, we define Xoffset and Yoffset values in the pseudo code, which are calculated as (3) and (4) respectively:

Xoffset=Xdest-Xcurrent (3)

Yoffset=Ydest-Ycurrent (4)

Where Xcurrent is the X value of a current node and Xdest is the X value of a destination node. In addition, Ycurrent is the Y value of a current node and Ydest is the Y value of a destination node. Xoffset and Yoffset are the values indicating the number of rows and columns between a current and a destination nodes respectively. If both Xoffset and Yoffset values are zero, it means that the current node is the destination and the packet reaches the destination node. Extended XY routing utilizes conventional XY routing in the following conditions:

When the diameter channel is not used:
1. If a current node and a destination node are located in the same row or column, or Xoffset+Yoffset < d-1, according to these conditions, conventional XY routing decisions are performed (line 10th of pseudo code).
When diameter channel is used:
1. If Xoffset+Yoffset > d-1:

In contrast, diameter link is used as a shortcut to reduce the average hop count of the network.

Fig. 3, The routing pseudo code

V. SIMULATION RESULTS

As we mentioned before, Diametrical 2D Mesh has been proposed to cope with 2D Mesh disadvantages such as large network diameter and high power consumption. In order to our work validation, we evaluate both Diametrical 2D Mesh and conventional 2D Mesh performance indicators (such as average network latency and throughput) and cost indicators (such as power consumption). Therefore, we use Xmulator [16], an event base and object oriented simulator (Fig.4). Xmulator has been developed based on the software engineering designing methods and it was implemented by C#. The multilayered architecture consists of the basic utilities such as: the basic components definition, the interconnection network components and the queuing network. In order to obtain the average network latency and the throughput, we use uniform traffic which is a balanced traffic load to achieve a fair comparison. Table.2 and Table.3 illustrate the details of simulation. Fig.4 and Fig.5 demonstrate the average network latency and the throughput variations respectively. Average network latency is defined as the time (in cycles) that elapses from between the occurrence of a message header injection into the network at the source node and the occurrence of a tail flit reception at the destination node [14]. As can be seen in Fig.4, the Diametrical 2D Mesh’s latency is less than 2D Mesh, on account of less hop count which is caused by diametrical channels. Throughput is defined as the data rate in bits per second that the network accepts per input port [10]. As can be observed in Fig.5, Diametrical 2D Mesh has much throughput in comparison to conventional 2D Mesh.

Furthermore, consumed power of both topologies has been estimated by LUNA tool [15]. LUNA is a framework which takes as input topology, routing protocol and message flows, and drives a power profile of the network. The analysis is based on link utilization as the unit of abstraction for the network power. Figs.6-9 present the power comparisons between Diametrical 2D Mesh and conventional 2D Mesh by expanding the IP cores in the network. Results show that, Diametrical 2D Mesh has less power consumption compared to 2D Mesh, because of less hop count and alleviation of the traffic congestion. Furthermore, the gap between two topologies power consumption curves is significantly increased by the growth of the network. Moreover, Table.1 shows the variation of the network diameter and link count based on the growth of the network. Results indicate that, Diametrical 2D Mesh link redundancy is reduced by enlarging network size, due to the fact that link redundancy is permanently constant in 8 links. On the other hand, Diametrical 2D Mesh network diameter improvement is enduringly 50 percent. These facts illustrate that Diametrical 2D Mesh is more appropriate for large network size, due to less redundancy overhead.

Fig. 4 The simulation toolchain

VI. CONCLUSION

In this paper, we propose a novel topology based on 2D Mesh architecture, called “Diametrical 2D Mesh”, to cope with the conventional 2D Mesh disadvantages such as large network diameter and inefficient energy consumption. In i2 Diametrical 2D mesh, 8 links connect four (i-1)2 2D Meshes to each other. This proposed scheme reduces the network diameter 50% compared to 2D mesh. This leads less average hop count and network average latency and power consumption. In order to validate the Diametrical 2D mesh, simulations are carried out from the performance (network average latency and throughput) and the cost (power consumption) point of views. Results illustrate that although Diametrical 2D Mesh constantly has 8 redundant links, it plays an efficient role in performance and cost fields.

Table 1. interconnection parameters

Fig. 4. Shows the network average latency variation of both Diametrical 2D Mesh and conventional 2D Mesh

Fig. 5. Shows the throughput variation of both Diametrical 2D Mesh and conventional 2D Mesh

Fig. 6. Shows the power consumption variation of both 3×3 Diametrical 2D Mesh and 3×3 conventional 2D Mesh

Fig. 7. Shows the power consumption variation of both 4×4 Diametrical 2D Mesh and 4×4 conventional 2D Mesh

Fig8. Shows the power consumption variation of both 5×5 Diametrical 2D Mesh and 5×5 conventional 2D Mesh

Fig.9. Shows the power consumption variation of both 6×6 Diametrical 2D Mesh and 6×6 conventional 2D Mesh

[1] P. Abad, V. Puente, J. A. Gregorio, and P. Prieto. Rotary Router: An Efficient Architecture for CMP Interconnection Networks. In Proc.of the International Symposium on Computer Architecture (ISCA), pages 116–125, San Diego, CA, June 2007.

[2] T. Bjerregaard and S.Mahadevan. A survey of research and practicesof network-on-chip. ACM Comput. Surv., 38(1):1, 2006.

[3] R. D. Mullins, A. West, and S. W. Moore. Low-Latency Virtual- Channel Routers for On-Chip Networks. In Proc. of the InternationalSymposium on Computer Architecture (ISCA), pages 188–197, Munich,Germany, 2004.

[4] D. Seo, A. Ali, W.-T. Lim, N. Rafique, and M. Thottethodi. Near-Optimal Worst-Case Throughput Routing for Two-DimensionalMesh Networks. In Proc. of the International Symposium on ComputerArchitecture (ISCA), pages 432–443, Madison, WI, 2005.

[5] W. J. Dally and B. Towles. Principles and Practices of Interconnection Networks. Morgan Kaufmann, San Francisco, CA, 2004.

[6] M. B. Taylor, W. Lee, S. Amarasinghe, and A. Agarwal. Scalar Operand Networks: On-Chip Interconnect for ILP in Partitioned Architectures.In International Symposium on High-Performance Computer Architecture (HPCA), pages 341–353, Anaheim, California,2003.

[7] P. Gratz, C. Kim, R. McDonald, S. Keckler, and D. Burger. Implementation and Evaluation of On-Chip Network Architectures.In International Conference on Computer Design (ICCD), 2006.

[8] S. Vangal et al. An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS. In IEEE Int’l Solid-State Circuits Conf., Digest of Tech. Papers(ISSCC), 2007.

[9] A. Agarwal, L. Bao, J. Brown, B. Edwards, M. Mattina, C.-C. Miao, C. Ramey, and D. Wentzlaff. Tile Processor: Embedded Multicore for Networking and Multimedia. In Hot Chips 19, Stanford, CA, Aug. 2007.

[10] W. J. Dally and B. Towles, Principles and Practices of Interconnection Networks. Morgan Kaufmann, San Francisco, USA, 2004.

[11] L. Benini, G. De Micheli, Networks on Chips: Technology and Tools. Morgan Kaufmann, San Francisco, CA, 2006.

[12] J. Duato, S. Yalamanchili, L. Ni, Interconnection Networks: An Engineering Approach. Morgan Kaufmann, Los Altos, CA, 2002.

[13] B.Parhami, Introduction to Parallel Processing Algorithms and Architectures, Kluwer Academic PublishersNew York, 2002

[14] P. Pratim Pande, C. Grecu, M. Jones, A. Ivanov, and R. Saleh, “Performance evaluation and design trade-offs for network-on-chip interconnect architectures,” IEEE Trans. Computers, vol. 54, no 8, pp. 1025 – 1040, August, 2005.

[15] N. Eisley, L.-S. Peh, “High-level power analysis for on-chip networks”. in Proc. of the International Conference on Compilers, Architecture and Synthesis for Embedded Systems, pp. 104–115, September. Available: http://www.princeton.edu/~eisley/LUNA.html

[16] A.Nayebi, S. Meraji, A. Shamaei, H. Sarbazi-Azad, “XMulator: A Listener-Based Integrated Simulation Platform for Interconnection Networks”. In First Asia International Conference on Modelling & Simulation, 2007.Available: http://www.xmulator.org