A Performance Architecture Exploration and Analysis Platform for Memory Sub-systems

Vidhya Thyagarajan, Kartik Kariya, Sreeja Menon

Abstract: The memory sub-system includes a memory device such as DRAM, a memory controller and a physical/IO layer (PHY). Several parameters affect the performance of the memory sub-system: DRAM timing parameters such as read latency, read-write turnaround delays, and low-power state entry and exit latencies; memory controller resources and features such as queue depth and organization and reordering policies; and memory PHY layer properties such as transmit and receive latencies. In this paper, we outline the parameters that affect memory sub-system performance and introduce the Sensitivity Analysis and Feature Exploration methodologies to analyze the degree of impact of each of these parameters. This platform, when used at an early architectural exploration phase, provides valuable feedback to the memory device, controller and PHY architects so that they can focus on optimizing the most critical parameters. We present a case study that analyzes a next generation mobile DRAM based memory sub-system using our proposed performance architectural exploration platform, and provide a ranking metric for all the parameters that affect memory sub-system performance for key mobile applications.

1. INTRODUCTION

Figure 1 shows an example mobile phone system including the memory sub-system. The memory sub-system consists of a memory controller, a physical/IO layer (PHY) and a memory device, such as DRAM or Flash. Memory sub-system performance is one of the key factors affecting overall system performance [1].
Performance analysis can be done at various stages, for example after hardware and software are integrated during bring-up. The importance of earlier performance analysis, such as at the chip-level verification stage or at the memory controller tuning stage, has been highlighted in both memory and non-memory contexts by various studies [5] [6]. However, even this is typically too late to make fundamental architectural changes to the design. In this paper, we propose a platform and methodology for integrating performance analysis with early architectural exploration. This can generate valuable feedback that helps logic and circuit architects focus their optimization efforts where the return on investment is highest. In this paper we make the following contributions:
The rest of the paper is organized as follows: Section 2 describes the various parameters that affect the memory subsystem performance

2. MEMORY SUB-SYSTEM PERFORMANCE PARAMETERS

2.1 Overview

The impact of the various system parameters on the performance and power of the memory sub-system is studied for architectural exploration. This section lists the performance parameters that are evaluated during the early architecture development phase. Table 1 provides the comprehensive list of memory controller parameters recommended for architectural exploration. These parameters affect performance factors such as bandwidth, latency and power of the memory sub-system, and thus provide key feedback in the architecture phase for possible optimizations and feature enhancements.

Fig 2: Mobile Memory Controller Architecture

2.3 PHY and DRAM Performance Parameters

As depicted in Figure 1, the memory PHY and the DRAM are, in addition to the memory controller, the other key components of the memory sub-system. The architecture exploration aims to provide data on the effect of the PHY and DRAM parameters on the performance and power of the system. The data obtained from these studies can be used to optimize important parameters during PHY architecture definition and also helps in selecting an appropriate DRAM device. Table 2 provides the list of PHY and DRAM parameters recommended in this paper for performance evaluation. Some of the parameters under study are the net result of a number of sub-parameters, so studying the impact of one such parameter gives feedback on the performance impact of all of its sub-parameters.

3. EXPERIMENTAL SETUP FOR ARCHITECTURE EXPLORATION

A generic architecture exploration platform for studying the performance parameters of the memory sub-system is shown in Figure 3. A low power mobile memory sub-system is used as a case study to demonstrate the performance evaluation methodology.
Table 1 : Memory Controller Performance Parameters
*tCAC is the time from RD command to RD data for DRAM

Table 2: PHY and DRAM Performance Parameters

A SystemC model of a multi-port memory controller interfacing with a PHY and a low power DRAM (mobile platform memory) is used as the core of the setup. The memory sub-system model implements all the functionality described in Section 2. Traffic generators issue test patterns to the multi-port memory controller, compliant with the user interface protocol (such as AXI, AHB or OCP) supported by the memory controller. A performance analyzer records the transactions and generates the performance and power statistics.

Fig 3: Performance Architecture Exploration Set Up

3.1 Application Profiles

Traffic profiles based on smartphone applications are used in the experimental setup. Table 3 depicts the characteristics of the profiles, with a target system bandwidth of 6.4 GB/s.

4. PERFORMANCE ANALYSIS AND RESULT EVALUATION METHODOLOGY

This section illustrates our proposed methodology for performing architecture exploration and studying the impact of various system parameters on performance using the platform described in Section 3. The following subsections also illustrate the methodology for evaluating the results to provide early architectural feedback, using a next generation mobile platform as a case study.

4.1 Sensitivity Analysis

We introduce the Sensitivity Analysis methodology in this paper to study the performance impact of system parameters that can vary within a range of values. The "Sensitivity" of a parameter is defined as the amount of variation of a performance factor (such as bandwidth, power or latency) per unit variation of the parameter under analysis. The parameters are varied through a range of values and the impact on performance, in terms of sensitivity, is studied. Sensitivity analysis provides the following advantages in the architectural phase.
Table 3: Traffic Profile in Experimental Set Up
2. Provides a ranking matrix for all the system parameters by measuring the relative impact of each of them on performance (measured through the Sensitivity Metric). This process helps in identifying the critical system parameters which have a higher impact on performance and hence are the most suitable candidates for further optimization during the architecture and design stages.

Sensitivity analysis in our experimental setup is performed using the 3 different application profiles described in Table 3. An example illustration of the results and the evaluation methodology for the system parameters described in Section 2 is presented below.

The sensitivity analysis methodology evaluates the variation of a performance factor (such as bandwidth) as the system parameters are stepped through a range of values. Figure 4 shows an example result in which the bandwidth is measured in MBps for each unit variation (decrement) of R-W Turnaround, W-R Turnaround, Read Latency and Power Down entry/exit latencies, under the heavy load traffic profile. The slope of each plot indicates the impact of one unit of parameter variation on the system performance and thereby serves as the sensitivity metric. For example, the slope of the power down exit latency curve is low, indicating that improving the power down exit latency has a relatively small effect on bandwidth, while the read latency curve has a higher slope, indicating that the read latency parameter has a very high impact.
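The slope-based sensitivity metric described above can be sketched in a few lines of Python. This is a minimal illustration rather than the platform's implementation: it assumes the sensitivity is estimated as the least-squares slope of the measured bandwidth against the number of unit decrements of a parameter. The function name `sensitivity` and the sample bandwidth values are hypothetical; only the resulting slope of 13 MBps per unit decrement matches the Read Latency figure reported in the paper for the heavy load profile.

```python
from typing import Sequence


def sensitivity(steps: Sequence[float], metric: Sequence[float]) -> float:
    """Least-squares slope of a performance metric against parameter steps.

    steps  : parameter variation, e.g. number of unit decrements applied
    metric : measured performance factor at each step, e.g. bandwidth in MBps
    """
    if len(steps) != len(metric) or len(steps) < 2:
        raise ValueError("need at least two matching (step, metric) samples")
    n = len(steps)
    mean_x = sum(steps) / n
    mean_y = sum(metric) / n
    sxx = sum((x - mean_x) ** 2 for x in steps)
    if sxx == 0:
        raise ValueError("parameter steps must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(steps, metric))
    return sxy / sxx


# Hypothetical sweep data (not taken from the paper's figures):
# bandwidth in MBps after 0..4 unit decrements of a parameter.
decrements = [0, 1, 2, 3, 4]
bandwidth_mbps = [5000.0, 5013.0, 5026.0, 5039.0, 5052.0]

slope = sensitivity(decrements, bandwidth_mbps)
print(f"sensitivity: {slope:.1f} MBps per unit decrement")
```

A least-squares fit is one reasonable choice when the measured curve is not perfectly linear; for a strictly linear curve it reduces to the bandwidth difference between two adjacent steps.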
Fig 4: Sensitivity Analysis of Heavy Load Profile

Similar analysis was conducted for the Low Load and Medium Load traffic profiles of Table 3, and analysis plots similar to Figure 4 were obtained. The sensitivities (slopes) of the various system parameters for the three traffic profiles are plotted on the Y-axis in Figure 5. For example, the Read Latency curve in Figure 4 indicates that for every 1 unit decrement of Read Latency, bandwidth improves by 13 MBps. Hence, the sensitivity (slope) for Read Latency under Heavy Load is 13 MBps per unit decrement of the latency value, which is the value plotted in Figure 5. The slope of each system parameter was plotted in this way to obtain the consolidated ranking metric shown in Figure 5. This graph thus provides information on the relative impact of the various parameters on system performance. Some of the conclusions which can be derived from Figure 5 are as follows.

1. Read Latency has the maximum impact on bandwidth among the parameters studied.
2. As the traffic load decreases, the effect of read latency increases.
3. As the traffic load increases, the effect of parameters such as R-W turnaround and W-R turnaround increases.
4. As the traffic load decreases, the effect of power down entry/exit latencies increases.

Fig 5: Sensitivity Analysis Ranking Metric

From the above, an architect who is selecting a DRAM device and wishes to obtain the maximum bandwidth at low load conditions can, for example, select a device with a smaller read latency rather than one with a smaller power down entry/exit latency.
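The consolidated ranking can be illustrated with a short Python sketch that orders parameters by the magnitude of their sensitivity for one traffic profile. It is a hedged example only: the helper `rank_parameters` is a hypothetical name, and apart from the 13 MBps value for Read Latency under Heavy Load, all sensitivity values below are placeholders chosen for illustration, not results from Figure 5.

```python
from typing import Dict, List, Tuple


def rank_parameters(sens: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order parameters from most to least sensitive (by absolute slope)."""
    return sorted(sens.items(), key=lambda item: abs(item[1]), reverse=True)


# Sensitivities in MBps per unit decrement for one traffic profile.
# Only the Read Latency value comes from the text; the rest are placeholders.
heavy_load = {
    "Power Down Entry/Exit": 1.0,  # placeholder
    "R-W Turnaround": 6.0,         # placeholder
    "Read Latency": 13.0,          # from the Heavy Load example in the text
    "W-R Turnaround": 5.0,         # placeholder
}

ranking = rank_parameters(heavy_load)
for position, (name, value) in enumerate(ranking, start=1):
    print(f"{position}. {name}: {value:.1f} MBps per unit")
```

Running the same function once per traffic profile would give one ranking per load condition, which is the comparison an architect needs when the target workload is known in advance.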
4.2 Feature Exploration

This section describes the methodology for performing feature exploration by evaluating the impact of enabling a feature on performance factors such as bandwidth, latency and power. This analysis helps the architect decide, during the architectural stage, which features of the memory sub-system to include or exclude, and which parameter values to associate with those features to achieve better performance. The Heavy Load traffic profile described in Table 3 is used for the feature exploration study.

As an example, the feature exploration of memory controller queue depth is depicted in Figure 6 and Figure 7. Figure 6 indicates that the actual bandwidth saturates beyond a queue depth of 8, and Figure 7 indicates that increasing the queue depth increases latency. As a result, the memory controller architect could decide to set the memory controller queue depth to 8 for optimum performance.
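The queue depth decision above amounts to finding the point at which bandwidth stops improving. The following Python sketch shows one way this could be automated, assuming a simple rule: pick the smallest depth after which the relative bandwidth gain of the next step falls below a threshold. The function `saturation_depth`, the 1% threshold and the sample bandwidth values are all hypothetical; the data is merely shaped so that saturation occurs at a depth of 8, as reported in the text.

```python
from typing import Sequence


def saturation_depth(depths: Sequence[int],
                     bandwidth: Sequence[float],
                     min_gain: float = 0.01) -> int:
    """Smallest depth after which the next step gains less than min_gain.

    depths    : queue depths in increasing order
    bandwidth : measured bandwidth at each depth (same units throughout)
    min_gain  : relative improvement below which bandwidth counts as saturated
    """
    if len(depths) != len(bandwidth) or not depths:
        raise ValueError("depths and bandwidth must be non-empty and equal length")
    if any(b <= 0 for b in bandwidth):
        raise ValueError("bandwidth samples must be positive")
    for i in range(len(depths) - 1):
        gain = (bandwidth[i + 1] - bandwidth[i]) / bandwidth[i]
        if gain < min_gain:
            return depths[i]
    # Bandwidth was still improving at the largest depth that was measured.
    return depths[-1]


# Hypothetical measurements (MBps), not taken from Figure 6.
queue_depths = [2, 4, 8, 16, 32]
measured_bw = [3000.0, 4500.0, 5600.0, 5620.0, 5625.0]

chosen = saturation_depth(queue_depths, measured_bw)
print(f"bandwidth saturates at queue depth {chosen}")
```

Because latency grows with queue depth, choosing the smallest saturating depth rather than the depth with the highest absolute bandwidth keeps the latency cost as low as possible for the same effective throughput.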
Figure 6: Effect of Queue Depth on Bandwidth

With the queue depth set to 8, the feature exploration can be continued to study the impact of other system parameters. Figure 8 illustrates an example result for enabling the High Priority Request feature of the memory controller. With this feature enabled, the traffic generator can mark selected requests as "high priority", and these are processed ahead of normal, low priority requests. Figure 8 indicates that with high priority request processing enabled, the average latency of the high priority requests is reduced to 60 clock cycles and their maximum latency to 125 clock cycles. In turn, compared to the latency results for a queue depth of 8 in Figure 7 (where High Priority Request processing is not enabled), the maximum latency of normal requests increases from 320 clocks in Figure 7 to 370 clocks in Figure 8, because priority requests are processed sooner. The average latency of normal requests also increases from 120 to 140 clocks. Similar feature exploration analysis can be performed for all the system parameters described in Section 2 to obtain information about their performance impact.

5. CONCLUSION

In this paper, we highlighted the importance of performance analysis at the architectural exploration phase. We defined the memory sub-system features and parameters that impact system level performance metrics such as bandwidth, latency and power. Two performance evaluation methodologies, namely Sensitivity Analysis and Feature Exploration, were introduced to measure the impact of each performance parameter of interest. A generic performance modeling framework that can be used to simulate variations in the parameters and to apply the two evaluation methodologies was proposed. We presented a case study of the architectural exploration of a next generation mobile DRAM based system.
Example results showing the application of our sensitivity analysis and feature exploration methodologies to sample mobile application profiles were demonstrated. We were able to rank and quantify the impact of various parameters on the system performance using the performance modeling framework and evaluation methodologies, which in turn serve as priority guidelines for architectural optimizations.

Fig 7: Effect of Queue Depth on Latency
Fig 8: Effect of enabling High Priority Request

6. REFERENCES

[1] N Woo. High Performance SOC for Mobile Applications, IEEE Asian Solid-State Circuits Conference 2010, Beijing, China
[2] S Rixner, W J Dally, U J Kapasi, P Mattson and J D Owens. Memory Access Scheduling, ISCA-27, 2000
[3] R Iyer, L Zhao, F Guo, R Illikkal, S Makineni, D Newell, Y Solihin, L Hsu and S Reinhardt. QoS Policies and Architecture for Cache/Memory in CMP Platforms, SIGMETRICS '07
[4] H Lee and E Chung. Scalable QoS-Aware Memory Controller for High-Bandwidth Packet Memory, IEEE Transactions on VLSI Systems, March 2008
[5] S Swaminathan, K Yogendhar and V Thyagarajan. Re-usable Performance Verification of Interconnect IP Designs, DVCON 2007
[6] D T Wang. Modern DRAM Memory Systems: Performance Analysis and a High Performance, Power-Constrained DRAM Scheduling Algorithm, Doctor of Philosophy Dissertation, University of Maryland, 2005