Design of Wireless Systems Utilizing Scratchpad Memories
by Goran Panic, Zoran Stamenkovic, Klaus Tittelbach-Helmrich, Jens Lehmann, Gunter Schoof
IHP GmbH, Im Technologiepark 25, D-15236 Frankfurt (Oder), Germany
Abstract: In this paper we study the use of scratchpad memories in low-power embedded systems for wireless applications. As an example, we designed a general-purpose microcontroller-based system-on-chip that is used as part of an evaluation platform for hardware and software development of wireless systems. The system is built around the AMBA platform and is based on the MIPS 4KEp embedded core, which includes scratchpad memories for instructions and data. The paper describes the scratchpad system integration, the implementation and verification methodology, and the system performance on the evaluation platform for wireless applications.
1. INTRODUCTION
In the last decade, wireless networking and communications have grown tremendously, and power consumption is acknowledged to be a major figure of merit for overall system quality. Power optimization techniques have therefore become the most important design issue and are applied at all stages of system design. At IHP [1] we develop embedded systems for wireless applications that usually consist of a number of cores optimized for specific processing tasks. The systems are developed around the IHP wireless engine platform (Figure 1), which specifies the main parts of a wireless communication system to be integrated on a single chip. The platform is a multiprocessor system consisting of protocol and application engines interconnected with a baseband processing unit and an analog front-end. The platform specifies a power optimization mechanism that benefits from inter-layer optimization: the overhead introduced by complex protocol functions is reduced by introducing dedicated hardware accelerators in the protocol cores.
The embedded protocol systems usually include an embedded processor that executes part of the protocol functionality, with the protocol stored in an embedded on-chip memory. Many embedded CPU cores available on the market provide support for embedded scratchpad memories (SPRAMs). SPRAMs are software controlled and have better power efficiency than caches [2]. To evaluate the suitability of scratchpads for our applications, we designed a test system-on-chip based on a MIPS 4KEp [9] core with scratchpad support.
The remainder of the paper is organized as follows. Section 2 discusses the scratchpad functionality, integration options and interface to the core. Section 3 presents the architecture of the designed test system together with the applied implementation and verification methodology. Section 4 describes the system performance measured on an evaluation platform. Conclusions and directions for future work are given in Section 5.
2. SCRATCHPAD MEMORIES
The term scratchpad memory usually refers to embedded on-chip memory used to store instructions or data. Unlike caches, scratchpads are software controlled: they are mapped into the processor’s address space and their use is controlled by the user’s application. The main advantages of SPRAMs over caches are: 1) reduced chip area, since there is no need for additional tag and way-select memory arrays and no complex cache hit logic is present; 2) reduced power consumption, since only simple SPRAM hit logic is used; 3) flexibility, since the user has full control of SPRAM access; and 4) performance, since scratchpad access is in general faster than cache access. The disadvantage is the additional effort required to program an SPRAM-based system. When an application significantly exceeds the available SPRAM space, the overall performance depends on the ability of the compiler to utilize this space efficiently. At the moment, no commercial compilers are available that can perform this task efficiently. Some efforts in this direction have been made in [3,4,5]. Without compiler support, it is up to the user to decide which part of the program should be stored in SPRAM, and whether and how its contents should change during execution. Modern RISC cores allow scratchpads and caches to be combined and to work concurrently. In that case the user should select the configuration that best meets the target system requirements.
2.1. Scratchpad Integration
The integration of SPRAMs into a system mainly depends on the specific system architecture and the target application. Nowadays, many commercially available RISC CPU cores have an option to include SPRAMs. For example, common configurations for 803x/805x-compatible 8-bit microcontrollers allow integration of a maximum of 256 bytes of internal data SPRAM (DSPRAM) and up to 64 KB of internal ROM (Synopsys dw_8051) [8]. The more advanced 68HC11-compatible 8-bit microcontrollers allow integration of a maximum of 1024 bytes of DSPRAM and up to 64 KB of internal ROM (Synopsys dw_6811) [8]. High-performance 32-bit RISC processors support both data and instruction scratchpads (ISPRAM), as well as concurrent cache integration. Many embedded CPU vendors offer a variety of core architectures with scratchpad support that can meet almost any customer demand. Some of them, offered by ARM [6] and MIPS [7], are listed in Table 1. Note that ARM uses the term TCM (Tightly Coupled Memory) to refer to scratchpads.
Table 1. Memory options for different types of commercially available 32-bit microcontrollers
The high-end embedded RISC cores usually provide customizable SPRAM ports that act as simple memory-like interfaces. Usually, there are separate interfaces for data and instruction SPRAM, which the user needs to customize to fit the target memory device. We describe the features of the SPRAM interface using the MIPS 4KEp [9] embedded core as an example, but the main ideas are general and can be found in similar forms in other vendors’ cores.
2.2. Scratchpad Interface
The SPRAM interface (Figure 2) on a MIPS 4KEp core is designed to provide low-latency access to on-chip memories. The SPRAM port is accessed in parallel with the caches. This saves a number of cycles that would normally be required to go through the bus interface unit and the EC interface. The SPRAM array effectively replaces a cache way and is always located at the last cache way. If caches are present in conjunction with SPRAM, the maximum cache associativity is 3. A full tag array is not needed for SPRAM, since equivalent tag functionality is achieved by a simple decode of the physical address to determine hit or miss. The cache way-select array is not needed for SPRAM either.
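The address decode that stands in for a tag lookup can be illustrated with a short behavioral sketch in Python. This is only a software model under stated assumptions: the class name, the base address and the alignment rule are hypothetical and are not taken from the 4KEp documentation, where the decode is implemented in hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spram:
    """Behavioral model of a scratchpad region (hypothetical names and addresses)."""
    base: int  # physical base address, assumed aligned to `size`
    size: int  # size in bytes, assumed to be a power of two

    def __post_init__(self):
        # Reject configurations for which a simple mask-and-compare would not work.
        if self.size <= 0 or self.size & (self.size - 1):
            raise ValueError("size must be a power of two")
        if self.base % self.size:
            raise ValueError("base must be aligned to size")

    def hit(self, paddr: int) -> bool:
        # Hit or miss is a plain decode of the upper physical address bits:
        # clear the offset bits and compare the result with the base address.
        return (paddr & ~(self.size - 1)) == self.base


# Example: an 8 KB instruction scratchpad at an assumed base address.
ispram = Spram(base=0x1000_0000, size=8 * 1024)
```

In this model a hit needs one mask and one comparison, with no stored tags to read, which is the property the area and power arguments in Section 2 rely on.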
The 4KEp core provides separate interfaces for ISPRAM and DSPRAM, whose presence can be configured independently. A SPRAM array can be up to 1 MB in size. The ISPRAM can service uncached references, enabling the processor to boot with no EC interface accesses. The SPRAM port can stall the core if the SPRAM array was busy in the previous cycle or if data is not ready. This allows other sources to access the SPRAM without the need for dual-porting the array, which is useful, for example, if a DMA engine fills the SPRAM or if a unified I/D SPRAM is desired. A cache, in contrast, has fixed single-cycle timing. The SPRAM interface is designed to be flexible enough to work with a variety of system designs. A variety of memory devices can be connected to the SPRAM interface: SRAM, ROM, flash, etc. If desired, memory-mapped functions can also be connected, as long as the interface protocol is met. Multi-ported devices can also be used; in this case, the ISPRAM or DSPRAM interface is logically connected to just one of the ports, with other system logic unrelated to the 4KEp core using the other port(s).
3. DESIGN OF A SCRATCHPAD BASED SOC
To evaluate the applicability of scratchpads to our systems, we designed a test chip containing an embedded CPU with scratchpad support. The chip contains no application-specific logic and is the very first step in designing complex wireless systems.
3.1. System Architecture
The system architecture is presented in Figure 3. The system is based on the MIPS 4KEp core [9] connected through the AMBA bus [10] to the system peripherals. The core integrates both instruction and data scratchpad memories of 8 Kbytes each. A memory controller is attached to the AMBA advanced high-performance bus (AHB). It provides an interface to an external Flash memory and static RAMs. The slower AMBA advanced peripheral bus (APB) is attached to the AHB via an AHB-to-APB bridge. Two UARTs and a GPIO block are connected to the APB.
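The block hierarchy described above can be written down as a small data structure, which makes it easy to see how many interconnect levels lie between the core and each block. The Python sketch below is illustrative only: the identifier strings are labels chosen for this example, not names from the actual design, and the number of hops is not a cycle count.

```python
# Bus topology as described in the text. The keys and values are
# illustrative labels, not signal or module names from the real chip.
TOPOLOGY = {
    "core": ["ispram", "dspram", "ahb"],
    "ahb": ["memory_controller", "ahb_to_apb_bridge"],
    "memory_controller": ["external_flash", "external_sram"],
    "ahb_to_apb_bridge": ["apb"],
    "apb": ["uart0", "uart1", "gpio"],
}


def path_to(target, node="core"):
    """Return the list of blocks from `node` down to `target`, or None if unreachable."""
    if node == target:
        return [node]
    # Depth-first search through the blocks attached below `node`.
    for child in TOPOLOGY.get(node, []):
        sub = path_to(target, child)
        if sub is not None:
            return [node] + sub
    return None
```

For instance, `path_to("ispram")` gives a direct core-to-scratchpad path, while `path_to("uart0")` passes through the AHB, the bridge and the APB, which reflects why the scratchpads are the natural place for frequently used code and data.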
3.1.1. IP cores.
3.1.2. System Planning.
3.1.3. Processor Core.
3.1.4. AMBA AHB Bus.
3.1.5. AMBA APB Bus.
3.1.6. Memory Controller.
3.1.7. UARTs.
3.1.8. GPIO.
3.2. System Implementation and Verification
The design flow used is typical of the design reuse methodology. Given the defined system architecture, the reusable components are configured, and the custom logic and the top-level design are designed and verified.
3.2.1. Synthesis.
3.2.2. Verification. The testbench uses HDL verification models attached to the external interfaces of the system and behavioral simulation models for the SPRAMs and external memories. We have written a generic assembly program that is stored in the simulation model of the boot flash. The program includes a processor initialization phase, a boot loader that copies test programs to ISPRAM, and the test program itself. After reset is released, a test program is copied to ISPRAM, read back from it and executed. A number of test programs were written in order to test as many features of the chip as possible. The tests perform various transactions on each of the system ports and test specific operations of the chip. The same tests are used to verify the RTL model of the chip, the synthesized gate-level netlist and the post-layout netlist.
3.2.3. Layout.
The features of the chip are summarized in Table 2. The data show that high system performance is achieved together with a significant reduction in power consumption.
4. SYSTEM PERFORMANCE
To verify our approach, we compare the features of functionally equivalent systems based on scratchpads and on caches. In addition, the system performance is evaluated on a hardware platform designed for developing and testing wireless applications and systems.
4.1. Scratchpad-based vs. Cache-based System
Prior to the hardware analysis of our test system, we performed a theoretical analysis and comparison of two processor configurations, one with scratchpad memories and the other with direct-mapped caches of the same size. The estimated area and power consumption of the implemented processor configurations are compared in Table 3.
Table 3. Features of the processor configurations
As expected, the cache-based core has a significantly larger logic area than the SPRAM-based one. In addition, the cache-based core includes extra memory arrays for the tag and way-select fields, making the difference even larger. The power estimation is performed with the Synopsys PrimePower tool [14] using custom applications executed from the cache and from the ISPRAM, respectively. In most cases the analysis has shown significantly lower power consumption for the SPRAM-based configuration.
4.2. Evaluation Platform
In general, SoCs designed at IHP are evaluated on a hardware platform that we have designed for system development and testing (Figure 6). The platform consists of one or more mutually interconnected subsystems. Each subsystem contains a printed circuit board (the main board) that is controlled by a workstation and has a socket for a daughter board. The chip under evaluation is assembled on the daughter board (Figure 7). The subsystems are connected via the link emulator board, which contains an FPGA that can be programmed for a specific application. The main board contains SRAM and FLASH memories, voltage and clock regulation circuitry, an LED display, connectors for signal analyzer probes and an EJTAG connector for the MIPS hardware debugger.
In addition, there are two serial connectors and one parallel connector for connection to a PC and to the link emulator. The operation of the complete system is controlled via a debug monitor program that was developed in-house. If the chip under evaluation has a specific interface such as PCMCIA, the corresponding connectors can be fitted on the daughter board, enabling the workstation to access it.
The fabricated test chip has been extensively tested for functionality using our evaluation platform. We developed routines that enable the monitor program on a workstation to change the content of the ISPRAM efficiently. A number of applications were executed from the ISPRAM, and the performance and power consumption were measured. The measurements show high performance with significantly lower power consumption compared to the equivalent cache-based systems we have designed in the past.
5. CONCLUSIONS AND FUTURE WORK
We have demonstrated a methodology for designing systems for wireless applications based on an embedded processor with scratchpad support. A test chip and an evaluation platform were designed to evaluate the presented concept. The main advantages of systems containing embedded SPRAMs over cache-based systems are lower power consumption and reduced chip area. However, additional effort is required to develop software that utilizes the system resources efficiently. Using the presented methodology, we have already designed prototypes of some protocol systems used in WLAN (IEEE 802.11a MAC, TCP/IP, encryption cores, etc.). Using the evaluation platform, we were able to connect two subsystems so that they communicate over the link emulator. The system has proven to be very useful for software and hardware debugging and for testing newly added features. In the future we plan to build systems with a higher level of integration, with more subsystems and additional features integrated on a single chip. For systems running at low frequency, such as sensor network nodes, we plan to use flash-based SPRAM, which has much better area utilization than SRAM.
6. REFERENCES
[1] IHP GmbH, (http://www.ihpmicroelectronics.com)
[2] R. Banakar, S. Steinke, B.-S. Lee, M. Balakrishnan, and P. Marwedel, Scratchpad Memory: A Design Alternative for Cache On-chip Memory in Embedded Systems, CODES, Estes Park (Colorado), May 2002.
[3] M. Kandemir, J. Ramanujam, M. J. Irwin, N. Vijaykrishnan, I. Kadayif, and A. Parikh, Dynamic Management of Scratch-Pad Memory Space, Proc. of 38th Design Automation Conference, pages 690–695, Las Vegas, NV, Jun 2001.
[4] M. Kandemir, N. Vijaykrishnan, M. J. Irwin, and W. Ye, Influence of Compiler Optimizations on System Power, Proc. of the 37th Design Automation Conference, pages 304–307, Los Angeles, CA, Jun 2000.
[5] S. Steinke, L. Wehmeyer, B.-S. Lee, and P. Marwedel, Assigning Program and Data Objects to Scratchpad for Energy Reduction, DATE 2002, Paris, France, March 2002.
[6] ARM Inc.
[8] Synopsys DesignWare® Library, Synopsys Inc.
[9] MIPS32™ 4KE™ core family, MIPS Technologies
[10] AMBA On-Chip Bus Standard, ARM Inc.
[11] CoreTools, Synopsys Inc.
[12] Design Compiler, Synopsys Inc.
[13] SoC Encounter, Cadence Inc.
[14] PrimePower, Synopsys Inc.