icyflex: an ultra-low power DSP core for portable applications
By Marc MORGAN, Simon GRAY, Jean-Luc NAGEL, CSEM SA
Neuchâtel, Switzerland
Abstract:
The icyflex family of ultra low power 16/32-bit RISC processor cores developed by CSEM offers a flexible architecture that allows for different combinations of control and DSP functionality. These processors target applications requiring long battery life at the same time as on-chip processing power. Three silicon-proven icyflex cores are available, consuming as little as 6 μW/MHz.
WHO NEEDS ANOTHER PROCESSOR CORE?
Portable products, whether for medical, consumer, industrial or home automation use, have an increasing need for miniaturized, low voltage and low power consumption electronics. Increasing performance, connectivity and miniaturization impose conflicting demands on ASIC designers, often leading to inconvenient trade-offs. While small low power micro-controllers are readily available, they often lack the data processing power required by modern System-on-Chip (SoC) solutions.
The icyflex family from CSEM addresses this growing need with a 16/32-bit processor core offering best-in-class processing performance versus power consumption in a flexible core capable of both control and digital signal processing.
This paper will describe the architecture and design choices employed to achieve this unique family of processors, and will review results from the first products designed using these processors.
THE icyflex PROCESSOR FAMILY
CSEM has been a pioneer in the field of low power, low voltage processors, from the original watch processors to the CoolRISC core and the MACGIC DSP. The icyflex architecture was developed as a flexible processor with both DSP and control-type capabilities and C-compiler support, with a best-in-class power budget. A high level of flexibility allows the architecture to be optimized for the application, ranging from simple control through to highly parallel audio/video signal processing applications. The result is a family of high performance processors with best-in-class energy efficiency.
Three variants are so far available:
- icyflex1 – a 16/32-bit RISC processor [1] with a good level of parallelism for a mix of control and DSP-type applications, such as wireless sensor networks requiring local signal processing,
- icyflex2 – a smaller 16/32-bit RISC processor optimized for control type applications with power consumption as low as 6 μW/MHz in 65 nm LP CMOS,
- icyflex4 – a scalable architecture capable of some control and, more importantly, offering massive parallelism for computation-intensive DSP-type applications such as audio or video processing.
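As a rough illustration of what a figure such as 6 μW/MHz means in practice, the power drawn by a core can be estimated to first order as the per-MHz figure multiplied by the clock frequency. The sketch below assumes this linear scaling and ignores leakage, memories and peripherals; the function name and the 10 MHz clock are illustrative choices, not values from CSEM.

```python
def core_power_uw(uw_per_mhz: float, clock_mhz: float) -> float:
    """First-order power estimate in microwatts.

    Assumes power scales linearly with clock frequency and ignores
    leakage, memory and peripheral power (a simplification).
    """
    if uw_per_mhz < 0 or clock_mhz < 0:
        raise ValueError("inputs must be non-negative")
    return uw_per_mhz * clock_mhz


# The 6 uW/MHz figure quoted for icyflex2, at an assumed 10 MHz clock.
print(core_power_uw(6.0, 10.0))  # 60.0
```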
All three processors are designed as VHDL soft blocks with multiple customization parameters (bus widths, stack size, optional blocks) so that only the parts of the processor useful for the application are integrated. The processors can be configured at run time to add new addressing modes and new instructions that reduce the number of cycles for individual algorithms. The processors feature powerful data paths (up to 36 multiply-and-accumulate (MAC) units in parallel) and high bandwidth busses to registers and memory for maximum throughput per instruction or clock cycle. The processors are designed for testability and On-Chip-Debug (OCD) support through a JTAG interface.
THE icyflex1 PROCESSOR
The icyflex1 processor is the first ultra-low power processor designed at CSEM supporting both DSP features and a C compiler. The architecture offers maximum flexibility by combining features of a digital signal processor and a micro-controller unit core.
The icyflex1 architecture includes many features to reduce power consumption. The processor can be customized prior to integration in a SoC to shrink or remove parts which are not needed for a specific application: data memory bus width, address word size, data processing and address computation hardware. The load/store RISC architecture reduces the number of memory fetches. The complex addressing modes reduce address computation cycles when accessing variables and arrays of many data types. The 32-bit wide instruction word avoids the excessive power consumption of very long instruction word (VLIW) processors; it can encode one long or two short instructions, and the two short instructions can be executed in parallel. This feature reduces code size, cycle counts and power consumption.
The icyflex1 is reconfigurable at run-time to further optimize the instruction set and addressing modes for specific algorithms. The programmer can select new combinations of the computational units in the processor to create new instructions in the instruction set. This reduces the number of instructions in critical routines, thereby also optimizing code size, clock cycles and power consumption.
The two 32x32 MAC units optimize performance for DSP algorithms (e.g. digital filters, data correlation, FFT). The short 3-level pipeline wastes less energy when it is flushed. The two independent data busses guarantee sufficient bandwidth to the data memory to avoid stalling the datapath. Zero overhead loop/repeat instructions reduce the number of cycles when executing loops. These are all features which reduce the power consumption.
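To make the benefit of two MAC units concrete, the following behavioural sketch computes one FIR filter output while counting idealized cycles, with two multiply-accumulates issued per cycle. It is only a model under stated assumptions: the function name is hypothetical, loop overhead is taken as zero, operands are assumed to arrive without stalls, and nothing here reflects the actual icyflex1 instruction set.

```python
def fir_output_dual_mac(samples, coeffs):
    """Compute one FIR output sample with two MACs per modelled cycle.

    Returns (result, cycles). acc0 and acc1 stand in for the two MAC
    units; the cycle count is idealized (no stalls, no loop overhead).
    """
    if len(samples) != len(coeffs):
        raise ValueError("samples and coeffs must have the same length")
    acc0 = acc1 = 0
    cycles = 0
    for i in range(0, len(coeffs), 2):
        acc0 += samples[i] * coeffs[i]              # MAC unit 0
        if i + 1 < len(coeffs):
            acc1 += samples[i + 1] * coeffs[i + 1]  # MAC unit 1
        cycles += 1
    return acc0 + acc1, cycles


# Four taps take two modelled cycles instead of four.
print(fir_output_dual_mac([1, 2, 3, 4], [4, 3, 2, 1]))  # (20, 2)
```

In this model an N-tap filter needs about N/2 cycles per output sample, which is the kind of cycle reduction the paragraph above attributes to the dual-MAC data path.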
The icyflex1 architecture supports many features common to the other icyflex processors: conditional execution, 8 vectorized interrupt request levels, support for OCD and a wide range of addressing modes to support both a C compiler and specific DSP type operations. It also includes hardware commonly found in DSP processors: hardware loops, 2 address generation units and 2 data busses to memory, as well as specific complex addressing modes.
The icyflex1 processor designed at CSEM offers a customizable and reconfigurable soft-IP solution. It combines support for high level languages with high performance (both memory throughput and computing power) for DSP applications. Its best-in-class energy consumption makes it especially well suited for portable systems where extending battery life plays a key role.
THE icyflex2 PROCESSOR
The icyflex2 processor is derived from the icyflex1 architecture and further optimized for control type applications and for use with a C compiler. The instruction set of the icyflex2 core and its general purpose register bank, where each 32-bit register can also be used as two independent 16-bit registers, make this architecture very efficient for processing both 16- and 32-bit data.
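The idea of a 32-bit register that can also be addressed as two independent 16-bit registers can be sketched with a small model. The class and attribute names (`SplitRegister`, `hi`, `lo`) are hypothetical and chosen only for illustration; the actual register naming of the icyflex2 is not given here.

```python
class SplitRegister:
    """Hypothetical model of a 32-bit register viewed as two 16-bit halves."""

    def __init__(self, value: int = 0):
        self.value = value & 0xFFFFFFFF

    @property
    def lo(self) -> int:
        """Lower 16 bits."""
        return self.value & 0xFFFF

    @lo.setter
    def lo(self, half: int) -> None:
        # Writing the low half leaves the high half untouched.
        self.value = (self.value & 0xFFFF0000) | (half & 0xFFFF)

    @property
    def hi(self) -> int:
        """Upper 16 bits."""
        return (self.value >> 16) & 0xFFFF

    @hi.setter
    def hi(self, half: int) -> None:
        # Writing the high half leaves the low half untouched.
        self.value = (self.value & 0x0000FFFF) | ((half & 0xFFFF) << 16)


r = SplitRegister(0x12345678)
r.lo = 0xABCD            # 16-bit write to the low half only
print(hex(r.value))      # 0x1234abcd
```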
The instruction bundling mechanism of the icyflex1 (which supports a single long or two short operations in a 32-bit instruction word) has been extended so that the two short operations can be executed either in sequence or in parallel. Executing several short operations in series decreases the number of accesses to the program memory and reduces the total amount of program memory, both of which lower the overall power consumption. Alternatively, executing the two short operations of a bundle in parallel increases computing performance and energy efficiency.
The pipeline is five stages deep and either reaches relatively high frequencies (typically between 100 and 200 MHz) at nominal voltage or operates at very low voltage (typically below 0.9 V) at lower frequencies. Data dependency checking and pipeline stalling are performed in hardware.
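The effect of bundling on program memory can be illustrated with a simple packing model: a long operation fills a 32-bit word, while two consecutive short operations share one. The sketch below is an assumption-laden illustration; the function name, the greedy in-order pairing rule and the handling of an unpaired short operation are not taken from the icyflex encoding, which is not described here.

```python
def pack_bundles(ops):
    """Pack 'short'/'long' operations, in order, into 32-bit instruction words.

    A long operation fills a word; up to two consecutive short operations
    share one. Returns a list of words, each a tuple of operation indices.
    An unpaired short operation is assumed to occupy a word on its own.
    """
    words = []
    pending = None  # index of a short operation waiting for a partner
    for idx, kind in enumerate(ops):
        if kind == "long":
            if pending is not None:
                words.append((pending,))
                pending = None
            words.append((idx,))
        elif kind == "short":
            if pending is None:
                pending = idx
            else:
                words.append((pending, idx))
                pending = None
        else:
            raise ValueError(f"unknown operation kind: {kind!r}")
    if pending is not None:
        words.append((pending,))
    return words


ops = ["short", "short", "long", "short", "short", "short"]
print(pack_bundles(ops))  # [(0, 1), (2,), (3, 4), (5,)]
```

Here six operations occupy four instruction words, so the program memory is read four times rather than six. Whether the two operations in a shared word run in sequence or in parallel does not change this count; it only changes how many cycles the word takes to execute.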
The icyflex2 architecture supports many features also found on the other icyflex processors: conditional execution, 8 vectorized interrupt request levels, direct and indexed addressing modes with offset and pre-/post-increment and decrement, a 32x32 bit multiplier (with 32-bit results), and on-chip-debug.
Like the icyflex4, the RTL description supports either a latch-based synthesis targeting lower operating frequencies for lower power consumption, or a flip-flop-based synthesis achieving maximum frequency at the cost of slightly higher power consumption.
The instruction set was tailored for the C compiler, which thus achieves excellent performance thanks, for instance, to the use of short instructions and of parallelism. Applications developed for this architecture demonstrate remarkably small program memory footprints.
The icyflex2 core targets controllers in battery operated applications, such as portable medical and wireless sensors. It can also be used as an ultra low-power controller in a larger SoC, e.g. for handling power management, where it has the advantage of being able to run at very low supply voltage for moderate frequencies. The icyflex2 core has also been included in multiprocessor SoCs (MPSoC) in conjunction with several icyflex4 cores. In this case, the icyflex2 typically manages program deployment and run and sleep times of the larger DSP cores.
THE icyflex4 PROCESSOR
The icyflex4 DSP core has been optimized to operate on 16- and 32-bit real or complex data types. Like the icyflex2, it too is built around a longer pipeline (5- to 8-stage fully exposed pipeline for the icyflex4) so as to support higher clock frequencies.
An operation bundling mechanism similar to the one used in the icyflex2 processor is present in the icyflex4, enabling the compaction of up to three independent operations in a single 64-bit instruction. The operations within a bundle can be executed either in sequence or in parallel (or a combination). Additionally, since pipelined architectures require the insertion of no-operations (NOPs) when unrolling cannot hide the latency of the pipeline, NOPs are handled in the icyflex4 by dedicated instruction bits within an instruction bundle, which reduces the size of the program memory.
This DSP architecture contains a scalable vector processing unit (VPU). This VPU is organized in “slices” containing registers and data processing elements which are used by single instruction multiple data (SIMD) vector operations. The performance of the VPU can be optimized at synthesis time based on the application requirements by selecting a number of vector processing slices (VPS) from 0 (scalar unit only) to 8, the bandwidth of the memory busses being simultaneously scaled between 64 bits and 512 bits to feed the VPU slices with sufficient data.
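The relationship between the number of slices and the memory bus width can be sketched as follows. Only the two end points (64 bits for the scalar-only configuration, 512 bits for 8 slices) come from the text; the linear rule of 64 bits per slice for intermediate values is an assumption made for this sketch, as is the function name.

```python
def memory_bus_width_bits(vps: int) -> int:
    """Memory bus width for a given number of vector processing slices.

    Assumes 64 bits per slice with a 64-bit minimum for the scalar-only
    case; intermediate widths are an extrapolation, not a published figure.
    """
    if not 0 <= vps <= 8:
        raise ValueError("vps must be between 0 and 8")
    return 64 * max(1, vps)


for vps in (0, 2, 8):
    print(vps, memory_bus_width_bits(vps))
```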
The data move unit (DMU) contains two address generation units which drive two dedicated data busses. These two addressing units each support extended addressing modes (e.g. reverse-carry addressing typically used in FFT computation, or modulo addressing with start and end indexes) in addition to the more standard indexed modes with pre- or post-modification, used by a C compiler.
Following the same principles as the icyflex1, the DMU and VPU are reconfigurable at run time to allow the programmer to introduce new instructions and addressing modes to create new combinations of the computational units in the processor. This reduces the number of instructions in critical routines, thereby optimizing code size, clock cycles and power consumption.
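The two extended addressing modes mentioned above can be illustrated in software. The sketch below shows the standard bit-reversed index sequence produced by reverse-carry addressing for a power-of-two FFT, and a modulo (circular) address update between a start and an end index. Function names are hypothetical, and treating the end index as inclusive is an assumption; the hardware computes these addresses in the address generation units rather than in code.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of index."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def reverse_carry_sequence(n_points: int):
    """Access order for an n_points FFT buffer (n_points a power of two)."""
    bits = n_points.bit_length() - 1
    if n_points < 1 or (1 << bits) != n_points:
        raise ValueError("n_points must be a power of two")
    return [bit_reverse(i, bits) for i in range(n_points)]


def modulo_next(addr: int, step: int, start: int, end: int) -> int:
    """Advance addr by step, wrapping within [start, end] (end inclusive)."""
    length = end - start + 1
    return start + (addr - start + step) % length


print(reverse_carry_sequence(8))   # [0, 4, 2, 6, 1, 5, 3, 7]
print(modulo_next(7, 1, 4, 7))     # 4: wraps from the end back to the start
```

Modulo addressing of this kind is what lets a filter delay line or sample buffer be traversed as a circular buffer without explicit wrap-around tests in the inner loop.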
DEVELOPMENT TOOLS
The following development tools are available to support each of the icyflex cores:
- Software development kit (SDK): world-class GNU tool suite including a C compiler (gcc), binary utilities (assembler, disassembler, linker, etc.) and a debugger (gdb). Besides these tools, CSEM provides a cycle-accurate instruction set simulator (ISS), which can be used as a stand-alone tool, as a gdb target, or linked to other environments such as ModelSim™ or Matlab™. A plug-in for the Eclipse integrated development environment (IDE) is also included in the SDK to offer the programmer a popular graphical interface. Standard libraries from Newlib (libc and libm) are also supported.
- Hardware development kit (HDK): a plug-and-play motherboard to help developers quickly set up a modular development and demonstration environment with multiple daughter boards to connect an icyflex-based SoC and other digital and analog components. A second HDK offers a solution to validate a complete digital system on an FPGA thereby allowing full verification of the system before its integration in an ASIC and providing an efficient software development environment to reduce both costs and time to market.
KEY NUMBERS FOR THE icyflex FAMILY
The following table gives a quick summary of the features of the icyflex processors and estimates of the size and maximum frequency for specific CMOS processes.
Table 1: Key figures for the icyflex family
The average power consumption of the icyflex4 is given for a configuration with two VPU slices (VPS = 2). The two values are respectively for an icyflex4 executing only control type C code and for a radix-4 64-point FFT optimized in assembly code with reconfigured instructions.
VALIDATION EXAMPLES
The three existing members of the icyflex family have been validated in silicon: icyflex1 has been validated in TSMC 180 nm and Tower 180 nm; icyflex2 has been validated in TSMC 180 nm and icyflex4 in TSMC 65 nm CMOS processes. Further circuits are in preparation.
While customer designs cannot be described for confidentiality reasons, CSEM has implemented icyflex cores in a number of internal SoC designs. For example:
- “icycom” is an RF SoC [2] comprising an ultra low power RF transceiver front-end, on-chip DSP and control, along with digital interfaces and sophisticated power management options. Some of the target applications for icycom are wireless sensor networks (WSN) and wireless body area networks (WBAN). Integration of an on-chip DSP has long been a target for design engineers in order to carry out signal processing on chip to enable data reduction and compression, and thus minimize the data transfer requirements for the more power-hungry radio transmission. The availability of the icyflex1 processor allows for the first time such a DSP integration, offering important advantages to the power budget of the overall system. See figure 1.
Fig 1: icycom: an ultra low power RF SoC with integrated icyflex1 processor
- “icycam” is a vision sensor SoC [3] combining a high dynamic range image sensor array with on-board signal processing, 128 KiB SRAM and a wide range of digital peripherals. In this circuit, the icyflex1 runs at 50 MHz to allow real-time execution of complex image processing algorithms. The result is a fully-integrated low-power vision SoC suitable for a wide range of machine vision applications like automotive, security and metrology. See figure 2.
APPLICATIONS
The icyflex family of processors is targeted at a growing range of applications requiring both low power operation (e.g. portable, battery-operated devices) as well as significant processing power. For example:
- Wireless sensor networks (WSN) need low power consumption to run from a small battery with a long autonomy (months or years) or from an energy harvesting source, while at the same time they need local processing of sensor signals to reduce data transmission bandwidth.
- Wireless body area networks (WBAN) depend on miniaturized body-worn sensors to provide, for instance, real-time health monitoring and new human-machine interfaces among other applications.
- Medical implants drive new requirements for long autonomy and miniaturization, both parameters served by the icyflex processors.
- Digital hearing aids need intensive audio data processing while maintaining a coin cell battery lifetime of at least 24 hours.
All of these applications can benefit from the ultra-low power consumption and the processing features of the icyflex family of processors.
Fig 2: icycam: a high-dynamic range vision sensor SoC with integrated icyflex1 processor
AVAILABILITY
All icyflex processor cores are available either as soft IP cores under license, or as part of a low-power SoC design at CSEM.
PERSPECTIVES
The icyflex family sets a new benchmark for high performance low power processors, demonstrating that processing power does not need to be sacrificed to obtain a low power ASIC design.
ABOUT CSEM – AN INNOVATION CENTER
CSEM, Centre Suisse d’Electronique et de Microtechnique SA (Swiss Center for Electronics and Microtechnology), founded in 1984, is a private research and development center specializing in microtechnology, nanotechnology, microelectronics, system engineering and communications technologies. It offers its customers and industry partners tailor-made innovative solutions based on its knowledge of the market and technological expertise derived from applied research. Having founded several start-ups, it contributes to developing Switzerland as an industrial location. To date, a total of 29 such enterprises, with more than 500 employees, have been launched by CSEM.
Approximately 400 highly qualified and specialized employees from various scientific and technical disciplines work for CSEM in Neuchâtel, Zurich, Basel, Alpnach and Landquart. They represent more than 30 nationalities and constitute the basis of the company’s creativity, dynamism and innovation potential.
Further information is available at www.csem.ch.
REFERENCES
[1] C. Arm et al., “Low-Power 32-Bit Dual-MAC 120 μW/MHz 1.0 V icyflex DSP/MCU Core”, ESSCIRC Dig. Tech. Papers, Edinburgh, Sept. 2008, 190-193
[2] E. Leroux et al., “A 1 V RF SoC with an 863-928 MHz 400 kbit/s Radio and a 32-b dual-MAC DSP Core for Wireless Sensor and Body Networks”, ISSCC Dig. Tech. Papers, Feb. 2010
[3] P.-F. Rüedi et al., “An SoC combining a 132 dB QVGA pixel array and a 32 b DSP/MCU processor for vision applications”, ISSCC Dig. Tech. Papers, Feb. 2009, 15-16