By Lodewijk T. Smit, Gerard K. Rauwerda, Jochem H. Rutgers, Maciej Portalski and Reinier Kuipers
Recore Systems

Abstract
Increasing complexity, faster-changing standards and shorter time to market call for composing systems out of standard IP components. An example shows the construction of a System-on-Chip (SoC) based on standard IP components for a Digital Audio Broadcasting (DAB) consumer application.
I. Introduction
Several trends are visible in the world of digital standards:
- First, standards are becoming increasingly complex. Digital wireless devices use increasingly sophisticated signal modulation schemes, multimedia standards incorporate better compression algorithms, and broadcasting and telecom standards apply improved error correction codes to achieve better performance. Besides improved quality, most standards also add more and more features, which require more sophisticated control mechanisms. This results in longer development times and a higher probability of bugs in the design.
- Second, more and more coexisting standards offer similar functionality. For example, in Europe alone the coexisting wireless telecommunication standards include GSM, GPRS, UMTS, HSDPA and LTE. For reception of digital video content from wireless broadcasts, end users worldwide can choose between DVB-H, DVB-T, DVB-S, T-DMB, S-DMB, MediaFLO, ISDB-T and other standards. Multimedia coding likewise has many coexisting MPEG flavors. Devices therefore need the flexibility to serve different standards.
- Third, standards are changing faster and faster. New standards emerge and existing standards evolve rapidly. For example, GSM was the only mainstream digital wireless communication standard for about two decades, yet several new standards have emerged recently. To cope with these rapid changes, flexible devices are required. At a minimum, the device manufacturer should be able to adapt rapidly to new situations; ideally, consumers can upgrade their own devices with new functionality.
Trends are also visible on the hardware side:
- First, designs are becoming more and more complex (more transistors per chip). The gap between the growth in design complexity and the growth in designer productivity is known as the design gap. It leads to longer design times and more development cycles, which conflicts with ever shorter time-to-market requirements.
- Second, mask costs for chip production are increasing exponentially: with every new technology node, mask costs double. Production volumes therefore need to grow to keep the cost per chip affordable. However, this conflicts sharply with the rapid evolution of digital standards and the shorter lifetimes of devices.
II. IP building blocks
The trends above call for an approach that makes it possible to develop complete SoCs for embedded systems rapidly. This can be achieved by reusing hardware and software IP components. According to the ITRS roadmap [1], IP reuse will increase productivity by 200%. Furthermore, the systems should be flexible enough to cope with various changes. For hardware components this can be achieved with reconfigurable architectures, which offer more flexibility than ASICs while being more efficient than general purpose processors (GPPs).
| Architecture | Energy | Performance | Flexibility | Total Costs | Time to Market |
|---|---|---|---|---|---|
| Montium | + | + | + | + | + |
| ASIC | ++ | ++ | -- | -- | -- |
| FPGA | - | + | + | - | + |
| DSP | - | - | + | + | + |
| GPP | - | - | ++ | ++ | + |
| MCU | + | -- | + | ++ | + |
Table 1 - Comparison of architectures on relevant design parameters

Within Recore we construct tailored Systems-on-Chip (SoCs) that match these requirements, using existing IP building blocks:
- A coarse-grained reconfigurable Montium® processor [2]: this architecture provides a sound balance between cost, performance, flexibility and energy efficiency in digital signal processing applications. It combines features typically found in ASICs (low power), FPGAs (efficient parallel processing) and DSPs (low cost, programmable in a high-level language), which makes the Montium processor a good design compromise between these architectures. Table 1 gives a brief comparison between the Montium and other hardware architectures typically used in embedded systems with regard to relevant design parameters;
- A smart memory tile: this IP core is designed for memory-intensive applications. Sophisticated address generation modes enable application-specific interleaving and deinterleaving (operations that a general purpose processor typically executes inefficiently), as well as data buffering in any order. Combined with DMA capabilities, this gives very high flexibility in implementing tasks that require large on-chip storage. Data buffering also decouples the timing between IP blocks, which eases synchronization. This is crucial to minimize integration issues when composing a SoC from several independent IP blocks.
- Communication IP primitives:
- circuit-switched Network-on-Chip (NoC) [3]: an area- and power-efficient on-chip interconnect that provides guaranteed core-to-core throughput by means of multiple physical links (lanes), organized in an appropriate topology using on-chip routers;
- packet-switched NoC: a flexible and highly scalable on-chip interconnect that provides guaranteed core-to-core throughput over a number of virtual channels (time-multiplexed over a single physical channel) controlled by on-chip routers;
- AMBA AHB [5] on-chip interconnection system: provides easy integration of off-the-shelf components compliant with the industry-standard AMBA AHB bus architecture;
- Standard bridges (AHB ↔ USB, AHB ↔ NoC, etc.).
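To make the smart memory tile's role more concrete, the sketch below models one simple address generation mode, a row/column block interleaver, as a sequence of read addresses. It is a hypothetical software model under stated assumptions: the function names and the row/column scheme are chosen for illustration and are not the tile's actual configuration interface; real standards define their own permutations.

```python
def block_interleaver_addresses(rows, cols):
    """Read-address sequence for a rows x cols block interleaver.

    Data is assumed to be written row by row (address = r * cols + c);
    reading it column by column yields the interleaved order.
    """
    return [r * cols + c for c in range(cols) for r in range(rows)]


def apply_addresses(addresses, memory):
    """Stream words out of 'memory' in the order given by the address sequence."""
    return [memory[a] for a in addresses]


if __name__ == "__main__":
    data = list(range(12))
    interleaved = apply_addresses(block_interleaver_addresses(3, 4), data)
    # Deinterleaving is the same scheme with rows and columns swapped.
    restored = apply_addresses(block_interleaver_addresses(4, 3), interleaved)
    print(interleaved)       # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
    print(restored == data)  # True
```

Because only the address sequence changes, the same buffer memory can serve interleaving, deinterleaving or plain FIFO buffering, which is the flexibility the smart memory tile aims for.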
III. Tailored scalable System-on-Chip platforms
Montium technology can be incorporated into SoC architectures in two different ways:
- One or more Montium cores can be used as hardware accelerators next to a General Purpose Processor (GPP), i.e. as co-processors. In such SoCs, digital signal processing (DSP) kernels are offloaded from the GPP core to a Montium core. Communication in such small SoCs is generally established via an on-chip bus (e.g. AMBA), where the co-processor is integrated (i.e. memory mapped) into the address space of the on-chip bus system. Advantages of a bus-based system are:
- A lot of IP already has a bus interface;
- A simple programming model due to memory-mapped I/O.
This is a GPP-centralized approach.
- Several Montium cores can be combined in a SoC in a more autonomous way, with data streamed through the SoC. The autonomous DSP kernels of the streaming application's process graph are mapped onto the individual Montium cores. Once the system is running, the role of the GPP is limited. The communication infrastructure of these multi-core Montium-based SoCs is generally based on Network-on-Chip (NoC) technology. Advantages of a NoC are:
- It is very scalable: communication throughput and capacity increase with the size of the system;
- A NoC can provide real-time guarantees.
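The "simple programming model due to memory-mapped I/O" of the co-processor approach can be illustrated with a toy model: the GPP only issues loads and stores, and address decoding on the bus routes them to the accelerator. Everything here is hypothetical: the register map, base address, START/DONE handshake and the summing "kernel" are placeholders, not the real Montium or AMBA interface.

```python
# Hypothetical register map for a memory-mapped accelerator (not the real Montium interface).
COPROC_BASE = 0x8000_0000
CTRL, STATUS, LEN, RESULT = 0x00, 0x04, 0x08, 0x0C
BUF = 0x100          # offset of the input data window
START, DONE = 1, 1


class Bus:
    """Toy shared bus: decodes each address to the slave that owns that range."""

    def __init__(self):
        self.slaves = []  # list of (base, size, slave)

    def attach(self, base, size, slave):
        self.slaves.append((base, size, slave))

    def _decode(self, addr):
        for base, size, slave in self.slaves:
            if base <= addr < base + size:
                return slave, addr - base
        raise ValueError(f"bus error: no slave at {addr:#x}")

    def write(self, addr, value):
        slave, offset = self._decode(addr)
        slave.write(offset, value)

    def read(self, addr):
        slave, offset = self._decode(addr)
        return slave.read(offset)


class ToyAccelerator:
    """Stand-in co-processor; its placeholder kernel sums LEN words from its buffer."""

    def __init__(self):
        self.regs = {CTRL: 0, STATUS: 0, LEN: 0, RESULT: 0}
        self.mem = {}

    def write(self, offset, value):
        if offset >= BUF:
            self.mem[offset] = value
            return
        self.regs[offset] = value
        if offset == CTRL and value == START:
            # The toy kernel completes synchronously; real hardware would take cycles.
            words = [self.mem.get(BUF + 4 * i, 0) for i in range(self.regs[LEN])]
            self.regs[RESULT] = sum(words)
            self.regs[STATUS] = DONE

    def read(self, offset):
        return self.mem.get(offset, 0) if offset >= BUF else self.regs[offset]


def offload(bus, samples):
    """GPP-side driver: plain loads and stores are the whole programming model."""
    for i, s in enumerate(samples):
        bus.write(COPROC_BASE + BUF + 4 * i, s)
    bus.write(COPROC_BASE + LEN, len(samples))
    bus.write(COPROC_BASE + CTRL, START)
    while bus.read(COPROC_BASE + STATUS) != DONE:  # poll until the kernel finishes
        pass
    return bus.read(COPROC_BASE + RESULT)


bus = Bus()
bus.attach(COPROC_BASE, 0x1000, ToyAccelerator())
print(offload(bus, [3, 1, 4, 1, 5]))  # 14
```

Polling keeps the sketch short; an interrupt on completion would be the usual alternative in a real driver.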
Scalability of Montium-based SoC designs is addressed by these different SoC configurations: the co-processor SoC design comes with on-chip bus communication (see Figure 1), whereas the on-chip bus is replaced by a NoC when scaling to large multi-core systems (see Figure 2).
Figure 1 - Bus-based co-processor SoC
Figure 2 - SoC with hybrid NoC/bus communication structure

The target application largely determines the final communication infrastructure of the SoC. In the remainder of this paper we discuss the construction of a hybrid SoC that combines a bus and a NoC infrastructure. The discussed SoC uses multiple Montium cores interconnected through a NoC. The data is streamed through this NoC, with each Montium providing specific digital signal processing functionality. An AMBA-to-NoC bridge IP component interfaces the on-chip AMBA bus with the NoC for communication between the Montiums and the GPP.
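The streaming style described here, where each core runs one kernel of the process graph and data flows between cores, can be sketched as stages connected by bounded FIFOs that stand in for NoC links. This is an illustrative model under stated assumptions: the `Stage` class, the self-timed firing rule and the placeholder arithmetic kernels are hypothetical and do not represent actual DAB code or Recore tools.

```python
from collections import deque


class Stage:
    """One streaming kernel mapped onto one core; FIFOs model the NoC links."""

    def __init__(self, name, kernel, inp, out):
        self.name, self.kernel, self.inp, self.out = name, kernel, inp, out

    def step(self):
        """Fire once if a token is waiting and the output link has room (back-pressure)."""
        has_room = self.out.maxlen is None or len(self.out) < self.out.maxlen
        if self.inp and has_room:
            self.out.append(self.kernel(self.inp.popleft()))
            return True
        return False


def run(stages):
    """Keep firing stages until no stage can make progress."""
    progress = True
    while progress:
        progress = False
        for stage in stages:
            progress = stage.step() or progress


source = deque([1, 2, 3, 4])   # samples arriving from an input bridge
link_a = deque(maxlen=2)       # bounded channel between core 0 and core 1
sink = deque()                 # results collected on the bus side
pipeline = [
    Stage("core0", lambda x: 2 * x, source, link_a),  # placeholder kernels
    Stage("core1", lambda x: x + 1, link_a, sink),
]
run(pipeline)
print(list(sink))  # [3, 5, 7, 9]
```

Each stage fires only on local conditions (input available, output space free), so no central processor has to schedule the cores once data is flowing, which matches the limited role of the GPP in this configuration.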
IV. Application overview
This section discusses the development of a SoC for a specific application. As an example, we chose a Digital Audio Broadcasting (DAB) application. DAB is a standard that has been implemented in many countries, especially the UK and Germany. In total, around 500 million people are within the coverage of DAB services. Since several successors are available (such as DAB+ and T-DMB), a flexible architecture is highly desirable. Due to these new developments, the DAB infrastructure is expanding rapidly, providing coverage to more and more people with expanding functionality.
Requirements for DAB receivers include:
- Cost effectiveness – DAB receivers are consumer equipment, which imposes very strict cost requirements. In particular, this results in a requirement for small chip area;
- Ultra-low power consumption – many DAB receivers are portable, battery-powered handheld devices. This means a very tight energy consumption budget;
- Medium performance requirements – DAB is an audio broadcasting application. Compared to current multimedia broadcasting or wireless communication standards, DAB has limited requirements for computational processing power;
- Real-time requirements – multimedia broadcasting applications impose strict real-time requirements to guarantee continuous playback of the received content;
- Architecture flexibility – as already mentioned, multiple successor standards for DAB have emerged (DAB+, DMB, etc.), hence the architecture should provide flexibility to enable upgrades and to allow feature differentiation within a common hardware architecture (which enables high volumes).
The core technologies used in DAB are Orthogonal Frequency Division Multiplexing (OFDM), convolutional channel coding with Equal/Unequal Error Protection (EEP/UEP) and MPEG-1 Audio Layer II (MP2) audio coding. Each of these schemes typically imposes slightly different requirements on processing power and hardware resources. Using a combination of Recore's coarse-grained reconfigurable technology and third-party IP components, we have composed and prototyped a multi-core system capable of processing a DAB baseband sample stream and providing audio playback.
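The convolutional channel coding mentioned above can be illustrated with a rate-1/4 feed-forward encoder of constraint length 7. The generator polynomials below (octal 133, 171, 145, 133) are the DAB mother code as we recall it from ETSI EN 300 401, and the bit-ordering convention (the most significant polynomial bit taps the newest input bit) is an assumption, so both should be checked against the standard before reuse. EEP and UEP profiles are obtained by puncturing this mother code, which the sketch omits.

```python
# Assumed DAB mother code generators in octal, constraint length 7
# (verify against ETSI EN 300 401 before relying on them).
GENERATORS = (0o133, 0o171, 0o145, 0o133)
CONSTRAINT_LENGTH = 7


def parity(x):
    """XOR of all bits of x."""
    return bin(x).count("1") & 1


def conv_encode(bits, generators=GENERATORS, k=CONSTRAINT_LENGTH):
    """Rate-1/len(generators) feed-forward convolutional encoder, no puncturing.

    The shift register starts at zero and k-1 zero tail bits are appended so
    that the encoder ends in the all-zero state.
    """
    state = 0  # the k-1 most recent input bits
    out = []
    for b in list(bits) + [0] * (k - 1):
        reg = (b << (k - 1)) | state  # newest bit in the MSB position (assumed convention)
        out.extend(parity(reg & g) for g in generators)
        state = reg >> 1              # shift: drop the oldest bit
    return out


coded = conv_encode([1, 0, 1, 1])
print(len(coded))  # 40 = 4 outputs x (4 data bits + 6 tail bits)
```

On the receiver side, the Viterbi decoder searches the trellis of this encoder for the most likely input sequence.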
V. Prototyping
Using off-the-shelf IP components, we built a prototype in an FPGA device.
The IP components used for building a heterogeneous DAB receiver SoC (the ingredients of the recipe) were:
- Three Montium cores for computationally intensive DSP processing. In a silicon implementation, fewer Montium cores will be required because the core clock frequency in an ASIC can be significantly higher than in an FPGA prototype. An OFDM demodulator, a DQPSK demapper / frequency deinterleaver and a Viterbi decoder are mapped onto the three Montium cores;
- A smart memory tile IP is used for time deinterleaving and data buffering;
- A Leon 2 CPU core [4] is used for system control and MP2 decoding;
- Two bridges are used: one enables input data streaming from USB to the NoC, and the other integrates the NoC and AHB on-chip subsystems.
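The DQPSK demapping step mapped onto one of the Montium cores can be sketched in a few lines. In differential QPSK, the two bits of each carrier are carried by the phase change between consecutive OFDM symbols, so multiplying by the conjugate of the previous symbol removes the absolute phase. The bit mapping (a negative real or imaginary part means bit 1) is an assumption for illustration, and the sketch produces hard decisions, whereas a Viterbi decoder would normally consume soft values.

```python
import cmath


def dqpsk_demap(prev_symbol, curr_symbol):
    """Hard-decision differential demapping, one bit pair per carrier.

    Multiplying by the conjugate of the previous symbol on the same carrier
    leaves only the phase difference; any channel phase that stays constant
    between the two symbols cancels out.
    """
    bits = []
    for p, c in zip(prev_symbol, curr_symbol):
        z = c * p.conjugate()
        # Assumed mapping: a negative real/imaginary part means bit 1.
        bits.append((int(z.real < 0), int(z.imag < 0)))
    return bits


prev = [cmath.exp(1j * phi) for phi in (0.0, 0.7, -1.2, 2.5)]
diffs = (cmath.pi / 4, 3 * cmath.pi / 4, -cmath.pi / 4, -3 * cmath.pi / 4)
curr = [p * cmath.exp(1j * d) for p, d in zip(prev, diffs)]
print(dqpsk_demap(prev, curr))  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```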
Figure 3 shows how the process graph of the DAB receiver has been mapped onto the prototyped heterogeneous SoC in the FPGA device. A PC emulates the analog RF front-end and generates a DAB baseband sample stream, which is streamed over USB to the FPGA. All digital data processing, including digital-to-analog conversion of the final audio samples, is done in our SoC on the FPGA. The Montium cores perform all computationally intensive DSP processing. The Leon 2 CPU is responsible for lightweight SoC control and MP2 decoding. Audio playback is realized using a sigma-delta modulator implemented in the FPGA, whose output drives an external speaker system via digital I/O pins and an analog low-pass filter.
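The audio output stage (a sigma-delta converter driving digital I/O pins through an analog low-pass filter) can be illustrated with a first-order sigma-delta modulator: it turns multi-bit PCM samples into a 1-bit stream whose local average follows the input, and the low-pass filter recovers the audio. This is a generic textbook sketch; the order, oversampling ratio and bit widths of the FPGA implementation are not stated here and are not modeled.

```python
def sigma_delta(samples):
    """First-order sigma-delta modulator.

    Input samples are floats in [-1.0, 1.0]; the output is a stream of +1/-1
    values that would drive a digital I/O pin (+1 -> high, -1 -> low).
    """
    integrator = 0.0
    feedback = 0.0  # previous 1-bit output, fed back to the integrator
    out = []
    for x in samples:
        integrator += x - feedback                      # accumulate the quantization error
        feedback = 1.0 if integrator >= 0.0 else -1.0   # 1-bit quantizer
        out.append(feedback)
    return out


# In practice the PCM samples are oversampled first; a constant input is
# enough to show that the pulse density tracks the signal level.
bits = sigma_delta([0.5] * 1000)
print(sum(bits) / len(bits))  # close to 0.5
```

For inputs within [-1, 1] the integrator stays bounded, so the average of the output stream deviates from the average input by at most a few units divided by the stream length.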
Figure 3 - DAB kernels mapped on a heterogeneous SoC

VI. Results
The SoC is built with standard off-the-shelf IP components. With this system, a complete DAB application runs in real time on an FPGA development board. This allowed us to experiment with various IP components to arrive at the most suitable architecture. In the final setup, the Montiums are clocked as low as 15 MHz and the Leon 2 CPU runs at only 48 MHz. Such low clock frequencies result in a very power-efficient system. As mentioned before, fewer Montiums are necessary in an ASIC implementation, making the SoC even more cost-efficient.
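A quick way to see why such low clock frequencies can suffice is to compute the cycle budget per sample. The sketch below assumes DAB transmission mode I (a complex baseband sample rate of 2.048 MHz and OFDM symbols of 2552 samples including the guard interval) and 48 kHz audio output; which mode and audio rate the prototype used is not stated, so these figures are illustrative only.

```python
def cycle_budget(clock_hz, rate_hz, items_per_block=1):
    """Clock cycles available per block of items arriving at rate_hz."""
    return clock_hz * items_per_block / rate_hz


DAB_SAMPLE_RATE = 2.048e6  # assumed: mode I complex baseband sample rate
SYMBOL_SAMPLES = 2552      # assumed: 2048 useful + 504 guard samples (mode I)
AUDIO_RATE = 48e3          # assumed: 48 kHz MP2 output

montium = cycle_budget(15e6, DAB_SAMPLE_RATE)
per_symbol = cycle_budget(15e6, DAB_SAMPLE_RATE, SYMBOL_SAMPLES)
leon = cycle_budget(48e6, AUDIO_RATE)

print(f"{montium:.2f} cycles per baseband sample per Montium")  # 7.32
print(f"{per_symbol:.0f} cycles per OFDM symbol per Montium")    # 18691
print(f"{leon:.0f} Leon 2 cycles per audio sample period")      # 1000
```

These are only upper-level sanity checks: stages after the OFDM demodulator see lower data rates, and whether a kernel fits its budget depends on how much parallelism the core exploits per cycle.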
VII. Conclusions
With standard, off-the-shelf IP components we were able to construct a complete SoC on an FPGA, which provides very good prototyping possibilities. The use of standardized IP components enables the short time to market that is crucial in today's development environments. The SoC can be completely tailored to our needs by picking the IP blocks suitable for a specific application domain. This approach also makes it possible to explore trade-offs between flexibility for future development and the total cost of the system. Furthermore, the SoC is very efficient because we matched the best-suited hardware architectures (IP components) to the algorithms instead of the other way around (matching the algorithms to a single hardware architecture).
References
[1] ITRS roadmap 2007, http://www.itrs.net/Links/2007ITRS/Home2007.htm
[2] P.M. Heysters, G.K. Rauwerda and L.T. Smit, "A flexible, low power, high performance DSP IP Core for Programmable Systems-on-Chip," in Proceedings of IP/SOC 2005, Dec. 2005, pp. 1-5.
[3] P.T. Wolkotte, G.J.M. Smit, G.K. Rauwerda and L.T. Smit, "An Energy-Efficient Reconfigurable Circuit Switched Network-on-Chip," in Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - 12th Reconfigurable Architecture Workshop (RAW 2005), Denver, Colorado, USA, 4-8 Apr. 2005. IEEE Computer Society. ISBN 0-7695-2312-9.
[4] Leon2, http://www.gaisler.com/cms/index.php?option=com_content&task=view&id=12&Itemid=52
[5] AMBA, http://www.arm.com/products/solutions/AMBAHomePage.html