Distribution: An approach for Virtual Platform scalability

Stephane Farrouch (STMicroelectronics), Hmayak Arzumanyan (ProximusDA Cjsc)

Abstract: Systems-on-Chip are becoming more and more complex. Set-Top-Box SoCs, for example, embed hundreds of IPs and tens of cores, run 10 Giga-instructions for OS boot and 200 Giga-instructions of driver initialization, and play several 4Kp60 (UltraHD 60Hz) video streams in parallel with high-bandwidth networking and graphics. Reducing time-to-market thus requires starting tests and software development well before silicon and boards are available. This anticipation relies heavily on Virtual Platforms and Coemulation/Coprototyping solutions, and in particular on a growing use of Virtual Platforms by hundreds of users with different focuses: SoC verification tests, system validation tests, OS kernel, drivers, applications. In this paper we focus on TLM-LT (Loosely Timed, Programmers’ View) platforms, which are usually the first ones requested, and on how to address one of the main challenges of these Virtual Platforms: improving their execution speed.

I. INTRODUCTION

The main challenges for Virtual Platform integrators are:
The focus of this paper is execution speed: we show the factors that affect it, review the classical approaches, and present a solution we have experimented with.

II. THE VP SPEED BOTTLENECK

A. 1st factor: increasing VP complexity & size

A huge number of IP models coexist in a VP; a typical Set-Top-Box VP is made of
B. 2nd factor: the SystemC kernel's way of modeling Hw/Sfw parallelization

The SystemC kernel is an event-based scheduler that models parallel hardware resources with multiple sc_threads, all running inside one single Linux process on a single CPU core. This means:
C. 3rd factor: difficulty of getting optimized models in time

Given the complexity of SoCs, IP models are developed by teams with different skills whose priority is to deliver a working IP, which leads to heterogeneity in:
D. 4th factor: huge amount of embedded software, not always optimized

The main use of a Virtual Platform is to start embedded software development ahead of board availability:
Moreover, since one big advantage of a Virtual Platform is the debug possibilities it offers, this software is kept:
As a result, software execution takes a significant part of the VP execution time, a part on which no real optimization can be done during the first steps of development.

E. Conclusion

The simulation time required by modern SystemC/TLM-based VPs to run the entire non-regression test suite is typically many tens of hours.
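As an illustration of the second factor (section II.B), the following minimal Python sketch mimics a single-kernel, event-based scheduler: each model process is a generator that runs until its next wait, and one loop on one host core resumes the processes one at a time. It is not the SystemC API. The names `ToyKernel`, `spawn`, `run`, `ip_model` and `demo` are hypothetical, and the periods are arbitrary illustrative values.

```python
import heapq
from itertools import count


class ToyKernel:
    """Toy single-threaded event scheduler (illustration only, not SystemC)."""

    def __init__(self):
        self.now = 0             # current simulated time
        self.trace = []          # (time, process name) for every activation
        self._queue = []         # heap of (wake_time, sequence, process)
        self._seq = count()      # tie-breaker: FIFO order at equal wake time

    def spawn(self, process):
        # Every process is first activated at simulated time 0.
        heapq.heappush(self._queue, (0, next(self._seq), process))

    def run(self):
        # One loop, one host thread: processes are resumed strictly one by one.
        while self._queue:
            self.now, _, process = heapq.heappop(self._queue)
            try:
                delay = next(process)    # run the process until its next wait
            except StopIteration:
                continue                 # process finished, drop it
            heapq.heappush(
                self._queue, (self.now + delay, next(self._seq), process)
            )


def ip_model(kernel, name, period, steps):
    """Hypothetical IP model: does one unit of work, then waits `period`."""
    for _ in range(steps):
        kernel.trace.append((kernel.now, name))
        yield period


def demo():
    kernel = ToyKernel()
    kernel.spawn(ip_model(kernel, "video", period=2, steps=2))
    kernel.spawn(ip_model(kernel, "audio", period=3, steps=2))
    kernel.run()
    return kernel.trace
```

However many models are spawned, `run` executes them one after another on a single host core, so the host time spent grows with the number and activity of the models even though the modeled hardware is parallel.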
Fig. #1: Some figures achieved on set-top-box VP

III. OPTIMIZATION TRACKS: SOME CLASSICAL APPROACHES, LIMITATIONS AND EXPECTED GAIN

A. Overview of the approaches

Besides the classical general program optimization approaches described in [3], some VP-specific approaches are also usually taken. Some are TLM-LT modeling guidelines (cf. [5] section 4 and [6]) at IP/subsystem level:
Others are approaches at system level:
Fig. #2: Limitations and usual gain of classical approaches

B. Conclusion

To obtain a significant gain, the classical approaches must be applied as widely as possible, and experience shows that you always have to convince your model providers and compete with their higher priorities. Moreover, the performance gain needed (several orders of magnitude) is much higher than what these approaches can reach.

IV. THE CHOSEN ALTERNATIVE APPROACH: HAVING PARALLELIZED PLATFORMS

A. Main principle

The approach is based on a simple idea: a way to approach real time when modeling a parallelized SoC is to parallelize the VP execution. Wein [1] explained how TLM helps accelerate the execution of a design description on parallel computer systems, thanks to its scheduling scheme being tightly coupled to its explicit communication scheme. Applied to a VP, this principle helps speed up execution. Instead of a single SystemC kernel simulating the entire platform on a single CPU, the platform is partitioned and several SystemC kernels run in parallel, each simulating one part of the platform on a different core of the host station. This allows data-processing parallelism as well as efficient modeling of pipelining. And since it is still based on the SystemC kernel, the exact same models are used. The overall automated flow (import and re-assembly) is based on an IP-XACT description, allowing fast iteration during partitioning trials. The main steps of the parallelization loop are as follows:
Fig. #3: Graphical partitioning of the VP on 3 cores - screenshot

B. Parallelization alternatives

In [4], the authors describe two techniques for accelerating a VP by taking advantage of symmetric multiprocessor (SMP) workstations:
We did not pursue this approach, even though it is quite similar to the one we took: our choice is to have an untimed, purely event-driven VP, so a solution aimed at ensuring time consistency across partitions did not make sense for our needs.

C. Rationale of this choice

We considered using the parallelized-platform approach since:
D. Performance obtained: example on a set-top-box VP

The data below shows several experiments in which the VP is split into 2, 3, 5, and 6 partitions. The acceleration factor reflects the achieved speedup, which is sometimes over-linear (that is, greater than the number of partitions), as in experiment #4 with 2 partitions. The over-linear acceleration is due to a higher number of CPU cache hits, resulting from good partitioning and from more CPUs being used for the simulation.
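To make the notion of over-linear acceleration concrete, here is a minimal Python sketch. It assumes the usual definition of speedup (single-kernel run time divided by partitioned run time), which the text does not spell out. The function names are hypothetical, and the timings in the usage note are made-up values, not measurements from the experiments.

```python
def acceleration_factor(t_single, t_partitioned):
    """Speedup of a partitioned run over the single-kernel run.

    Both arguments are wall-clock durations in the same unit.
    """
    if t_single <= 0 or t_partitioned <= 0:
        raise ValueError("durations must be positive")
    return t_single / t_partitioned


def is_over_linear(t_single, t_partitioned, partitions):
    """True when the speedup exceeds the number of partitions used."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    return acceleration_factor(t_single, t_partitioned) > partitions
```

With hypothetical timings of 100 s for the single-kernel run and 40 s on 2 partitions, `acceleration_factor(100.0, 40.0)` gives 2.5, which is above 2, so `is_over_linear(100.0, 40.0, 2)` is true. A 60 s run on 2 partitions would give a sub-linear factor of about 1.67.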
Fig. #4: Acceleration factor vs partitions

E. Perspectives

Such an approach opens up several perspectives, among others:
V. CONCLUSION

Classical optimizations are far from sufficient for increasing the execution speed of Virtual Platforms, and they require huge efforts on a subject that is usually outside the priorities of IP model providers. The approach we experimented with offers a high performance gain (a factor of several), flexibility and scalability, with less effort.

ACKNOWLEDGMENT

We thank Philippe Metsu and Laurent Ducousso for their long-standing conviction about the benefit of this approach, and for their efforts in getting it tested on a real case. We thank the Proximus team for their support and proactivity while we were using their tool.

REFERENCES

[1] Enno Wein, “HW/SW Co-Design of Parallel Systems”, IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2010
[2] Sungpack Hong et al., “Creation and Utilization of a Virtual Platform for Embedded Software Optimization: An Industrial Case Study”, CODES+ISSS International Conference Hardware/Software Codesign and System Synthesis, 2006
[3] Wikipedia, “Program optimization”, 2014
[4] Aline Vieira De Mello, Isaac Maia, François Pécheux, Alain Greiner, “Parallel simulation of SystemC TLM 2.0 compliant MPSoCs”, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2010
[5] Marcelo Montoreano, “Transaction Level Modeling using OSCI TLM 2.0”, 2007
[6] STARC - Semiconductor Technology Academic Research Center, “Transaction Level Modeling Guide”, 2008

Glossary:

IP – Intellectual Property – HW or SW implementation of a set of features contributing to the overall system
TLM – Transaction Level Modeling – now part of IEEE 1666™ "SYSTEMC LANGUAGE"
VP – Virtual Platform – considered to be built in SystemC/TLM hereafter
SoC – System-on-Chip – complete piece of silicon containing all the hardware resources (IPs, cores) necessary for the targeted use cases