MulticoreWare Brings Unprecedented Performance and Productivity to New Processor Platforms
MulticoreWare today announced the availability of its MxPA OpenCL compiler, which supports kernel fusion, automatic DMA and vectorization.
Sunnyvale, CA -- March 16, 2015 -- In the highly competitive mobile and embedded device markets, manufacturers are increasingly developing sophisticated applications that rely on powerful image processing algorithms. These applications include computer vision (for example, gesture recognition and object recognition), 3D scanning and modeling, augmented reality, computational photography (rapid autofocus, HDR, extreme low light) and visual awareness. New processor architectures, including vision processors and digital signal processors, are designed to support these computationally demanding applications with high performance and low power consumption.
“Programming embedded multicore heterogeneous processors can be challenging due to specialized VLIW single thread processor cores, and extensive use of scratchpad memory with explicitly managed, software controlled DMA engines,” said Dr. Wen-mei Hwu, CTO of MulticoreWare. “Optimizing kernels for such specialized architectures can be highly tedious and error-prone. Another problem in memory hierarchy performance stems from the bulk synchronous parallel execution model employed by OpenCL. For example, in an OpenCV pipeline, an operation may require multiple image filters chained together. Each filter is usually implemented as a separate OpenCL kernel, allowing composability but resulting in poor memory hierarchy performance. Programmers are often compelled to manually fuse the kernels so that the locality is preserved. Until now, there has been no systematic way to automatically fuse assembly kernels with user-provided OpenCL kernels.”
To enable mobile device OEMs to rapidly port their applications to new processor architectures, MulticoreWare has extended the MultiCore Cross-Platform Architecture (MxPA) with several new capabilities. This OpenCL 1.2 compliant compiler will be the world’s first to feature real-time kernel fusion, automatic determination of DMA method and auto vectorization.
- Real-time Kernel Fusion: An innovative technique that reins in the memory bandwidth consumption of OpenCL code written with a GPU programming mindset. The MxPA OpenCL compiler also improves performance through automatic fusion of threads and workgroups.
- Auto DMA: Automatically creates DMA commands from memory accesses in OpenCL kernel functions at compile time.
- Auto Divergence-Tolerant Vectorization: Preserves the partial vectorization capability of GPUs when executing code on VLIW (very long instruction word) architectures.
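As a rough illustration of the kernel fusion idea described above, the sketch below chains two image filters in plain Python, first as two separate passes and then as a single fused pass. It is not MxPA code and does not show how the MxPA compiler performs fusion; the filter names (`brighten`, `threshold`) and the sample values are hypothetical, chosen only to show why fusing removes the intermediate image that would otherwise travel through memory.

```python
# Illustrative sketch only: hypothetical filters, not MxPA's implementation.

def brighten(pixels, offset):
    """Kernel 1: add an offset to every pixel, clamped to 255."""
    return [min(p + offset, 255) for p in pixels]


def threshold(pixels, cutoff):
    """Kernel 2: map each pixel to 255 if it reaches the cutoff, else 0."""
    return [255 if p >= cutoff else 0 for p in pixels]


def unfused(pixels, offset, cutoff):
    # Two passes: the whole intermediate image is written out by the first
    # kernel and read back by the second.
    intermediate = brighten(pixels, offset)
    return threshold(intermediate, cutoff)


def fused(pixels, offset, cutoff):
    # One pass: each intermediate value lives only in a local variable,
    # so no intermediate image is ever stored.
    out = []
    for p in pixels:
        b = min(p + offset, 255)
        out.append(255 if b >= cutoff else 0)
    return out


image = [0, 50, 100, 150, 200, 250]
# Both versions compute the same result; only the memory traffic differs.
assert unfused(image, 30, 128) == fused(image, 30, 128)
```

In an OpenCL setting the two-pass version corresponds to two separate kernels communicating through a global-memory buffer, while the fused version keeps each intermediate value local to the work-item that produced it.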
These advanced capabilities enable developers to achieve performance close to that of hand-optimized C code while working much more productively in the high-level OpenCL programming model. Up to 2X speedup has been achieved for OpenCL implementations of OpenCV computer vision pipelines. MxPA has also been integrated with other front ends to support Renderscript and C++ AMP.
“MxPA was critical to the success of C++ AMP development. It enabled HSA devices to outperform other implementations in the market at a fraction of the cost,” said Greg Stoner, Managing Director of HSA Foundation.
To further assist developers, MulticoreWare offers a complete range of professional services including training, support, porting and performance optimization of existing functions and applications, and the development of new computer vision libraries.
Remi El-Ouazzane, CEO of Movidius, remarked, “Movidius is opening the era of visual sensing in the internet of things through its Myriad Vision Processor Unit (VPU). We understand how powerful an abstraction framework can be to help developers accelerate their time to market. We believe that MulticoreWare’s MxPA OpenCL platform is a very important compiler technology initiative that will prove to be an industry game changer.”
A. G. Karunakaran (AGK), CEO of MulticoreWare, commented, “Our solution architects have spent many years researching and solving the problem of enabling the highest levels of developer productivity while achieving optimal performance on a target hardware platform. MxPA delivers on that vision, supporting the latest heterogeneous software programming models on the newest, most exciting processor architectures.”
For more information, contact Tom Vaughan, VP of Products at tom.vaughan(at)multicorewareinc(dot)com, or visit MulticoreWare on the web at http://www.multicorewareinc.com.