InAccel Accelerates XGBoost and Releases the IP Core for FPGAs
July 31, 2019 -- Machine learning algorithms are extremely computationally intensive and time-consuming when they must be trained on large amounts of data. Typical processors are not optimized for machine learning applications and therefore offer limited performance. As a result, both academia and industry are focused on developing specialized architectures for the efficient acceleration of machine learning applications.
One of the most efficient and widely used ML algorithms of the last few years is XGBoost, an open-source software library that provides a gradient boosting framework.
FPGAs are programmable chips that can be configured with tailor-made architectures optimized for specific applications. Because the architecture is tailored to a specific task, FPGAs can offer higher performance and lower energy consumption than general-purpose CPUs or GPUs. FPGAs are widely used in image processing, telecommunications, networking, automotive and machine learning applications.
Recently, major cloud and HPC providers such as Amazon AWS, Alibaba, Huawei and Nimbix have started deploying FPGAs in their data centers. However, FPGAs are not yet widely used in the domain of machine learning.
To this end, InAccel today released the FPGA IP core for XGBoost training as open source.
The FPGA-accelerated solution for the XGBoost algorithm is based on the Exact (Greedy) algorithm for tree creation. It can provide up to 26x speedup compared to single-threaded execution and up to 5x compared to 8-threaded CPU execution. The acceleration is attained by exposing parallelism and reusing data in the features dimension of the dataset.
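As a rough illustration of the Exact (Greedy) approach, the following Python sketch scans every feature, accumulates gradient statistics over the sorted values and keeps the split with the highest gain. It is a software-only sketch, not InAccel's implementation: the function names, the column-wise data layout and the reg_lambda default are assumptions made for this example, and the gain formula is the standard XGBoost one without the complexity penalty. The outer loop over features is the dimension in which the work is independent and could be parallelized.

```python
from typing import List, Optional, Tuple


def best_split_for_feature(
    values: List[float],
    grads: List[float],
    hess: List[float],
    reg_lambda: float = 1.0,  # assumed L2 regularization term
) -> Tuple[float, Optional[float]]:
    """Scan one feature and return (best gain, threshold), or (0.0, None)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    g_total, h_total = sum(grads), sum(hess)
    parent_score = g_total ** 2 / (h_total + reg_lambda)

    g_left = h_left = 0.0
    best_gain, best_threshold = 0.0, None
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        # Accumulate the gradient statistics of everything left of the split.
        g_left += grads[i]
        h_left += hess[i]
        if values[i] == values[nxt]:
            continue  # cannot split between two equal feature values
        g_right, h_right = g_total - g_left, h_total - h_left
        gain = 0.5 * (
            g_left ** 2 / (h_left + reg_lambda)
            + g_right ** 2 / (h_right + reg_lambda)
            - parent_score
        )
        if gain > best_gain:
            best_gain = gain
            best_threshold = (values[i] + values[nxt]) / 2
    return best_gain, best_threshold


def best_split(
    columns: List[List[float]],
    grads: List[float],
    hess: List[float],
    reg_lambda: float = 1.0,
) -> Tuple[float, Optional[int], Optional[float]]:
    """Return (gain, feature index, threshold) of the best split for a node."""
    best: Tuple[float, Optional[int], Optional[float]] = (0.0, None, None)
    # Each feature is scanned independently of the others.
    for feature, column in enumerate(columns):
        gain, threshold = best_split_for_feature(column, grads, hess, reg_lambda)
        if threshold is not None and gain > best[0]:
            best = (gain, feature, threshold)
    return best
```

For example, calling best_split([[1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]], [-1.0, -1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]) selects feature 0 with a threshold of 2.5, since the second feature is constant and offers no split.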
The accelerator accumulates the gradients for each feature, calculates the possible splits and keeps the best split for each node. To avoid frequent accesses to the FPGA DDR RAM, we load up to 65536 entries into BRAM inside the accelerator. Likewise, to hold the accumulated values and the best calculated split of each node, we keep up to 2048 nodes in BRAM inside the accelerator. To accumulate floating-point values with a minimal interval between successive additions, we convert them to fixed-point arithmetic, with negligible change to the results.
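The fixed-point conversion can be illustrated with a small Python sketch. The motivation is presumably that an integer addition has a much shorter latency than a floating-point one, so an accumulator can accept new values at a shorter interval. The 24 fractional bits and the helper names below are assumptions chosen for the example; the source does not state the format used inside the accelerator.

```python
from typing import Iterable

FRAC_BITS = 24          # assumed number of fractional bits (not from the source)
SCALE = 1 << FRAC_BITS  # scaling factor between float and fixed-point values


def to_fixed(x: float) -> int:
    """Convert a float to a fixed-point integer, rounding to nearest."""
    return int(round(x * SCALE))


def from_fixed(q: int) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / SCALE


def accumulate_fixed(values: Iterable[float]) -> float:
    """Sum floats by accumulating their fixed-point representations."""
    acc = 0
    for v in values:
        acc += to_fixed(v)  # plain integer addition, exact and order-independent
    return from_fixed(acc)
```

With this format, summing one thousand gradients of 0.1 gives a result within 1e-4 of 100, and the sum no longer depends on the order of accumulation, because integer addition is exact.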
The software needed to integrate the accelerator with the XGBoost library is also provided. A new tree method called fpga_exact is added, which uses our updater and the pruner.
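In XGBoost, the tree construction algorithm is selected through the tree_method parameter, so the new method would presumably be chosen the same way. The sketch below assumes an XGBoost build that includes the InAccel integration. The helper names, the objective and the hyperparameter values are illustrative only, and a stock XGBoost installation does not recognize fpga_exact.

```python
def fpga_exact_params(max_depth: int = 6, eta: float = 0.3) -> dict:
    """Build a training parameter dict that selects the FPGA tree method."""
    return {
        "tree_method": "fpga_exact",     # tree method named in this release
        "objective": "binary:logistic",  # illustrative choice of objective
        "max_depth": max_depth,          # illustrative hyperparameters
        "eta": eta,
    }


def train_on_fpga(dtrain, num_boost_round: int = 10):
    """Train a booster; assumes an XGBoost build with the FPGA updater."""
    import xgboost as xgb  # imported lazily: requires the InAccel-enabled build

    return xgb.train(fpga_exact_params(), dtrain, num_boost_round=num_boost_round)
```

Here dtrain is an ordinary xgboost.DMatrix; apart from the tree_method value, the call is the same as for CPU training.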
The IP core for XGBoost leverages the processing power of Xilinx FPGAs. It is optimized for Xilinx FPGAs such as the Alveo U200 and U250 cards and for the FPGAs available as cloud instances (f1 on AWS and f3 on Alibaba Cloud).
The release of the XGBoost IP core will help demonstrate the advantages of FPGAs in the domain of machine learning, and it gives the data science community the chance to experiment with, deploy and utilize FPGAs to speed up their machine learning applications.
InAccel offers all the required APIs for seamless integration with Python, Java and Scala, so data scientists and data engineers do not need to change their code at all. In addition, through its unique FPGA Resource Manager, InAccel allows instant scalability to multiple FPGA boards.
The IP core is available on: https://github.com/InAccel/xgboost
InAccel specializes in developing high-performance accelerators for machine learning, data analytics, data processing (compression, encryption) and financial applications. The accelerators from InAccel are compatible with high-level distributed frameworks like Apache Spark. InAccel provides a unique FPGA resource manager that allows IP cores to be scaled instantly to many FPGAs and also allows the virtualization and seamless sharing of FPGA resources by many applications.
Christoforos (Chris) Kachris is the founder and CEO of InAccel, which helps companies speed up their applications using hardware accelerators (FPGAs) in the cloud or on-prem. He is the editor of the book Hardware Accelerators in Data Centers.
He has over 15 years of experience in FPGAs (reconfigurable computing), digital design, embedded systems (SoCs), and HW/SW co-design, mainly in network processing, microservers, optical interconnects, and telecommunication systems.