August 5, 2019 -- Machine learning algorithms are computationally intensive and time-consuming when they must be trained on large amounts of data. General-purpose processors are not optimized for machine learning workloads and therefore offer limited performance. As a result, both academia and industry are focusing on specialized architectures that accelerate machine learning applications efficiently.
FPGAs are programmable chips that can be configured with tailor-made architectures optimized for specific applications. Because the architecture is tailored to the task, FPGAs can offer higher performance and lower energy consumption than general-purpose CPUs or GPUs. They are widely used in image processing, telecommunications, networking, automotive and machine learning applications.
Recently, major cloud providers like Alibaba and Microsoft have started deploying Intel FPGAs in their data centers. However, FPGAs are still not widely used in the domain of machine learning.
To address this, InAccel has developed an integrated FPGA-based ML suite that delivers up to 7x speedup over execution on an 8-core general-purpose CPU for widely used algorithms such as logistic regression, K-means clustering and XGBoost.
The IP cores for logistic regression and K-means clustering leverage the processing power of Intel FPGAs to speed up the training of these algorithms. The IP cores are optimized for the Intel® FPGAs (e.g. Arria® 10) available as instances (f1) on Alibaba Cloud.
The Accelerated ML suite can be used as an add-on library that overloads the most computationally intensive functions of the ML algorithms. InAccel offers all the required APIs for seamless integration with Python, Java and Scala. It also provides an integrated framework that allows instant integration with distributed frameworks like Apache Spark. That means that data scientists and data engineers do not need to change their code at all.
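The "overloading" idea can be illustrated with a minimal Python sketch. This is not InAccel's API: the class `LogisticRegression`, the function `accelerated_fit` and the helper `enable_acceleration` are hypothetical names invented for illustration, and the "accelerated" path simply reuses the CPU math so the example runs without any FPGA. It only shows how an add-on library could swap in a different implementation of a training routine while the user's code stays the same.

```python
import math


def _gradient_descent(X, y, lr, epochs):
    """Batch gradient descent for logistic regression (plain Python)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(xi):
                grad[j] += (p - yi) * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


class LogisticRegression:
    """Hypothetical stand-in for an existing CPU-based ML library class."""

    def __init__(self, lr=0.1, epochs=200):
        self.lr = lr
        self.epochs = epochs

    def fit(self, X, y):
        self.backend = "cpu"
        self.weights = _gradient_descent(X, y, self.lr, self.epochs)
        return self


def accelerated_fit(self, X, y):
    """Hypothetical replacement for fit(). A real add-on would offload the
    training loop to an accelerator here; this sketch reuses the CPU math."""
    self.backend = "accelerator"
    self.weights = _gradient_descent(X, y, self.lr, self.epochs)
    return self


def enable_acceleration():
    """Overload the compute-heavy method; user code below does not change."""
    LogisticRegression.fit = accelerated_fit


# User code: identical before and after the add-on is enabled.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 0, 1, 1]

model_cpu = LogisticRegression().fit(X, y)
enable_acceleration()
model_acc = LogisticRegression().fit(X, y)

print(model_cpu.backend, model_acc.backend)
```

The call site `LogisticRegression().fit(X, y)` is the same in both cases; only the implementation behind `fit` changes.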
InAccel also provides a unique FPGA resource manager that allows these applications to scale fully to multiple FPGAs per server and to multiple servers with FPGAs. The resource manager also allows sharing (“virtualization”) of the FPGA resources, which makes it much easier for multiple applications to use the accelerators.
The InAccel Accelerated ML suite offers more than 7x kernel speedup for logistic regression training and 5x overall system speedup, which includes the time to transfer the data and the data preprocessing performed on the processors.
In the case of K-means clustering, the IP core offers 6.4x kernel speedup for training and 4.3x system speedup, which includes the data preprocessing and the communication with the FPGAs.
That means that data scientists and ML engineers can enjoy much faster training of their models. In cases where AutoML is used and several models need to be trained to find the optimum configuration, faster training runs save users and companies a lot of time and allow them to find the optimum configuration sooner.
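The gap between the kernel speedups and the system speedups quoted above can be explored with a rough back-of-the-envelope sketch based on Amdahl's law. This is an illustrative assumption, not InAccel's methodology: it lumps data transfer and preprocessing into the non-accelerated share of the run time, and it treats "more than 7x" as exactly 7x. The function names are made up for this example.

```python
def accelerated_fraction(kernel_speedup, system_speedup):
    """Share of the original CPU run time spent in the accelerated kernel,
    obtained by inverting Amdahl's law: S = 1 / ((1 - p) + p / k)."""
    if kernel_speedup <= 1.0:
        raise ValueError("kernel_speedup must be greater than 1")
    if not 1.0 <= system_speedup <= kernel_speedup:
        raise ValueError("system_speedup must lie between 1 and kernel_speedup")
    return (1.0 - 1.0 / system_speedup) / (1.0 - 1.0 / kernel_speedup)


def overall_speedup(fraction, kernel_speedup):
    """Amdahl's law: overall speedup when `fraction` of the original run
    time is accelerated by `kernel_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


# Figures quoted in the text (7x is used as a stand-in for "more than 7x").
p_logreg = accelerated_fraction(kernel_speedup=7.0, system_speedup=5.0)
p_kmeans = accelerated_fraction(kernel_speedup=6.4, system_speedup=4.3)

print(f"logistic regression: {p_logreg:.1%} of run time in the kernel")  # about 93%
print(f"K-means clustering:  {p_kmeans:.1%} of run time in the kernel")  # about 91%
```

Under these assumptions, roughly nine tenths of the original run time would sit in the accelerated kernel in both cases, which is why the system speedup ends up well below the kernel speedup.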
InAccel ML suite for Intel FPGAs
Results have been estimated or simulated using internal InAccel analysis, architecture simulation, and modeling, and are provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.
Tests measure performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase.
InAccel specializes in developing high-performance accelerators for machine learning, data analytics, data processing (compression, encryption) and financial applications. The accelerators from InAccel are compatible with high-level distributed frameworks like Apache Spark. InAccel provides a unique FPGA resource manager that allows IP cores to be scaled instantly to many FPGAs and also allows virtualization and seamless sharing of the FPGA resources by many applications.