LeapMind's Ultra Low-Power AI accelerator IP "Efficiera" Achieved industry-leading power efficiency of 107.8 TOPS/W

August 1, 2023 – Tokyo Japan - LeapMind Co., Ltd., a leading creator of the standard in edge artificial intelligence (AI) announced today that it has achieved an industry-leading power efficiency of 107.8 TOPS/W*1, 2 with “Efficiera”, an ultra-low power AI accelerator IP using the company's core technology, “Extremely low bit quantization”. When comparing this to similar devices in today’s market, 1 ~ 5 TOPS/W in the case of GPUs and about 20 TOPS/W for AI accelerators for edge devices, this means that Efficiera performance would be approximately 5~100 times higher than these devices at the same power consumption.

Hardware challenges with increasing size of AI tasks

In recent years, as the execution of AI tasks has become practical on various devices, AI tasks have become larger and more complex year by year. According to the analysis by OpenAI*3, the amount of AI computation used has dramatically increased at a rate of 2x in 3-4 months since 2012 (Moore's Law is 2x in 2 years), and it is worth preparing for the systems far beyond current capabilities. The analysis also notes that the processing of such AI tasks has been driven by custom hardware such as GPUs and/or more chips in parallel. This could indicate two technical challenges for the practical application of edge AI. First, as AI tasks become increasingly complex and larger in scale, the computational performance required during AI training and inference is much higher than before. The second is the challenge regarding the power consumption of devices with higher-performance AI chips processing complex AI tasks. The power available for an AI chip to process AI tasks stably in smartphones and AR glasses is less than 1W and around 25W for autonomous driving, where advanced AI processing is required. This means, edge AI also requires high power efficiency.（Figure 1)

Click to enlarge

Figure1　AI task and required performance

Click to enlarge

Reference：Power efficiency comparison (Based on in-house research)

Low-bit quantization technology, and a solution for enabling edge AI with “Efficiera”

Deep learning-based computations are usually performed with 32 bit floating point (FP32) to ensure accuracy. However, this method involves repeated matrix operations (sum-of-products operations) and requires a lot of computational resources, and several important requirements must be met in both software and hardware to perform AI tasks on edge devices, where memory and power are limited and real-time execution is required. The followings are some of the key requirements to be met. First of all, software requirements include lightweight models, model accuracy comparable to FP32, and model conversion to maximize the efficiency of AI accelerator execution. Second, hardware requirements include low power consumption, high performance for executing various AI tasks, less memory bandwidth, and less memory usage.

To enable edge AI, all of these requirements must be addressed simultaneously, but a general approach is to first use a technique called quantization to lightweight the model. Lightweight 8 bit quantization is commonly used for AI models running on edge devices such as smartphones, but even with this, the AI tasks viable for implementation are still limited to some tasks such as object detection and image processing of static images, from the standpoint of power consumption and heat dissipation. In order to implement various AI tasks, the challenge is that the power efficiency and hardware performance to support many of AI tasks are not sufficient for the scale of those tasks.

In response, LeapMind aims to enable practical AI on various devices, and to solve both software and hardware requirements with our strength, “Extremely low bit quantization” technology. As a part of our effortson the software side, we use this “Extremely low bit quantization” to reduce the size of the model, achieving an ultra-low bit quantization of 1-bit weight x 2-bit activation, and our patented proprietary quantization technology has brought us success in low-bit quantization with minimal degradation in accuracy. In addition, a model converter dedicated to "Efficiera" optimizes computing efficiency and memory transfer to maximize hardware execution efficiency. Furthermore, we have been developing the AI accelerator "Efficiera" as hardware for the efficient execution of Extremely low bit quantization models. Our Extremely low bit quantization technology enables to perform matrix operations with simple logic operations, thus providing high arithmetic efficiency in a small circuit area. Since the DRAM memory transfer data during inference is 1 or 2 bits, the data size is equivalent to that of the data compression, reducing memory usage and bandwidth. By using Extremely low bit quantization technology, it is now possible to improve power efficiency, arithmetic efficiency, and area efficiency while minimizing the accuracy degradation caused by quantization.

Matsuda, Soichi, Chief Executive Officer of LeapMind, says “As one of the few companies in the world that specialize in research and development of quantization technology as a method of AI technology and in the design of semiconductors for AI accelerators developed, we have been engaged in research and development for many years through joint development with major companies with the aim of achieving the practical application of AI. The balancing of high power efficiency and AI task processing performance with AI accelerators is essential for the implementation of AI in society. And today, we have achieved a power efficiency of 107.8 TOPS/W, far exceeding that of conventional GPUs and NPUs in terms of power efficiency. I believe that we have achieved an important milestone in our efforts to bring AI to the public more broadly. We will continue to strive to solve our customers' problems and pursue research and development to contribute to the evolution of AI technology”.

Please refer to LeapMind website below for the details of Efficiera. https://leapmind.io/en/products/efficiera-ip/

Appendix *1 Estimated values are based on logic synthesis results using Cadence Genus Synthesis Solution. Conditions are as follows. ・Calculated based on IP stand-alone (Not including power of other subsystems such as DRAMs, main bus processors, etc.) ・Process: 7nm ・Clock frequency：533MHz ・Power supply voltage: Center ・Temperature: 25℃ ・MAC utilization: 75.8% *2 TOPS/W: Processing performance per watt（TOPS: tera operations per second）This indicates power efficiency. The higher this number, the greater the amount of computation to be processed with low-power consumption. *3 https://openai.com/research/ai-and-compute

About LeapMind Inc.

Founded in 2012, LeapMind has been providing solutions mainly to the consumer electronics, automotive, and manufacturing industries, as one of the few companies in the world that provide “Extremely low bit quantization” technology to downsize AI and proprietary AI accelerator semiconductor design, under the corporate mission of "To create innovative devices with machine learning and make them available everywhere.” In order to provide foundational technologies to enable the next generation of information devices, we are committed to the development of both the software and hardware necessary to create such devices.LeapMind's business is highly acclaimed and has raised a cumulative total of 4.99 billion yen in funding (as of June 2023). For more information, please visit corporate website (https://leapmind.io/en/)

LeapMind's Ultra Low-Power AI accelerator IP "Efficiera" Achieved industry-leading power efficiency of 107.8 TOPS/W

Contact LeapMind