Flex Logix Announces nnMAX IP Delivers Higher-Throughput/$ and Higher Throughput/Watt for Key DSP Functions
Cheng Wang unveiled key performance benchmarks at today’s virtual Linley Processor Forum, proving that nnMAX is not only superior for AI inference, but also for key DSP functions
MOUNTAIN VIEW, Calif. – April 7, 2020 – Flex Logix Technologies, Inc., the leading supplier of embedded FPGA (eFPGA) IP, architecture and software, today unveiled key benchmarking information around its new nnMAX architecture, showing how it can effectively be used for DSP acceleration for key functions. For FIR (finite impulse response) filters, nnMAX is able to process up to 1 Gigasamples per second with hundreds and even thousands of “taps” or coefficients. FIR filters are widely used in a large number of commercial and aerospace applications. Cheng Wang, Flex Logix’s senior VP engineering and co-founder, disclosed these benchmarks and more at today’s online Linley Processor Forum in a presentation titled “DSP Acceleration using nnMAX.” His full presentation can be viewed at this link.
“Because nnMAX is so good at accelerating AI inference, customers started asking us if it could also be applied to DSP functions,” said Geoff Tate, CEO and co-founder of Flex Logix. “When we started evaluating their models, we found that it can deliver similar performance to the most expensive Xilinx FPGAs in the same process node (16nm), and is also faster than TI’s highest-performing DSP – but in a much smaller silicon area than both those solutions. nnMAX is available now for 16nm SoC designs and will be available for additional process nodes in 2021.”
About nnMAX
NMAX is a general purpose Neural Inferencing Engine that can run any type of NN from simple fully connected DNN to RNN to CNN and can run multiple NNs at a time. It has demonstrated excellent inference efficiency, delivering more throughput on tough models for less $, less watts.
nnMAX is programmed with TensorFlow Lite and ONNX. Numerics supported are INT8, INT16 and BFloat16 and can be mixed layer by layer to maximize prediction accuracy. INT8/16 activations are processed at full rate; BFloat16 at half rate. Hardware converts between INT and BFloat as needed layer by layer. 3×3 Convolutions of Stride 1 are accelerated by Winograd hardware: YOLOv3 is 1.7x faster, ResNet-50 is 1.4x faster. This is done at full precision. Weights are stored in non-Winograd form to keep memory bandwidth low. nnMAX is a tile architecture any throughput required can be delivered with the right amount of SRAM for your model.
About Flex Logix
Flex Logix provides solutions for making flexible chips and accelerating neural network inferencing. Its eFPGA platform enables chips to be flexible to handle changing protocols, standards, algorithms and customer needs and to implement reconfigurable accelerators that speed key workloads 30-100x compared to processors. Flex Logix’s second product line, nnMAX, utilizes its eFPGA and interconnect technology to provide modular, scalable neural inferencing from 1 to >100 TOPS using a higher throughput/$ and throughput/watt compared to other architectures. Flex Logix is headquartered in Mountain View, California.
|
Related News
- Flex Logix Launches InferX X1 Edge Inference Co-Processor That Delivers Near-Data Center Throughput at a Fraction of the Power and Cost
- Fundamental Inventions Enable the Best PPA and Most Portable eFPGA/DSP/SDR/AI IP for Adaptable SoCs
- Industry Leading PPA DSP Available For All Existing EFLX eFPGA
- Flex Logix Cancels AI Chip, Markets IP for AI and DSP
- Flex Logix Announces InferX™ High Performance IP for DSP and AI Inference
Breaking News
- Ubitium Debuts First Universal RISC-V Processor to Enable AI at No Additional Cost, as It Raises $3.7M
- TSMC drives A16, 3D process technology
- Frontgrade Gaisler Unveils GR716B, a New Standard in Space-Grade Microcontrollers
- Blueshift Memory launches BlueFive processor, accelerating computation by up to 50 times and saving up to 65% energy
- Eliyan Ports Industry's Highest Performing PHY to Samsung Foundry SF4X Process Node, Achieving up to 40 Gbps Bandwidth at Unprecedented Power Levels with UCIe-Compliant Chiplet Interconnect Technology
Most Popular
- Cadence Unveils Arm-Based System Chiplet
- CXL Fabless Startup Panmnesia Secures Over $60M in Series A Funding, Aiming to Lead the CXL Switch Silicon Chip and CXL IP
- Esperanto Technologies and NEC Cooperate on Initiative to Advance Next Generation RISC-V Chips and Software Solutions for HPC
- Eliyan Ports Industry's Highest Performing PHY to Samsung Foundry SF4X Process Node, Achieving up to 40 Gbps Bandwidth at Unprecedented Power Levels with UCIe-Compliant Chiplet Interconnect Technology
- Arteris Selected by GigaDevice for Development in Next-Generation Automotive SoC With Enhanced FuSa Standards
E-mail This Article | Printer-Friendly Page |