Why the Memory Subsystem is Critical in Inferencing Chips
By Geoff Tate, Flex Logix
EETimes (December 22, 2019)
Good inferencing chips can move data very quickly.
The number of new inferencing chip companies announced this past year is enough to make your head spin. With so many chips and a lack of quality benchmarks, the industry often forgets one extremely critical piece: the memory subsystem. The truth is, you can’t have a good inference chip unless you have a good memory subsystem. Thus, if an inferencing chip company talks only about TOPS and says very little about SRAM, DRAM and the memory subsystem in general, it probably doesn’t have a very good solution.
It’s All About Data Throughput
Good inferencing chips are architected so that they can move data through them very quickly, which means they have to process that data very fast, and move it in and out of memory very quickly. If you look at models such as ResNet-50 and YOLOv3, you will see a striking difference not only in their computational load, but also in how each uses memory.
Each image takes 2 billion multiply-accumulates (MACs) with ResNet-50, but over 200 billion MACs with YOLOv3. That is a hundredfold increase. Part of this is due to the fact that YOLOv3 has more weights (62 million versus approximately 23 million for ResNet-50). However, the biggest difference is the image size in the typical benchmark. ResNet-50 uses 224×224, a size no one actually uses, and YOLOv3 uses 2 megapixels. Thus, the computational load is much greater on YOLOv3.
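To make the comparison concrete, the ratios can be worked out directly from the figures quoted above. This is a minimal sketch: the constant names are illustrative, and "over 200 billion" MACs is treated as exactly 200 billion, so the MAC ratio is a lower bound.

```python
# Per-image figures quoted in the article (typical benchmark settings).
RESNET50_MACS = 2e9          # 2 billion multiply-accumulates
YOLOV3_MACS = 200e9          # "over 200 billion"; used here as a lower bound
RESNET50_WEIGHTS = 23e6      # approximately 23 million weights
YOLOV3_WEIGHTS = 62e6        # 62 million weights
RESNET50_PIXELS = 224 * 224  # 224x224 benchmark image
YOLOV3_PIXELS = 2e6          # 2 megapixel benchmark image

# How much larger YOLOv3 is than ResNet-50 on each axis.
mac_ratio = YOLOV3_MACS / RESNET50_MACS
weight_ratio = YOLOV3_WEIGHTS / RESNET50_WEIGHTS
pixel_ratio = YOLOV3_PIXELS / RESNET50_PIXELS

print(f"MACs per image:    {mac_ratio:.0f}x")
print(f"Weights:           {weight_ratio:.1f}x")
print(f"Pixels per image:  {pixel_ratio:.1f}x")
```

The weight count grows by less than 3 times while the pixel count grows by roughly 40 times, which is consistent with image size, not model size, being the main driver of the hundredfold increase in MACs.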
Using the example above, you can see that we have two different workloads, and one takes 100 times more computation than the other. The obvious question is: does this mean YOLOv3 runs 100 times slower? The only way you can answer that is by looking at the memory subsystem, because that is going to tell you the actual throughput on any given chip.
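One rough way to see why the answer depends on memory is a simple bound model: the time per image is limited by whichever is slower, the compute or the memory traffic. The sketch below is not from the article and does not describe any real chip. The peak MAC rate and DRAM bandwidth are hypothetical, and it assumes 1-byte (INT8) weights that are re-fetched from DRAM for every image, with activation traffic ignored entirely.

```python
def images_per_second(macs, dram_bytes, peak_macs_per_s, dram_bytes_per_s):
    """Upper bound on throughput: the slower of compute and memory wins."""
    compute_time = macs / peak_macs_per_s        # seconds if purely compute-bound
    memory_time = dram_bytes / dram_bytes_per_s  # seconds if purely memory-bound
    return 1.0 / max(compute_time, memory_time)

# Hypothetical chip parameters, chosen only for illustration.
PEAK_MACS_PER_S = 10e12   # assumed peak MAC rate
DRAM_BYTES_PER_S = 10e9   # assumed DRAM bandwidth

# MACs are the article's figures; DRAM bytes assume one byte per weight,
# re-read for every image (an assumption, not a measured value).
resnet50_ips = images_per_second(2e9, 23e6, PEAK_MACS_PER_S, DRAM_BYTES_PER_S)
yolov3_ips = images_per_second(200e9, 62e6, PEAK_MACS_PER_S, DRAM_BYTES_PER_S)

slowdown = resnet50_ips / yolov3_ips
print(f"ResNet-50: {resnet50_ips:.0f} images/s")
print(f"YOLOv3:    {yolov3_ips:.0f} images/s")
print(f"YOLOv3 is {slowdown:.1f}x slower under these assumptions")
```

Under these made-up numbers, ResNet-50 is limited by weight traffic and YOLOv3 by compute, so YOLOv3 comes out roughly 9 times slower, not 100 times. Different memory assumptions would give a different answer, which is why the MAC count alone does not predict throughput.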