AnalogNets: ML-HW Co-Design of Noise-robust TinyML Models and Always-On
Analog Compute-in-Memory Accelerator
- URL: http://arxiv.org/abs/2111.06503v1
- Date: Wed, 10 Nov 2021 10:24:46 GMT
- Title: AnalogNets: ML-HW Co-Design of Noise-robust TinyML Models and Always-On
Analog Compute-in-Memory Accelerator
- Authors: Chuteng Zhou, Fernando Garcia Redondo, Julian Büchel, Irem Boybat,
Xavier Timoneda Comas, S. R. Nandakumar, Shidhartha Das, Abu Sebastian,
Manuel Le Gallo, Paul N. Whatmough
- Abstract summary: This work describes TinyML models for the popular always-on applications of keyword spotting (KWS) and visual wake words (VWW).
We detail a comprehensive training methodology to retain accuracy in the face of analog non-idealities.
We also describe AON-CiM, a programmable, minimal-area phase-change memory (PCM) analog CiM accelerator.
- Score: 50.31646817567764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Always-on TinyML perception tasks in IoT applications require very high
energy efficiency. Analog compute-in-memory (CiM) using non-volatile memory
(NVM) promises high efficiency and also provides self-contained on-chip model
storage. However, analog CiM introduces new practical considerations, including
conductance drift, read/write noise, fixed analog-to-digital converter (ADC)
gain, etc. These additional constraints must be addressed to achieve models
that can be deployed on analog CiM with acceptable accuracy loss. This work
describes AnalogNets: TinyML models for the popular always-on
applications of keyword spotting (KWS) and visual wake words (VWW). The model
architectures are specifically designed for analog CiM, and we detail a
comprehensive training methodology to retain accuracy in the face of analog
non-idealities and low-precision data converters at inference time. We also
describe AON-CiM, a programmable, minimal-area phase-change memory (PCM) analog
CiM accelerator, with a novel layer-serial approach to remove the cost of
complex interconnects associated with a fully-pipelined design. We evaluate the
AnalogNets on a calibrated simulator, as well as real hardware, and find that
accuracy degradation is limited to 0.8%/1.2% after 24 hours of PCM drift
(8-bit) for KWS/VWW. AnalogNets running on the 14nm AON-CiM accelerator
demonstrate 8.58/4.37 TOPS/W for KWS/VWW workloads, respectively, using 8-bit
activations, increasing to 57.39/25.69 TOPS/W with 4-bit activations.
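As an illustration of the kind of noise-robust training the abstract describes, the sketch below injects Gaussian noise into the weights during the forward pass so that training settles on weights that tolerate analog perturbations. This is a minimal sketch, assuming a simple proportional noise model; the `NoisyLinear` class and the `noise_std` value are illustrative, not the paper's calibrated PCM noise statistics.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Linear):
    """Linear layer that injects Gaussian weight noise during training.

    Illustrative only: the proportional noise model and the default
    noise_std are assumptions, not the paper's calibrated PCM statistics.
    """

    def __init__(self, in_features, out_features, noise_std=0.05):
        super().__init__(in_features, out_features)
        self.noise_std = noise_std

    def forward(self, x):
        if self.training:
            # Scale the perturbation to the largest weight magnitude,
            # mimicking programming/read noise on analog conductances.
            w_max = self.weight.abs().max().detach()
            noise = torch.randn_like(self.weight) * self.noise_std * w_max
            return F.linear(x, self.weight + noise, self.bias)
        return super().forward(x)
```

Because the optimizer sees a freshly perturbed weight matrix at every step, it is steered toward solutions that remain accurate when inference-time analog noise perturbs the stored weights.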
Related papers
- A Pipelined Memristive Neural Network Analog-to-Digital Converter [0.24578723416255754]
This paper proposes a scalable and modular neural network ADC architecture based on a pipeline of four-bit converters.
An 8-bit pipelined ADC achieves 0.18 LSB INL, 0.20 LSB DNL, 7.6 ENOB, and 0.97 fJ/conv FOM.
arXiv Detail & Related papers (2024-06-04T10:51:12Z)
- RACE-IT: A Reconfigurable Analog CAM-Crossbar Engine for In-Memory Transformer Acceleration [21.196696191478885]
Transformer models represent the cutting edge of Deep Neural Networks (DNNs);
processing these models demands significant computational resources and results in a substantial memory footprint.
We introduce a novel Analog Content Addressable Memory (ACAM) structure capable of performing various non-MVM operations within Transformers.
arXiv Detail & Related papers (2023-11-29T22:45:39Z)
- Pruning random resistive memory for optimizing analogue AI [54.21621702814583]
AI models pose unprecedented challenges in energy consumption and environmental sustainability.
One promising solution is to revisit analogue computing, a technique that predates digital computing.
Here, we report a universal solution, software-hardware co-design using structural plasticity-inspired edge pruning.
arXiv Detail & Related papers (2023-11-13T08:59:01Z)
- Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs [75.40636935415601]
Deep learning often faces the challenge of efficiently processing dynamic inputs, such as sensor data or user inputs.
We take an incremental computing approach, looking to reuse calculations as the inputs change.
We apply this approach to the transformer architecture, creating an efficient incremental inference algorithm with complexity proportional to the fraction of modified inputs; a toy sketch of the caching idea follows this entry.
arXiv Detail & Related papers (2023-07-27T16:30:27Z)
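To make the reuse idea above concrete, here is a toy sketch, not the paper's algorithm, of per-token caching for a position-wise feed-forward layer, where each token's output depends only on its own input. The class name `IncrementalFFN` and the shapes are illustrative assumptions.

```python
import numpy as np

class IncrementalFFN:
    """Toy position-wise feed-forward layer with per-token output caching.

    Because a position-wise FFN treats tokens independently, only tokens
    whose inputs changed since the last call must be recomputed. Attention
    layers mix tokens and need the fuller machinery the paper describes.
    """

    def __init__(self, w1, w2):
        self.w1, self.w2 = w1, w2   # (d_model, d_ff), (d_ff, d_model)
        self.last_x = None           # cached inputs, shape (tokens, d_model)
        self.last_y = None           # cached outputs

    def __call__(self, x):
        if self.last_x is None or x.shape != self.last_x.shape:
            changed = np.arange(x.shape[0])          # first call: all tokens
            self.last_y = np.zeros((x.shape[0], self.w2.shape[1]))
        else:
            changed = np.where((x != self.last_x).any(axis=1))[0]
        # Work is proportional to the number of changed tokens.
        hidden = np.maximum(x[changed] @ self.w1, 0.0)   # ReLU
        self.last_y[changed] = hidden @ self.w2
        self.last_x = x.copy()
        return self.last_y.copy()
```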
- AnalogNAS: A Neural Network Design Framework for Accurate Inference with Analog In-Memory Computing [7.596833322764203]
Inference at the edge requires low latency and compact, power-efficient models.
Analog/mixed-signal in-memory computing hardware accelerators can transcend the memory wall of von Neumann architectures.
We propose AnalogNAS, a framework for automated Deep Neural Network (DNN) design targeting deployment on analog In-Memory Computing (IMC) inference accelerators.
arXiv Detail & Related papers (2023-05-17T07:39:14Z)
- DeepGEMM: Accelerated Ultra Low-Precision Inference on CPU Architectures using Lookup Tables [49.965024476651706]
DeepGEMM is a lookup-table-based approach for executing ultra-low-precision convolutional neural networks on SIMD hardware; a toy illustration of the table trick follows this entry.
Our implementation outperforms corresponding 8-bit integer kernels by up to 1.74x on x86 platforms.
arXiv Detail & Related papers (2023-04-18T15:13:10Z)
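As a rough illustration of the lookup-table idea referenced above (the real kernels use SIMD shuffle instructions rather than NumPy indexing), the sketch below replaces every multiplication in a 2-bit dot product with a lookup into a precomputed 4x4 product table. The codebook values are invented for this example.

```python
import numpy as np

# 2-bit codebooks: only 4x4 distinct (weight, activation) products exist,
# so every multiplication can be replaced by a table lookup. The codebook
# values are invented for this example.
W_LEVELS = np.array([-2, -1, 1, 2], dtype=np.int32)
A_LEVELS = np.array([0, 1, 2, 3], dtype=np.int32)
LUT = np.outer(W_LEVELS, A_LEVELS)          # precomputed 4x4 product table

def lut_dot(w_codes, a_codes):
    """Dot product of 2-bit coded vectors using only lookups and adds."""
    return int(LUT[w_codes, a_codes].sum())

w_codes = np.random.randint(0, 4, size=64)  # indices into W_LEVELS
a_codes = np.random.randint(0, 4, size=64)  # indices into A_LEVELS
assert lut_dot(w_codes, a_codes) == int(
    (W_LEVELS[w_codes] * A_LEVELS[a_codes]).sum())
```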
- A Charge Domain P-8T SRAM Compute-In-Memory with Low-Cost DAC/ADC Operation for 4-bit Input Processing [4.054285623919103]
This paper presents a low-cost PMOS-based 8T (P-8T) Compute-In-Memory (CIM) architecture.
It efficiently performs multiply-accumulate (MAC) operations between 4-bit input activations and 8-bit weights.
A 256×80 P-8T CIM macro implemented in a 28nm CMOS process achieves accuracies of 91.46% and 66.67%.
arXiv Detail & Related papers (2022-11-29T08:15:27Z)
- On the Accuracy of Analog Neural Network Inference Accelerators [0.9440010225411358]
Specialized accelerators have recently garnered attention as a method to reduce the power consumption of neural network inference.
This work shows how architectural design decisions, particularly in mapping neural network parameters to analog memory cells, influence inference accuracy; a sketch of one such mapping follows this entry.
arXiv Detail & Related papers (2021-09-03T01:38:11Z)
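One common mapping choice among those such accuracy studies compare is a differential pair: signed weights are split across two conductance arrays, and each cell sees additive read noise. The function names, the `g_max` value, and the noise scale below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def map_to_conductances(w, g_max=25.0):
    """Differential-pair mapping of signed weights onto conductances (uS).

    Positive weights go to G+, negative to G-, scaled so the largest
    weight uses the full range g_max. All values are illustrative.
    """
    scale = g_max / np.abs(w).max()
    g_pos = np.where(w > 0, w * scale, 0.0)
    g_neg = np.where(w < 0, -w * scale, 0.0)
    return g_pos, g_neg, scale

def read_back(g_pos, g_neg, scale, read_noise=0.5):
    """Recover effective weights, with additive read noise on each cell."""
    g_pos = g_pos + np.random.normal(0.0, read_noise, g_pos.shape)
    g_neg = g_neg + np.random.normal(0.0, read_noise, g_neg.shape)
    return (g_pos - g_neg) / scale
```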
- Leveraging Automated Mixed-Low-Precision Quantization for tiny edge microcontrollers [76.30674794049293]
This paper presents an automated mixed-precision quantization flow based on the HAQ framework but tailored for the memory and computational characteristics of MCU devices.
Specifically, a Reinforcement Learning agent searches for the best uniform quantization bit-width, among 2, 4, and 8 bits, for each weight and activation tensor; a sketch of this bit-width selection follows this entry.
Given an MCU-class memory budget of 2MB for weight-only quantization, the compressed models produced by the mixed-precision engine are as accurate as state-of-the-art solutions.
arXiv Detail & Related papers (2020-08-12T06:09:58Z)
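The bit-width selection referenced above can be approximated, for illustration, by a greedy heuristic in place of the paper's Reinforcement Learning agent: shrink the largest weight tensors first until the model fits the 2MB budget. The tensor names and sizes are invented for this example.

```python
BUDGET_BYTES = 2 * 1024 * 1024   # 2MB MCU weight budget from the entry
BIT_CHOICES = (8, 4, 2)          # uniform bit-widths considered

def assign_bitwidths(tensor_sizes):
    """tensor_sizes: dict of name -> weight count. Returns name -> bits."""
    bits = {name: 8 for name in tensor_sizes}

    def total_bytes():
        return sum(n * bits[name] // 8 for name, n in tensor_sizes.items())

    # Greedy: shrink the largest tensors first until the model fits.
    for name in sorted(tensor_sizes, key=tensor_sizes.get, reverse=True):
        for b in BIT_CHOICES:
            bits[name] = b
            if total_bytes() <= BUDGET_BYTES:
                break
    return bits

# Invented layer sizes: the greedy pass quantizes the big conv layers
# aggressively while leaving the small head at 8 bits.
print(assign_bitwidths({"conv1": 3_500_000, "conv2": 1_200_000, "fc": 300_000}))
```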
- Highly Efficient Salient Object Detection with 100K Parameters [137.74898755102387]
We propose a flexible convolutional module, namely generalized OctConv (gOctConv), to efficiently utilize both in-stage and cross-stage multi-scale features.
We build an extremely lightweight model, namely CSNet, which achieves performance comparable to large models on popular salient object detection benchmarks with only about 0.2% (100K) of their parameters.
arXiv Detail & Related papers (2020-03-12T07:00:46Z)