On the Accuracy of Analog Neural Network Inference Accelerators
- URL: http://arxiv.org/abs/2109.01262v1
- Date: Fri, 3 Sep 2021 01:38:11 GMT
- Title: On the Accuracy of Analog Neural Network Inference Accelerators
- Authors: T. Patrick Xiao, Ben Feinberg, Christopher H. Bennett, Venkatraman
Prabhakar, Prashant Saxena, Vineet Agrawal, Sapan Agarwal, Matthew J.
Marinella
- Abstract summary: Specialized accelerators have recently garnered attention as a method to reduce the power consumption of neural network inference.
This work shows how architectural design decisions, particularly in mapping neural network parameters to analog memory cells, influence inference accuracy.
- Score: 0.9440010225411358
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Specialized accelerators have recently garnered attention as a method to
reduce the power consumption of neural network inference. A promising category
of accelerators utilizes nonvolatile memory arrays to both store weights and
perform $\textit{in situ}$ analog computation inside the array. While prior
work has explored the design space of analog accelerators to optimize
performance and energy efficiency, there is seldom a rigorous evaluation of the
accuracy of these accelerators. This work shows how architectural design
decisions, particularly in mapping neural network parameters to analog memory
cells, influence inference accuracy. When evaluated using ResNet50 on ImageNet,
the system's resilience to analog non-idealities - cell programming errors, limited
analog-to-digital converter resolution, and array parasitic resistances - improves
when analog quantities in the hardware are made proportional to the weights in the
network. Moreover, contrary to the assumptions of prior
work, nearly equivalent resilience to cell imprecision can be achieved by fully
storing weights as analog quantities, rather than spreading weight bits across
multiple devices, often referred to as bit slicing. By exploiting
proportionality, analog system designers have the freedom to match the
precision of the hardware to the needs of the algorithm, rather than attempting
to guarantee the same level of precision in the intermediate results as an
equivalent digital accelerator. This ultimately results in an analog
accelerator that is more accurate, more robust to analog errors, and more
energy-efficient.
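As a rough illustration of the mapping choices described in the abstract, the sketch below compares a fully analog, weight-proportional mapping against a bit-sliced mapping under random cell programming errors and reports the resulting matrix-vector-multiply error. The array size, the 8-bit / 2-bit-per-slice scheme, and the 3% error level are assumptions for illustration; this is not the paper's simulator and does not reproduce its ResNet50 results.

```python
# Toy comparison of two ways to map weights onto analog memory cells, under random
# cell programming errors.  All sizes, bit widths, and error levels are illustrative.
import numpy as np

rng = np.random.default_rng(0)
rows, cols = 256, 256
W = rng.normal(0, 0.05, size=(rows, cols))       # "trained" weights
x = rng.normal(0, 1.0, size=cols)                # one input activation vector
y_ideal = W @ x

sigma = 0.03                                      # programming error, fraction of the full conductance range

# (a) Fully analog mapping: conductance proportional to the weight (sign handled by a
#     differential pair, modeled here as a signed normalized conductance in [-1, 1]).
w_max = np.abs(W).max()
G = W / w_max
G_noisy = G + rng.normal(0, sigma, size=G.shape)
y_analog = (G_noisy * w_max) @ x

# (b) Bit slicing: quantize each weight to 8-bit two's complement and store 2 bits per
#     cell across 4 cells; each cell gets the same absolute programming error.
bits, slice_bits = 8, 2
W_int = np.clip(np.round(W / w_max * (2**(bits - 1) - 1)), -(2**(bits - 1)), 2**(bits - 1) - 1)
W_uint = (W_int + 2**bits) % (2**bits)            # two's-complement encoding
y_sliced_uint = np.zeros(rows)
for s in range(bits // slice_bits):
    slice_vals = (W_uint.astype(int) >> (s * slice_bits)) & (2**slice_bits - 1)
    G_slice = slice_vals / (2**slice_bits - 1)    # each slice uses the full conductance range
    G_slice_noisy = G_slice + rng.normal(0, sigma, size=G_slice.shape)
    y_sliced_uint += (G_slice_noisy * (2**slice_bits - 1)) @ x * (2**(s * slice_bits))
# Undo the two's-complement offset contributed by negative weights, then rescale.
offset = (2**bits) * ((W_int < 0).astype(float) @ x)
y_sliced = (y_sliced_uint - offset) / (2**(bits - 1) - 1) * w_max

for name, y in [("analog", y_analog), ("bit-sliced", y_sliced)]:
    rel_err = np.linalg.norm(y - y_ideal) / np.linalg.norm(y_ideal)
    print(f"{name:10s} relative MVM error: {rel_err:.3f}")
```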
Related papers
- Pruning random resistive memory for optimizing analogue AI [54.21621702814583]
AI models pose unprecedented challenges for energy consumption and environmental sustainability.
One promising solution is to revisit analogue computing, a technique that predates digital computing.
Here, we report a universal solution: software-hardware co-design using structural-plasticity-inspired edge pruning.
arXiv Detail & Related papers (2023-11-13T08:59:01Z)
- Quantization of Deep Neural Networks to facilitate self-correction of weights on Phase Change Memory-based analog hardware [0.0]
We develop an algorithm to approximate a set of multiplicative weights.
These weights aim to represent the original network's weights with minimal loss in performance.
Our results demonstrate that, when paired with an on-chip pulse generator, our self-correcting neural network performs comparably to those trained with analog-aware algorithms.
arXiv Detail & Related papers (2023-09-30T10:47:25Z)
- On the Non-Associativity of Analog Computations [0.0]
In this work, we observe that the ordering of input operands of an analog operation also has an impact on the output result.
We conduct a simple test by creating a model of a real analog processor that captures such ordering effects.
The results prove the existence of ordering effects as well as their high impact, as neglecting ordering results in substantial accuracy drops.
arXiv Detail & Related papers (2023-09-25T17:04:09Z)
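Ordering effects of this kind are easy to reproduce with a toy model. The sketch below uses a clipped accumulator as a stand-in for a finite-range analog integrator; the clipping model and operand values are assumptions for illustration, not the processor model built in that work.

```python
# Toy demonstration of order-dependent accumulation: a finite-range accumulator
# (clipped to +/-1, standing in for a saturating analog integrator) returns different
# results for different orderings of the same operands.
import numpy as np

def accumulate(values, limit=1.0):
    """Sequentially accumulate, clipping the running sum to [-limit, +limit]."""
    acc = 0.0
    for v in values:
        acc = float(np.clip(acc + v, -limit, limit))
    return acc

vals = [0.6, 0.6, 0.6, -0.9, -0.9]   # exact (unclipped) sum is 0.0
print("positives first:", round(accumulate(vals), 3))                          # about -0.8 (early clipping)
print("interleaved    :", round(accumulate([0.6, -0.9, 0.6, -0.9, 0.6]), 3))   # ~0.0 (no clipping)
```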
- A Blueprint for Precise and Fault-Tolerant Analog Neural Networks [1.6039298125810306]
High-precision data converters are costly and impractical for deep neural networks.
We address this challenge by using the residue number system (RNS).
RNS allows composing high-precision operations from multiple low-precision operations.
arXiv Detail & Related papers (2023-09-19T17:00:34Z)
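For readers unfamiliar with the residue number system, here is a minimal sketch of the idea: an integer dot product is computed independently modulo several small, pairwise-coprime moduli, so each channel needs only low-precision arithmetic, and the high-precision result is recovered with the Chinese Remainder Theorem. The moduli and operand ranges are illustrative choices, not those used in the paper.

```python
# Minimal residue number system (RNS) sketch: a wide integer dot product is assembled
# from several narrow dot products, one per small coprime modulus, and the result is
# recovered with the Chinese Remainder Theorem (CRT).
import numpy as np

MODULI = (251, 241, 239)                  # pairwise coprime, each fits in 8 bits
M = int(np.prod(MODULI))                  # dynamic range of the RNS representation

def crt(residues, moduli=MODULI):
    """Reconstruct x (mod M) from its residues via the CRT."""
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)      # pow(Mi, -1, m): modular inverse (Python >= 3.8)
    return x % M

rng = np.random.default_rng(0)
a = rng.integers(0, 16, size=64)          # small unsigned operands so the result stays below M
b = rng.integers(0, 16, size=64)

# One low-precision channel per modulus; for brevity the modular reduction is applied
# once after each dot product rather than after every multiply-accumulate.
residues = [int(np.dot(a, b) % m) for m in MODULI]
print("RNS result  :", crt(residues))     # equals the exact dot product
print("exact result:", int(np.dot(a, b)))
```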
- Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs [75.40636935415601]
Deep learning often faces the challenge of efficiently processing dynamic inputs, such as sensor data or user inputs.
We take an incremental computing approach, looking to reuse calculations as the inputs change.
We apply this approach to the transformer architecture, creating an efficient incremental inference algorithm with complexity proportional to the fraction of modified inputs.
arXiv Detail & Related papers (2023-07-27T16:30:27Z)
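The general idea of incremental computation can be made concrete in a few lines: cache per-position results and, when the input changes, recompute only the modified positions, so the work scales with the fraction of changed inputs. The per-position function below is a hypothetical placeholder, and this sketch does not reproduce the paper's transformer-specific algorithm.

```python
# Toy sketch of incremental inference: cache per-position results and recompute only the
# positions whose inputs changed.
import numpy as np

def feature(x_i):
    """Hypothetical per-position computation (placeholder for a real layer)."""
    return np.tanh(3.0 * x_i) + x_i**2

class IncrementalModel:
    def __init__(self, x):
        self.x = x.copy()
        self.cache = feature(x)                        # full pass once

    def update(self, new_x):
        changed = np.flatnonzero(new_x != self.x)      # indices of modified inputs
        self.cache[changed] = feature(new_x[changed])  # recompute only those
        self.x = new_x.copy()
        return self.cache, changed.size

rng = np.random.default_rng(0)
x = rng.normal(size=1024)
model = IncrementalModel(x)

x2 = x.copy()
x2[:16] += 0.1                                         # only 16 of 1024 inputs change
out, recomputed = model.update(x2)
print("recomputed positions:", recomputed, "of", x2.size)
print("matches full recompute:", np.allclose(out, feature(x2)))
```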
- Biologically Plausible Learning on Neuromorphic Hardware Architectures [27.138481022472]
Neuromorphic computing is an emerging paradigm that confronts this imbalance by performing computations directly in analog memories.
This work is the first to compare the impact of different learning algorithms on Compute-In-Memory-based hardware, and vice versa.
arXiv Detail & Related papers (2022-12-29T15:10:59Z)
- ZippyPoint: Fast Interest Point Detection, Description, and Matching through Mixed Precision Discretization [71.91942002659795]
We investigate and adapt network quantization techniques to accelerate inference and enable its use on compute-limited platforms.
ZippyPoint, our efficient quantized network with binary descriptors, improves the network runtime speed, the descriptor matching speed, and the 3D model size.
These improvements come at the cost of a minor performance degradation, as evaluated on the tasks of homography estimation, visual localization, and map-free visual relocalization.
arXiv Detail & Related papers (2022-03-07T18:59:03Z)
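As a small aside on why binary descriptors accelerate matching, the sketch below binarizes descriptors with a generic sign threshold (an assumption; not ZippyPoint's learned quantization) and matches them by Hamming distance computed with XOR and a popcount lookup table rather than floating-point distances.

```python
# Illustration of binary-descriptor matching: real-valued descriptors are binarized with a
# generic sign threshold and matched by Hamming distance (XOR + popcount) instead of L2.
import numpy as np

rng = np.random.default_rng(0)
desc_a = rng.normal(size=(500, 256))      # descriptors from image A (illustrative sizes)
desc_b = rng.normal(size=(800, 256))      # descriptors from image B

# Binarize and pack 256 bits into 32 bytes per descriptor.
bits_a = np.packbits((desc_a > 0).astype(np.uint8), axis=1)
bits_b = np.packbits((desc_b > 0).astype(np.uint8), axis=1)

# Hamming distance = popcount(XOR); a 256-entry lookup table counts bits per byte.
popcount = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)
xor = bits_a[:, None, :] ^ bits_b[None, :, :]
hamming = popcount[xor].sum(axis=2)

matches = hamming.argmin(axis=1)          # nearest descriptor in B for each descriptor in A
print("first five matches:", matches[:5])
print("their Hamming distances:", hamming[np.arange(5), matches[:5]])
```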
- AnalogNets: ML-HW Co-Design of Noise-robust TinyML Models and Always-On Analog Compute-in-Memory Accelerator [50.31646817567764]
This work describes TinyML models for the popular always-on applications of keyword spotting (KWS) and visual wake words (VWW).
We detail a comprehensive training methodology to retain accuracy in the face of analog non-idealities.
We also describe AON-CiM, a programmable, minimal-area phase-change memory (PCM) analog CiM accelerator.
arXiv Detail & Related papers (2021-11-10T10:24:46Z)
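One common ingredient of analog-aware training methodologies of this kind is to inject weight noise in the forward pass so the learned solution tolerates programming errors at inference time. The sketch below applies that idea to a tiny logistic-regression model on synthetic data; the model, task, and 30% noise level are assumptions for illustration, not AnalogNets' actual recipe.

```python
# Minimal sketch of noise injection during training: Gaussian noise is added to the weights
# in the forward pass, while the clean full-precision weights receive the updates.
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 32
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true > 0).astype(float)                    # synthetic binary labels

def accuracy(w, noise_std=0.0, trials=20):
    """Average accuracy when the stored weights are perturbed by programming noise."""
    accs = []
    for _ in range(trials):
        w_hw = w + rng.normal(0, noise_std * np.abs(w).max(), size=w.shape)
        accs.append(((X @ w_hw > 0).astype(float) == y).mean())
    return np.mean(accs)

def train(noise_std, lr=0.1, steps=300):
    w = np.zeros(d)
    for _ in range(steps):
        w_noisy = w + rng.normal(0, noise_std * (np.abs(w).max() + 1e-8), size=w.shape)
        p = 1.0 / (1.0 + np.exp(-(X @ w_noisy)))      # forward pass with injected noise
        grad = X.T @ (p - y) / n                       # gradient evaluated at the noisy weights
        w -= lr * grad                                 # update applied to the clean weights
    return w

w_plain = train(noise_std=0.0)
w_aware = train(noise_std=0.3)
for name, w in [("baseline", w_plain), ("noise-aware", w_aware)]:
    print(f"{name:12s} accuracy with 30% programming noise: {accuracy(w, 0.3):.3f}")
```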
- Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference [56.24109486973292]
We study the interplay between pruning and quantization during the training of neural networks for ultra low latency applications.
We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task.
arXiv Detail & Related papers (2021-02-22T19:00:05Z)
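As a compact sketch of how the two mechanisms combine, the snippet below applies a magnitude-based pruning mask and uniform "fake" quantization to a full-precision weight matrix before a forward pass. The layer size, 4-bit width, and 75% sparsity are assumptions for illustration, not the configuration studied in the paper.

```python
# Sketch of quantization-aware pruning in the forward pass: a magnitude-based pruning mask
# and uniform fake quantization are applied to a full-precision master copy of the weights.
import numpy as np

def fake_quantize(w, bits=4):
    """Uniform symmetric quantization: round to a bits-wide grid, return dequantized values."""
    scale = np.abs(w).max() / (2**(bits - 1) - 1) + 1e-12
    return np.round(w / scale).clip(-(2**(bits - 1)), 2**(bits - 1) - 1) * scale

def prune_mask(w, sparsity=0.75):
    """Keep only the largest-magnitude (1 - sparsity) fraction of weights."""
    threshold = np.quantile(np.abs(w), sparsity)
    return (np.abs(w) >= threshold).astype(w.dtype)

rng = np.random.default_rng(0)
w_master = rng.normal(0, 0.1, size=(64, 32))          # full-precision master weights
x = rng.normal(size=32)

mask = prune_mask(w_master)                           # recomputed periodically during training
w_deployed = fake_quantize(w_master * mask)           # what a low-latency inference engine would see
y = w_deployed @ x                                    # forward pass uses pruned + quantized weights

print("nonzero weights :", int(mask.sum()), "of", mask.size)
print("unique levels   :", np.unique(w_deployed).size)
```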
- Inference with Artificial Neural Networks on Analog Neuromorphic Hardware [0.0]
The BrainScaleS-2 ASIC comprises mixed-signal neuron and synapse circuits.
The system can also operate in a vector-matrix multiplication and accumulation mode for artificial neural networks.
arXiv Detail & Related papers (2020-06-23T17:25:06Z)
- Highly Efficient Salient Object Detection with 100K Parameters [137.74898755102387]
We propose a flexible convolutional module, namely generalized OctConv (gOctConv), to efficiently utilize both in-stage and cross-stage multi-scale features.
We build an extremely lightweight model, namely CSNet, which achieves performance comparable to large models with only about 0.2% of their parameters (100k) on popular salient object detection benchmarks.
arXiv Detail & Related papers (2020-03-12T07:00:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.