On the Accuracy of Analog Neural Network Inference Accelerators
- URL: http://arxiv.org/abs/2109.01262v1
- Date: Fri, 3 Sep 2021 01:38:11 GMT
- Title: On the Accuracy of Analog Neural Network Inference Accelerators
- Authors: T. Patrick Xiao, Ben Feinberg, Christopher H. Bennett, Venkatraman
Prabhakar, Prashant Saxena, Vineet Agrawal, Sapan Agarwal, Matthew J.
Marinella
- Abstract summary: Specialized accelerators have recently garnered attention as a method to reduce the power consumption of neural network inference.
This work shows how architectural design decisions, particularly in mapping neural network parameters to analog memory cells, influence inference accuracy.
- Score: 0.9440010225411358
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Specialized accelerators have recently garnered attention as a method to
reduce the power consumption of neural network inference. A promising category
of accelerators utilizes nonvolatile memory arrays to both store weights and
perform $\textit{in situ}$ analog computation inside the array. While prior
work has explored the design space of analog accelerators to optimize
performance and energy efficiency, there is seldom a rigorous evaluation of the
accuracy of these accelerators. This work shows how architectural design
decisions, particularly in mapping neural network parameters to analog memory
cells, influence inference accuracy. When evaluated using ResNet50 on ImageNet,
the system's resilience to analog non-idealities - cell programming errors, limited
analog-to-digital converter resolution, and array parasitic resistances - improves
when analog quantities in the hardware are made proportional to the weights in the
network. Moreover, contrary to the assumptions of prior
work, nearly equivalent resilience to cell imprecision can be achieved by fully
storing weights as analog quantities, rather than spreading weight bits across
multiple devices, often referred to as bit slicing. By exploiting
proportionality, analog system designers have the freedom to match the
precision of the hardware to the needs of the algorithm, rather than attempting
to guarantee the same level of precision in the intermediate results as an
equivalent digital accelerator. This ultimately results in an analog
accelerator that is more accurate, more robust to analog errors, and more
energy-efficient.
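As a rough illustration of the mapping choices described in the abstract, the sketch below compares a fully analog, weight-proportional mapping against a bit-sliced mapping under random cell programming errors and reports the resulting matrix-vector-multiply error. The array size, the 8-bit / 2-bit-per-slice scheme, and the 3% error level are assumptions for illustration; this is not the paper's simulator and does not reproduce its ResNet50 results.

```python
# Toy comparison of two ways to map weights onto analog memory cells, under random
# cell programming errors.  All sizes, bit widths, and error levels are illustrative.
import numpy as np

rng = np.random.default_rng(0)
rows, cols = 256, 256
W = rng.normal(0, 0.05, size=(rows, cols))       # "trained" weights
x = rng.normal(0, 1.0, size=cols)                # one input activation vector
y_ideal = W @ x

sigma = 0.03                                      # programming error, fraction of the full conductance range

# (a) Fully analog mapping: conductance proportional to the weight (sign handled by a
#     differential pair, modeled here as a signed normalized conductance in [-1, 1]).
w_max = np.abs(W).max()
G = W / w_max
G_noisy = G + rng.normal(0, sigma, size=G.shape)
y_analog = (G_noisy * w_max) @ x

# (b) Bit slicing: quantize each weight to 8-bit two's complement and store 2 bits per
#     cell across 4 cells; each cell gets the same absolute programming error.
bits, slice_bits = 8, 2
W_int = np.clip(np.round(W / w_max * (2**(bits - 1) - 1)), -(2**(bits - 1)), 2**(bits - 1) - 1)
W_uint = (W_int + 2**bits) % (2**bits)            # two's-complement encoding
y_sliced_uint = np.zeros(rows)
for s in range(bits // slice_bits):
    slice_vals = (W_uint.astype(int) >> (s * slice_bits)) & (2**slice_bits - 1)
    G_slice = slice_vals / (2**slice_bits - 1)    # each slice uses the full conductance range
    G_slice_noisy = G_slice + rng.normal(0, sigma, size=G_slice.shape)
    y_sliced_uint += (G_slice_noisy * (2**slice_bits - 1)) @ x * (2**(s * slice_bits))
# Undo the two's-complement offset contributed by negative weights, then rescale.
offset = (2**bits) * ((W_int < 0).astype(float) @ x)
y_sliced = (y_sliced_uint - offset) / (2**(bits - 1) - 1) * w_max

for name, y in [("analog", y_analog), ("bit-sliced", y_sliced)]:
    rel_err = np.linalg.norm(y - y_ideal) / np.linalg.norm(y_ideal)
    print(f"{name:10s} relative MVM error: {rel_err:.3f}")
```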
Related papers
- Pruning random resistive memory for optimizing analogue AI [54.21621702814583]
AI models pose unprecedented challenges for energy consumption and environmental sustainability.
One promising solution is to revisit analogue computing, a technique that predates digital computing.
Here, we report a universal solution: software-hardware co-design using structural-plasticity-inspired edge pruning.
arXiv Detail & Related papers (2023-11-13T08:59:01Z)
- Quantization of Deep Neural Networks to facilitate self-correction of weights on Phase Change Memory-based analog hardware [0.0]
We develop an algorithm to approximate a set of multiplicative weights.
These weights aim to represent the original network's weights with minimal loss in performance.
Our results demonstrate that, when paired with an on-chip pulse generator, our self-correcting neural network performs comparably to those trained with analog-aware algorithms.
arXiv Detail & Related papers (2023-09-30T10:47:25Z)
- On the Non-Associativity of Analog Computations [0.0]
In this work, we observe that the ordering of input operands of an analog operation also has an impact on the output result.
We conduct a simple test by creating a model of a real analog processor that captures such ordering effects.
The results prove the existence of ordering effects as well as their high impact, as neglecting ordering results in substantial accuracy drops.
arXiv Detail & Related papers (2023-09-25T17:04:09Z)
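Ordering effects of this kind are easy to reproduce with a toy model. The sketch below uses a clipped accumulator as a stand-in for a finite-range analog integrator; the clipping model and operand values are assumptions for illustration, not the processor model built in that work.

```python
# Toy demonstration of order-dependent accumulation: a finite-range accumulator
# (clipped to +/-1, standing in for a saturating analog integrator) returns different
# results for different orderings of the same operands.
import numpy as np

def accumulate(values, limit=1.0):
    """Sequentially accumulate, clipping the running sum to [-limit, +limit]."""
    acc = 0.0
    for v in values:
        acc = float(np.clip(acc + v, -limit, limit))
    return acc

vals = [0.6, 0.6, 0.6, -0.9, -0.9]   # exact (unclipped) sum is 0.0
print("positives first:", round(accumulate(vals), 3))                          # about -0.8 (early clipping)
print("interleaved    :", round(accumulate([0.6, -0.9, 0.6, -0.9, 0.6]), 3))   # ~0.0 (no clipping)
```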
- A Blueprint for Precise and Fault-Tolerant Analog Neural Networks [1.6039298125810306]
High-precision data converters are costly and impractical for deep neural networks.
We address this challenge by using the residue number system (RNS).
RNS allows composing high-precision operations from multiple low-precision operations.
arXiv Detail & Related papers (2023-09-19T17:00:34Z)
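For readers unfamiliar with the residue number system, here is a minimal sketch of the idea: an integer dot product is computed independently modulo several small, pairwise-coprime moduli, so each channel needs only low-precision arithmetic, and the high-precision result is recovered with the Chinese Remainder Theorem. The moduli and operand ranges are illustrative choices, not those used in the paper.

```python
# Minimal residue number system (RNS) sketch: a wide integer dot product is assembled
# from several narrow dot products, one per small coprime modulus, and the result is
# recovered with the Chinese Remainder Theorem (CRT).
import numpy as np

MODULI = (251, 241, 239)                  # pairwise coprime, each fits in 8 bits
M = int(np.prod(MODULI))                  # dynamic range of the RNS representation

def crt(residues, moduli=MODULI):
    """Reconstruct x (mod M) from its residues via the CRT."""
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)      # pow(Mi, -1, m): modular inverse (Python >= 3.8)
    return x % M

rng = np.random.default_rng(0)
a = rng.integers(0, 16, size=64)          # small unsigned operands so the result stays below M
b = rng.integers(0, 16, size=64)

# One low-precision channel per modulus; for brevity the modular reduction is applied
# once after each dot product rather than after every multiply-accumulate.
residues = [int(np.dot(a, b) % m) for m in MODULI]
print("RNS result  :", crt(residues))     # equals the exact dot product
print("exact result:", int(np.dot(a, b)))
```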
- Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs [75.40636935415601]
Deep learning often faces the challenge of efficiently processing dynamic inputs, such as sensor data or user inputs.
We take an incremental computing approach, looking to reuse calculations as the inputs change.
We apply this approach to the transformer architecture, creating an efficient incremental inference algorithm with complexity proportional to the fraction of modified inputs.
arXiv Detail & Related papers (2023-07-27T16:30:27Z)
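The general idea of incremental computation can be made concrete in a few lines: cache per-position results and, when the input changes, recompute only the modified positions, so the work scales with the fraction of changed inputs. The per-position function below is a hypothetical placeholder, and this sketch does not reproduce the paper's transformer-specific algorithm.

```python
# Toy sketch of incremental inference: cache per-position results and recompute only the
# positions whose inputs changed.
import numpy as np

def feature(x_i):
    """Hypothetical per-position computation (placeholder for a real layer)."""
    return np.tanh(3.0 * x_i) + x_i**2

class IncrementalModel:
    def __init__(self, x):
        self.x = x.copy()
        self.cache = feature(x)                        # full pass once

    def update(self, new_x):
        changed = np.flatnonzero(new_x != self.x)      # indices of modified inputs
        self.cache[changed] = feature(new_x[changed])  # recompute only those
        self.x = new_x.copy()
        return self.cache, changed.size

rng = np.random.default_rng(0)
x = rng.normal(size=1024)
model = IncrementalModel(x)

x2 = x.copy()
x2[:16] += 0.1                                         # only 16 of 1024 inputs change
out, recomputed = model.update(x2)
print("recomputed positions:", recomputed, "of", x2.size)
print("matches full recompute:", np.allclose(out, feature(x2)))
```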
- Biologically Plausible Learning on Neuromorphic Hardware Architectures [27.138481022472]
Neuromorphic computing is an emerging paradigm that confronts this imbalance by performing computations directly in analog memories.
This work is the first to compare the impact of different learning algorithms on Compute-In-Memory-based hardware, and vice versa.
arXiv Detail & Related papers (2022-12-29T15:10:59Z)
- ZippyPoint: Fast Interest Point Detection, Description, and Matching through Mixed Precision Discretization [71.91942002659795]
We investigate and adapt network quantization techniques to accelerate inference and enable its use on compute-limited platforms.
ZippyPoint, our efficient quantized network with binary descriptors, improves the network runtime speed, the descriptor matching speed, and the 3D model size.
These improvements come at the cost of a minor performance degradation, as evaluated on the tasks of homography estimation, visual localization, and map-free visual relocalization.
arXiv Detail & Related papers (2022-03-07T18:59:03Z)
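As a small aside on why binary descriptors accelerate matching, the sketch below binarizes descriptors with a generic sign threshold (an assumption; not ZippyPoint's learned quantization) and matches them by Hamming distance computed with XOR and a popcount lookup table rather than floating-point distances.

```python
# Illustration of binary-descriptor matching: real-valued descriptors are binarized with a
# generic sign threshold and matched by Hamming distance (XOR + popcount) instead of L2.
import numpy as np

rng = np.random.default_rng(0)
desc_a = rng.normal(size=(500, 256))      # descriptors from image A (illustrative sizes)
desc_b = rng.normal(size=(800, 256))      # descriptors from image B

# Binarize and pack 256 bits into 32 bytes per descriptor.
bits_a = np.packbits((desc_a > 0).astype(np.uint8), axis=1)
bits_b = np.packbits((desc_b > 0).astype(np.uint8), axis=1)

# Hamming distance = popcount(XOR); a 256-entry lookup table counts bits per byte.
popcount = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)
xor = bits_a[:, None, :] ^ bits_b[None, :, :]
hamming = popcount[xor].sum(axis=2)

matches = hamming.argmin(axis=1)          # nearest descriptor in B for each descriptor in A
print("first five matches:", matches[:5])
print("their Hamming distances:", hamming[np.arange(5), matches[:5]])
```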
- AnalogNets: ML-HW Co-Design of Noise-robust TinyML Models and Always-On Analog Compute-in-Memory Accelerator [50.31646817567764]
This work describes TinyML models for the popular always-on applications of keyword spotting (KWS) and visual wake words (VWW).
We detail a comprehensive training methodology to retain accuracy in the face of analog non-idealities.
We also describe AON-CiM, a programmable, minimal-area phase-change memory (PCM) analog CiM accelerator.
arXiv Detail & Related papers (2021-11-10T10:24:46Z)
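One common ingredient of analog-aware training methodologies of this kind is to inject weight noise in the forward pass so the learned solution tolerates programming errors at inference time. The sketch below applies that idea to a tiny logistic-regression model on synthetic data; the model, task, and 30% noise level are assumptions for illustration, not AnalogNets' actual recipe.

```python
# Minimal sketch of noise injection during training: Gaussian noise is added to the weights
# in the forward pass, while the clean full-precision weights receive the updates.
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 32
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true > 0).astype(float)                    # synthetic binary labels

def accuracy(w, noise_std=0.0, trials=20):
    """Average accuracy when the stored weights are perturbed by programming noise."""
    accs = []
    for _ in range(trials):
        w_hw = w + rng.normal(0, noise_std * np.abs(w).max(), size=w.shape)
        accs.append(((X @ w_hw > 0).astype(float) == y).mean())
    return np.mean(accs)

def train(noise_std, lr=0.1, steps=300):
    w = np.zeros(d)
    for _ in range(steps):
        w_noisy = w + rng.normal(0, noise_std * (np.abs(w).max() + 1e-8), size=w.shape)
        p = 1.0 / (1.0 + np.exp(-(X @ w_noisy)))      # forward pass with injected noise
        grad = X.T @ (p - y) / n                       # gradient evaluated at the noisy weights
        w -= lr * grad                                 # update applied to the clean weights
    return w

w_plain = train(noise_std=0.0)
w_aware = train(noise_std=0.3)
for name, w in [("baseline", w_plain), ("noise-aware", w_aware)]:
    print(f"{name:12s} accuracy with 30% programming noise: {accuracy(w, 0.3):.3f}")
```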
- Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference [56.24109486973292]
We study the interplay between pruning and quantization during the training of neural networks for ultra low latency applications.
We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task.
arXiv Detail & Related papers (2021-02-22T19:00:05Z)
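As a compact sketch of how the two mechanisms combine, the snippet below applies a magnitude-based pruning mask and uniform "fake" quantization to a full-precision weight matrix before a forward pass. The layer size, 4-bit width, and 75% sparsity are assumptions for illustration, not the configuration studied in the paper.

```python
# Sketch of quantization-aware pruning in the forward pass: a magnitude-based pruning mask
# and uniform fake quantization are applied to a full-precision master copy of the weights.
import numpy as np

def fake_quantize(w, bits=4):
    """Uniform symmetric quantization: round to a bits-wide grid, return dequantized values."""
    scale = np.abs(w).max() / (2**(bits - 1) - 1) + 1e-12
    return np.round(w / scale).clip(-(2**(bits - 1)), 2**(bits - 1) - 1) * scale

def prune_mask(w, sparsity=0.75):
    """Keep only the largest-magnitude (1 - sparsity) fraction of weights."""
    threshold = np.quantile(np.abs(w), sparsity)
    return (np.abs(w) >= threshold).astype(w.dtype)

rng = np.random.default_rng(0)
w_master = rng.normal(0, 0.1, size=(64, 32))          # full-precision master weights
x = rng.normal(size=32)

mask = prune_mask(w_master)                           # recomputed periodically during training
w_deployed = fake_quantize(w_master * mask)           # what a low-latency inference engine would see
y = w_deployed @ x                                    # forward pass uses pruned + quantized weights

print("nonzero weights :", int(mask.sum()), "of", mask.size)
print("unique levels   :", np.unique(w_deployed).size)
```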
- Inference with Artificial Neural Networks on Analog Neuromorphic Hardware [0.0]
The BrainScaleS-2 ASIC comprises mixed-signal neuron and synapse circuits.
The system can also operate in a vector-matrix multiplication and accumulation mode for artificial neural networks.
arXiv Detail & Related papers (2020-06-23T17:25:06Z)
- Highly Efficient Salient Object Detection with 100K Parameters [137.74898755102387]
We propose a flexible convolutional module, namely generalized OctConv (gOctConv), to efficiently utilize both in-stage and cross-stage multi-scale features.
We build an extremely lightweight model, namely CSNet, which achieves performance comparable to large models with only about 0.2% of their parameters (100k) on popular salient object detection benchmarks.
arXiv Detail & Related papers (2020-03-12T07:00:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.