Hardware-aware training for large-scale and diverse deep learning
inference workloads using in-memory computing-based accelerators
- URL: http://arxiv.org/abs/2302.08469v1
- Date: Thu, 16 Feb 2023 18:25:06 GMT
- Title: Hardware-aware training for large-scale and diverse deep learning
inference workloads using in-memory computing-based accelerators
- Authors: Malte J. Rasch, Charles Mackin, Manuel Le Gallo, An Chen, Andrea
Fasoli, Frederic Odermatt, Ning Li, S. R. Nandakumar, Pritish Narayanan,
Hsinyu Tsai, Geoffrey W. Burr, Abu Sebastian, Vijay Narayanan
- Abstract summary: We show that many large-scale deep neural networks can be successfully retrained to show iso-accuracy on AIMC.
Our results suggest that AIMC nonidealities that add noise to the inputs or outputs, not the weights, have the largest impact on DNN accuracy.
- Score: 7.152059921639833
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Analog in-memory computing (AIMC) -- a promising approach for
energy-efficient acceleration of deep learning workloads -- computes
matrix-vector multiplications (MVMs) but only approximately, due to
nonidealities that often are non-deterministic or nonlinear. This can adversely
impact the achievable deep neural network (DNN) inference accuracy as compared
to a conventional floating point (FP) implementation. While retraining has
previously been suggested to improve robustness, prior work has explored only a
few DNN topologies, using disparate and overly simplified AIMC hardware models.
Here, we use hardware-aware (HWA) training to systematically examine the
accuracy of AIMC for multiple common artificial intelligence (AI) workloads
across multiple DNN topologies, and investigate sensitivity and robustness to a
broad set of nonidealities. By introducing a new and highly realistic AIMC
crossbar model, we improve significantly on earlier retraining approaches. We
show that many large-scale DNNs of various topologies, including convolutional
neural networks (CNNs), recurrent neural networks (RNNs), and transformers, can
in fact be successfully retrained to show iso-accuracy on AIMC. Our results
further suggest that AIMC nonidealities that add noise to the inputs or
outputs, not the weights, have the largest impact on DNN accuracy, and that
RNNs are particularly robust to all nonidealities.
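To make the hardware-aware (HWA) training idea concrete, below is a minimal, illustrative PyTorch sketch, not the authors' actual crossbar model: a linear layer that injects Gaussian noise on its inputs, its weights, and its MVM outputs during training, so that the learned weights remain accurate under such perturbations. The layer name (NoisyAnalogLinear) and all noise scales are hypothetical choices made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NoisyAnalogLinear(nn.Module):
    """Sketch of a hardware-aware (HWA) linear layer.

    During training, Gaussian noise is injected on the activations (input
    nonideality), on the weights (programming noise), and on the MVM result
    (output/readout noise). The noise scales are illustrative assumptions,
    not values from the paper.
    """

    def __init__(self, in_features, out_features,
                 input_noise=0.02, weight_noise=0.02, output_noise=0.04):
        super().__init__()
        self.fp_layer = nn.Linear(in_features, out_features)
        self.input_noise = input_noise
        self.weight_noise = weight_noise
        self.output_noise = output_noise

    def forward(self, x):
        if not self.training:
            # Ideal FP path at inference time; a real study would instead
            # evaluate with a full AIMC inference (crossbar) model.
            return self.fp_layer(x)
        # Input nonideality: noise relative to the activation range.
        x = x + self.input_noise * x.abs().amax().detach() * torch.randn_like(x)
        # Weight nonideality: per-element programming noise on the conductances.
        w = self.fp_layer.weight
        w = w + self.weight_noise * w.abs().amax().detach() * torch.randn_like(w)
        out = F.linear(x, w, self.fp_layer.bias)
        # Output nonideality: readout noise on the MVM result.
        out = out + self.output_noise * out.abs().amax().detach() * torch.randn_like(out)
        return out


# Usage: drop the noisy layers into a model and train with a standard loop,
# so the weights adapt to the injected perturbations.
model = nn.Sequential(NoisyAnalogLinear(784, 256), nn.ReLU(),
                      NoisyAnalogLinear(256, 10))
```

This sketch only illustrates the basic mechanism of training through a perturbed MVM; a realistic AIMC crossbar model would additionally account for effects such as input/output quantization, clipping, and conductance drift.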
Related papers
- Scalable Mechanistic Neural Networks [52.28945097811129]
We propose an enhanced neural network framework designed for scientific machine learning applications involving long temporal sequences.
By reformulating the original Mechanistic Neural Network (MNN), we reduce the computational time and space complexities from cubic and quadratic in the sequence length, respectively, to linear.
Extensive experiments demonstrate that the resulting Scalable MNN (S-MNN) matches the original MNN in precision while substantially reducing computational resources.
arXiv Detail & Related papers (2024-10-08T14:27:28Z)
- An Automata-Theoretic Approach to Synthesizing Binarized Neural Networks [13.271286153792058]
Quantized neural networks (QNNs) have been developed, with binarized neural networks (BNNs) restricted to binary values as a special case.
This paper presents an automata-theoretic approach to synthesizing BNNs that meet designated properties.
arXiv Detail & Related papers (2023-07-29T06:27:28Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- Adaptive Anomaly Detection for Internet of Things in Hierarchical Edge Computing: A Contextual-Bandit Approach [81.5261621619557]
We propose an adaptive anomaly detection scheme with hierarchical edge computing (HEC).
We first construct multiple anomaly detection DNN models with increasing complexity, and associate each of them to a corresponding HEC layer.
Then, we design an adaptive model selection scheme that is formulated as a contextual-bandit problem and solved by using a reinforcement learning policy network.
arXiv Detail & Related papers (2021-08-09T08:45:47Z)
- Online Training of Spiking Recurrent Neural Networks with Phase-Change Memory Synapses [1.9809266426888898]
Training spiking recurrent neural networks (RNNs) on dedicated neuromorphic hardware is still an open challenge.
We present a simulation framework of differential-architecture arrays based on an accurate and comprehensive Phase-Change Memory (PCM) device model.
We train a spiking RNN whose weights are emulated in the presented simulation framework, using a recently proposed e-prop learning rule.
arXiv Detail & Related papers (2021-08-04T01:24:17Z)
- NL-CNN: A Resources-Constrained Deep Learning Model based on Nonlinear Convolution [0.0]
A novel convolutional neural network model, abbreviated NL-CNN, is proposed, in which nonlinear convolution is emulated by a cascade of convolution + nonlinearity layers.
Performance evaluation on several widely known datasets is provided, highlighting the model's relevant features.
arXiv Detail & Related papers (2021-01-30T13:38:42Z)
- A Meta-Learning Approach to the Optimal Power Flow Problem Under Topology Reconfigurations [69.73803123972297]
We propose a DNN-based OPF predictor that is trained using a meta-learning (MTL) approach.
The developed OPF-predictor is validated through simulations using benchmark IEEE bus systems.
arXiv Detail & Related papers (2020-12-21T17:39:51Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) in terms of low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
- TxSim: Modeling Training of Deep Neural Networks on Resistive Crossbar Systems [3.1887081453726136]
Crossbar-based computations face a major challenge due to a variety of device- and circuit-level non-idealities.
We propose TxSim, a fast and customizable modeling framework to functionally evaluate DNN training on crossbar-based hardware.
arXiv Detail & Related papers (2020-02-25T19:29:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.