ApproxTrain: Fast Simulation of Approximate Multipliers for DNN Training
and Inference
- URL: http://arxiv.org/abs/2209.04161v2
- Date: Tue, 13 Sep 2022 05:11:41 GMT
- Title: ApproxTrain: Fast Simulation of Approximate Multipliers for DNN Training
and Inference
- Authors: Jing Gong, Hassaan Saadat, Hasindu Gamaarachchi, Haris Javaid, Xiaobo
Sharon Hu, Sri Parameswaran
- Abstract summary: Hardware approximate multipliers have shown their effectiveness for gaining resource-efficiency in DNN inference accelerators.
This paper presents ApproxTrain, an open-source framework that allows fast evaluation of DNN training and inference using simulated approximate multipliers.
- Score: 4.386709201336175
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Edge training of Deep Neural Networks (DNNs) is a desirable goal for
continuous learning; however, it is hindered by the enormous computational
power required by training. Hardware approximate multipliers have shown their
effectiveness for gaining resource-efficiency in DNN inference accelerators;
however, training with approximate multipliers is largely unexplored. To build
resource efficient accelerators with approximate multipliers supporting DNN
training, a thorough evaluation of training convergence and accuracy for
different DNN architectures and different approximate multipliers is needed.
This paper presents ApproxTrain, an open-source framework that allows fast
evaluation of DNN training and inference using simulated approximate
multipliers. ApproxTrain is as user-friendly as TensorFlow (TF) and requires
only a high-level description of a DNN architecture along with C/C++ functional
models of the approximate multiplier. We improve the speed of the simulation at
the multiplier level by using a novel LUT-based approximate floating-point (FP)
multiplier simulator on GPU (AMSim). ApproxTrain leverages CUDA and efficiently
integrates AMSim into the TensorFlow library, in order to overcome the absence
of native hardware approximate multipliers in commercial GPUs. We use
ApproxTrain to evaluate the convergence and accuracy of DNN training with
approximate multipliers for small and large datasets (including ImageNet) using
LeNet and ResNet architectures. The evaluations demonstrate similar
convergence behavior and negligible change in test accuracy compared to FP32
and bfloat16 multipliers. Compared to CPU-based approximate multiplier
simulations in training and inference, the GPU-accelerated ApproxTrain is more
than 2500x faster. The original TensorFlow, which relies on the highly
optimized closed-source cuDNN/cuBLAS libraries with native hardware
multipliers, is only 8x faster than ApproxTrain.
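
The abstract states that ApproxTrain needs only a high-level DNN description plus a C/C++ functional model of the approximate multiplier; the model's actual interface is not shown here. Below is a minimal sketch of what such a bit-level functional model could look like, assuming a hypothetical FP32 multiplier that keeps sign and exponent exact but truncates each significand to its top 8 bits before multiplying. The function name approx_fp32_mul is illustrative and not part of ApproxTrain's API.

```cpp
// Hypothetical C/C++ functional model of an approximate FP32 multiplier.
// Illustrative sketch only; NaN/Inf inputs and gradual underflow are ignored.
#include <cstdint>
#include <cstring>

// Reinterpret helpers between float and its IEEE-754 bit pattern.
static inline uint32_t to_bits(float f)     { uint32_t u; std::memcpy(&u, &f, sizeof u); return u; }
static inline float    to_float(uint32_t u) { float f;    std::memcpy(&f, &u, sizeof f); return f; }

// Approximate multiply: sign and exponent are handled exactly, but the full
// 24x24-bit significand product is replaced by a cheap 8x8-bit product of the
// top significand bits (one common style of hardware approximation).
float approx_fp32_mul(float a, float b) {
    uint32_t ua = to_bits(a), ub = to_bits(b);
    // Treat zeros and denormals as zero.
    if (((ua >> 23) & 0xFFu) == 0 || ((ub >> 23) & 0xFFu) == 0) return 0.0f;

    uint32_t sign = (ua ^ ub) & 0x80000000u;
    int32_t  exp  = (int32_t)((ua >> 23) & 0xFFu) + (int32_t)((ub >> 23) & 0xFFu) - 127;

    // Top 8 bits of each 24-bit significand (implicit leading 1 included).
    uint32_t ma = ((ua >> 16) & 0x7Fu) | 0x80u;
    uint32_t mb = ((ub >> 16) & 0x7Fu) | 0x80u;
    uint32_t prod = ma * mb;                       // value = prod / 2^14, in [1, 4)

    if (prod & 0x8000u) { prod >>= 1; exp += 1; }  // renormalize into [1, 2)
    if (exp <= 0)   return 0.0f;                   // underflow -> 0
    if (exp >= 255) return to_float(sign | 0x7F800000u);  // overflow -> +-Inf

    uint32_t mant = (prod & 0x3FFFu) << 9;         // 14 fraction bits -> mantissa bits 22..9
    return to_float(sign | ((uint32_t)exp << 23) | mant);
}
```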
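The abstract also describes AMSim, a LUT-based approximate FP multiplier simulator integrated into TensorFlow's CUDA kernels. AMSim's actual table layout and integration are not given in the abstract; the CUDA sketch below shows one plausible arrangement, in which an approximate significand product is precomputed on the host into a 256x256 table and a device function replaces each multiplication with a table lookup plus exact sign/exponent arithmetic. The names d_lut, amsim_mul, amsim_elementwise, and build_mantissa_lut are hypothetical.

```cpp
// Illustrative CUDA (.cu) sketch of a LUT-based approximate FP32 multiplier
// simulator. Not ApproxTrain's actual AMSim implementation.
#include <cuda_runtime.h>
#include <cstdint>

// 256x256 table of approximate 8-bit x 8-bit significand products.
__device__ uint16_t d_lut[256 * 256];

// Device-side approximate multiply: exact sign/exponent handling, tabulated
// significand product. Zeros/denormals map to zero; NaN/Inf are ignored.
__device__ float amsim_mul(float a, float b) {
    uint32_t ua = __float_as_uint(a), ub = __float_as_uint(b);
    if (((ua >> 23) & 0xFFu) == 0 || ((ub >> 23) & 0xFFu) == 0) return 0.0f;

    uint32_t sign = (ua ^ ub) & 0x80000000u;
    int32_t  exp  = (int32_t)((ua >> 23) & 0xFFu) + (int32_t)((ub >> 23) & 0xFFu) - 127;

    uint32_t ia = ((ua >> 16) & 0x7Fu) | 0x80u;        // top 8 significand bits
    uint32_t ib = ((ub >> 16) & 0x7Fu) | 0x80u;
    uint32_t prod = d_lut[(ia << 8) | ib];             // tabulated approximate product

    if (prod & 0x8000u) { prod >>= 1; exp += 1; }      // renormalize into [1, 2)
    if (exp <= 0)   return 0.0f;                       // underflow -> 0
    if (exp >= 255) return __uint_as_float(sign | 0x7F800000u);  // overflow -> +-Inf
    return __uint_as_float(sign | ((uint32_t)exp << 23) | ((prod & 0x3FFFu) << 9));
}

// Elementwise demo kernel; in a full integration the same lookup would sit
// inside the GEMM/convolution kernels that back the TensorFlow ops.
__global__ void amsim_elementwise(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = amsim_mul(a[i], b[i]);
}

// Host-side table construction. The entry below is the exact 8x8 product as a
// placeholder; in practice it would be derived from the user's C/C++
// functional model of the approximate multiplier.
void build_mantissa_lut() {
    static uint16_t h_lut[256 * 256];
    for (uint32_t i = 0; i < 256; ++i)
        for (uint32_t j = 0; j < 256; ++j)
            h_lut[(i << 8) | j] = (uint16_t)(i * j);
    cudaMemcpyToSymbol(d_lut, h_lut, sizeof(h_lut));
}
```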
Related papers
- Harnessing Manycore Processors with Distributed Memory for Accelerated
Training of Sparse and Recurrent Models [43.1773057439246]
Current AI training infrastructure is dominated by single instruction multiple data (SIMD) and systolic array architectures.
We explore sparse and recurrent model training on a massively parallel multiple instruction multiple data architecture with distributed local memory.
arXiv Detail & Related papers (2023-11-07T23:18:35Z)
- Efficient N:M Sparse DNN Training Using Algorithm, Architecture, and Dataflow Co-Design [15.47240906902083]
This paper presents a computation-efficient training scheme for N:M sparse DNNs using algorithm, architecture, and dataflow co-design.
At the algorithm level, a bidirectional weight pruning method, dubbed BDWP, is proposed to leverage the N:M sparsity of weights.
At the architecture level, a sparse accelerator for DNN training, namely SAT, is developed to support both the regular dense operations and the computation-efficient N:M sparse operations.
arXiv Detail & Related papers (2023-09-22T17:26:19Z)
- Tensor Slicing and Optimization for Multicore NPUs [2.670309629218727]
This paper proposes a compiler optimization pass for Multicore NPUs, called Tensor Slicing Optimization (TSO).
Results show that TSO is capable of identifying the best tensor slicing that minimizes execution time for a set of CNN models.
arXiv Detail & Related papers (2023-04-06T12:03:03Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Training Spiking Neural Networks with Local Tandem Learning [96.32026780517097]
Spiking neural networks (SNNs) are shown to be more biologically plausible and energy efficient than their predecessors.
In this paper, we put forward a generalized learning rule, termed Local Tandem Learning (LTL).
We demonstrate rapid network convergence within five training epochs on the CIFAR-10 dataset while having low computational complexity.
arXiv Detail & Related papers (2022-10-10T10:05:00Z)
- AdaPT: Fast Emulation of Approximate DNN Accelerators in PyTorch [4.445835362642506]
We present AdaPT, a fast emulation framework that extends PyTorch to support approximate inference and approximation-aware retraining.
We evaluate the framework on several DNN models and application fields including CNNs, LSTMs, and GANs for a number of approximate multipliers with distinct bitwidth values.
The results show substantial error recovery from approximate re-training and reduced inference time up to 53.9x with respect to the baseline approximate implementation.
arXiv Detail & Related papers (2022-03-08T13:31:16Z)
- Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining [58.10436813430554]
Mini-batch training of graph neural networks (GNNs) requires substantial computation and data movement.
We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment.
We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler.
We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised.
arXiv Detail & Related papers (2021-10-16T02:41:35Z)
- Large Batch Simulation for Deep Reinforcement Learning [101.01408262583378]
We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work.
We realize end-to-end training speeds of over 19,000 frames of experience per second on a single GPU and up to 72,000 frames per second on a single eight-GPU machine.
By combining batch simulation and performance optimizations, we demonstrate that point-goal navigation agents can be trained in complex 3D environments on a single GPU in 1.5 days to 97% of the accuracy of agents trained on a prior state-of-the-art system.
arXiv Detail & Related papers (2021-03-12T00:22:50Z)
- A Meta-Learning Approach to the Optimal Power Flow Problem Under Topology Reconfigurations [69.73803123972297]
We propose a DNN-based OPF predictor that is trained using a meta-learning (MTL) approach.
The developed OPF-predictor is validated through simulations using benchmark IEEE bus systems.
arXiv Detail & Related papers (2020-12-21T17:39:51Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires a much smaller number of communication rounds, which is supported by our theoretical analysis.
Experiments on several datasets demonstrate the effectiveness of our method and confirm the theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
- TxSim: Modeling Training of Deep Neural Networks on Resistive Crossbar Systems [3.1887081453726136]
Crossbar-based computations face a major challenge due to a variety of device- and circuit-level non-idealities.
We propose TxSim, a fast and customizable modeling framework to functionally evaluate DNN training on crossbar-based hardware.
arXiv Detail & Related papers (2020-02-25T19:29:43Z)