Dithered backprop: A sparse and quantized backpropagation algorithm for
more efficient deep neural network training
- URL: http://arxiv.org/abs/2004.04729v2
- Date: Thu, 16 Apr 2020 16:59:09 GMT
- Title: Dithered backprop: A sparse and quantized backpropagation algorithm for
more efficient deep neural network training
- Authors: Simon Wiedemann, Temesgen Mehari, Kevin Kepp, Wojciech Samek
- Abstract summary: We propose a method for reducing the computational cost of backprop, which we named dithered backprop.
We show that our method is fully compatible with state-of-the-art training methods that reduce the bit precision of training down to 8 bits.
- Score: 18.27946970159625
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Neural Networks are successful but highly computationally expensive
learning systems. One of the main drains on time and energy is the well-known backpropagation (backprop) algorithm, which roughly accounts for 2/3 of the computational complexity of training. In this work we propose a method for reducing the computational cost of backprop, which we name dithered backprop. It consists of applying a stochastic quantization scheme to the intermediate results of backprop. The particular quantization scheme, called
non-subtractive dither (NSD), induces sparsity which can be exploited by
computing efficient sparse matrix multiplications. Experiments on popular image
classification tasks show that it induces 92% sparsity on average across a wide set of models, with no or negligible accuracy drop compared to state-of-the-art approaches, thus significantly reducing the computational complexity of the backward pass. Moreover, we show that our method is fully compatible with state-of-the-art training methods that reduce the bit precision of training down to 8 bits, thereby further reducing the
computational requirements. Finally, we discuss and show the potential benefits of applying dithered backprop in a distributed training setting, where both communication and compute efficiency may increase simultaneously with the number of participating nodes.
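To make the mechanism concrete, below is a minimal NumPy/SciPy sketch of the core idea (not the authors' implementation): non-subtractive dithered quantization applied to the incoming gradient of a single fully connected layer, followed by sparse matrix products over the resulting zeros. The layer shapes, the heavy-tailed toy gradient, and the step-size heuristic are illustrative assumptions, not choices taken from the paper.

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)


def nsd_quantize(x, delta):
    """Non-subtractive dithered (NSD) quantization with step size `delta`.

    Uniform dither u ~ U(-delta/2, delta/2) is added before rounding to the
    nearest multiple of delta and is NOT subtracted afterwards. In
    expectation the output equals x, and entries with |x| < delta are mapped
    to exact zero with probability 1 - |x|/delta, which is what creates the
    sparsity exploited by dithered backprop.
    """
    u = rng.uniform(-delta / 2.0, delta / 2.0, size=x.shape)
    return delta * np.round((x + u) / delta)


# Toy backward pass of one fully connected layer y = x @ w: given dL/dy,
# compute dL/dw and dL/dx, but quantize dL/dy with NSD first.
batch, d_in, d_out = 64, 256, 128
x = rng.standard_normal((batch, d_in))               # saved forward activations
w = rng.standard_normal((d_in, d_out)) * 0.05        # layer weights
grad_y = rng.standard_t(df=2, size=(batch, d_out))   # heavy-tailed toy gradient

delta = 0.1 * np.abs(grad_y).max()   # illustrative step-size heuristic only
grad_y_q = nsd_quantize(grad_y, delta)
print(f"induced sparsity: {(grad_y_q == 0).mean():.1%}")

# Unbiasedness: averaging over many independent dither draws recovers grad_y.
avg = np.mean([nsd_quantize(grad_y, delta) for _ in range(200)], axis=0)
print(f"mean |E[Q(g)] - g| / delta: {np.abs(avg - grad_y).mean() / delta:.3f}")

# The exact zeros can be exploited with sparse matrix products.
g = csr_matrix(grad_y_q)
grad_w = (g.T @ x).T    # dL/dw = x.T @ dL/dy, shape (d_in, d_out)
grad_x = g @ w.T        # dL/dx = dL/dy @ w.T, shape (batch, d_in)
```

Because the dither is uniform over one quantization step, the quantizer is unbiased in expectation, so the sparsified gradients remain valid stochastic estimates of the exact ones.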
Related papers
- Efficient Deep Learning with Decorrelated Backpropagation [1.9731499060686393]
We show for the first time that much more efficient training of very deep neural networks using decorrelated backpropagation is feasible.
We obtain a more than two-fold speed-up and higher test accuracy compared to backpropagation when training an 18-layer deep residual network.
arXiv Detail & Related papers (2024-05-03T17:21:13Z)
- Approximated Likelihood Ratio: A Forward-Only and Parallel Framework for Boosting Neural Network Training [30.452060061499523]
We introduce an approximation technique for the likelihood ratio (LR) method to alleviate computational and memory demands in gradient estimation.
Experiments demonstrate the effectiveness of the approximation technique in neural network training.
arXiv Detail & Related papers (2024-03-18T23:23:50Z)
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on BP optimization.
Unlike FF, our framework directly outputs label distributions at each cascaded block and does not require the generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
- Towards Memory- and Time-Efficient Backpropagation for Training Spiking Neural Networks [70.75043144299168]
Spiking Neural Networks (SNNs) are promising energy-efficient models for neuromorphic computing.
We propose the Spatial Learning Through Time (SLTT) method that can achieve high performance while greatly improving training efficiency.
Our method achieves state-of-the-art accuracy on ImageNet, while the memory cost and training time are reduced by more than 70% and 50%, respectively, compared with BPTT.
arXiv Detail & Related papers (2023-02-28T05:01:01Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
- Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference [56.24109486973292]
We study the interplay between pruning and quantization during the training of neural networks for ultra low latency applications.
We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task.
arXiv Detail & Related papers (2021-02-22T19:00:05Z)
- Attentive Gaussian processes for probabilistic time-series generation [4.94950858749529]
We propose a computationally efficient attention-based network combined with Gaussian process regression to generate real-valued sequences.
We develop a block-wise training algorithm to allow mini-batch training of the network while the GP is trained on the full batch.
The algorithm is proven to converge and yields solutions of comparable, if not better, quality.
arXiv Detail & Related papers (2021-02-10T01:19:15Z)
- Activation Relaxation: A Local Dynamical Approximation to Backpropagation in the Brain [62.997667081978825]
Activation Relaxation (AR) is motivated by constructing the backpropagation gradient as the equilibrium point of a dynamical system.
Our algorithm converges rapidly and robustly to the correct backpropagation gradients, requires only a single type of computational unit, and can operate on arbitrary computation graphs.
arXiv Detail & Related papers (2020-09-11T11:56:34Z)
- Accelerating Neural Network Inference by Overflow Aware Quantization [16.673051600608535]
The inherently heavy computation of deep neural networks prevents their widespread application.
We propose an overflow-aware quantization method by designing a trainable adaptive fixed-point representation.
With the proposed method, we are able to fully utilize the computing power to minimize the quantization loss and obtain optimized inference performance.
arXiv Detail & Related papers (2020-05-27T11:56:22Z)
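As a loose illustration of the trainable fixed-point idea mentioned in the entry above, here is a short PyTorch sketch of a symmetric quantizer with a learnable step size trained via a straight-through estimator. It is a generic learned-scale quantizer, not a reproduction of that paper's overflow-aware method; the class name, bit width, and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class LearnableFixedPointQuant(nn.Module):
    """Generic symmetric fixed-point quantizer with a learnable scale.

    Illustrative only: not the exact overflow-aware scheme of the paper
    cited above, just the common learned-step-size pattern.
    """

    def __init__(self, bits: int = 8, init_scale: float = 0.05):
        super().__init__()
        self.qmax = 2 ** (bits - 1) - 1               # e.g. 127 for 8 bits
        # log-parameterisation keeps the scale strictly positive
        self.log_scale = nn.Parameter(torch.log(torch.tensor(init_scale)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scale = self.log_scale.exp()
        q = x / scale
        # straight-through estimator: round() in the forward pass,
        # identity gradient in the backward pass
        q = q + (torch.round(q) - q).detach()
        q = torch.clamp(q, -self.qmax, self.qmax)     # stay in the integer range
        return q * scale                              # dequantize

# Usage: quantize a layer's activations and train the scale jointly.
layer = nn.Linear(16, 4)
quant = LearnableFixedPointQuant(bits=8)
opt = torch.optim.SGD(
    list(layer.parameters()) + list(quant.parameters()), lr=1e-2
)
x = torch.randn(32, 16)
loss = quant(layer(x)).pow(2).mean()                  # dummy objective
loss.backward()
opt.step()
```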
This list is automatically generated from the titles and abstracts of the papers in this site.