SparseProp: Efficient Sparse Backpropagation for Faster Training of
Neural Networks
- URL: http://arxiv.org/abs/2302.04852v1
- Date: Thu, 9 Feb 2023 18:54:05 GMT
- Title: SparseProp: Efficient Sparse Backpropagation for Faster Training of
Neural Networks
- Authors: Mahdi Nikdan, Tommaso Pegolotti, Eugenia Iofinova, Eldar Kurtic, Dan
Alistarh
- Abstract summary: We provide a new efficient version of the backpropagation algorithm, specialized to the case where the weights of the neural network being trained are sparse.
Our algorithm is general, as it applies to arbitrary (unstructured) sparsity and common layer types.
We show that it can yield speedups in end-to-end runtime experiments, both in transfer learning using already-sparsified networks, and in training sparse networks from scratch.
- Score: 20.18957052535565
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We provide a new efficient version of the backpropagation algorithm,
specialized to the case where the weights of the neural network being trained
are sparse. Our algorithm is general, as it applies to arbitrary (unstructured)
sparsity and common layer types (e.g., convolutional or linear). We provide a
fast vectorized implementation on commodity CPUs, and show that it can yield
speedups in end-to-end runtime experiments, both in transfer learning using
already-sparsified networks, and in training sparse networks from scratch.
Thus, our results provide the first support for sparse training on commodity
hardware.
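For intuition, here is a minimal NumPy/SciPy sketch of the core idea: store a linear layer's weights in CSR format and compute the weight gradient only at the stored nonzero positions, so both the forward and backward costs scale with the number of nonzeros rather than the dense layer size. This is a simplified illustration under assumed names (e.g., `SparseLinear`), not the paper's vectorized CPU implementation.

```python
# Minimal sketch of sparse backpropagation for a linear layer with
# unstructured weight sparsity kept in CSR format. Illustration only:
# this is not the paper's vectorized CPU kernel, and all names are hypothetical.
import numpy as np
from scipy.sparse import csr_matrix


class SparseLinear:
    def __init__(self, dense_weight, bias):
        self.W = csr_matrix(dense_weight)   # (out_features, in_features), nonzeros only
        self.b = bias                       # (out_features,)
        # Row/column index of every stored nonzero, reused in backward().
        self.rows = np.repeat(np.arange(self.W.shape[0]), np.diff(self.W.indptr))
        self.cols = self.W.indices

    def forward(self, x):
        # x: (batch, in_features). Sparse matmul touches only nonzero weights.
        self.x = x
        return self.W.dot(x.T).T + self.b

    def backward(self, grad_out):
        # grad_out: (batch, out_features), gradient of the loss w.r.t. the output.
        # Gradient w.r.t. the input: dense result of a sparse matmul.
        grad_x = self.W.T.dot(grad_out.T).T
        # Gradient w.r.t. the weights, evaluated only at the nonzero positions,
        # so the cost is proportional to the number of nonzeros.
        grad_w_data = (grad_out[:, self.rows] * self.x[:, self.cols]).sum(axis=0)
        grad_b = grad_out.sum(axis=0)
        return grad_x, grad_w_data, grad_b


# Tiny usage example on a roughly 90%-sparse random weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128)) * (rng.random((64, 128)) > 0.9)
layer = SparseLinear(W, bias=np.zeros(64))
x = rng.standard_normal((8, 128))
y = layer.forward(x)
grad_x, grad_w_data, grad_b = layer.backward(np.ones_like(y))
```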
Related papers
- QuickNets: Saving Training and Preventing Overconfidence in Early-Exit
Neural Architectures [2.28438857884398]
We introduce QuickNets: a novel cascaded training algorithm for faster training of neural networks.
We demonstrate that QuickNets can dynamically distribute learning and have a reduced training cost and inference cost compared to standard Backpropagation.
arXiv Detail & Related papers (2022-12-25T07:06:32Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- FastHebb: Scaling Hebbian Training of Deep Neural Networks to ImageNet Level [7.410940271545853]
We present FastHebb, an efficient and scalable solution for Hebbian learning.
FastHebb outperforms previous solutions by up to 50 times in terms of training speed.
For the first time, we are able to bring Hebbian algorithms to ImageNet scale.
arXiv Detail & Related papers (2022-07-07T09:04:55Z)
- Training Your Sparse Neural Network Better with Any Mask [106.134361318518]
Pruning large neural networks to create high-quality, independently trainable sparse masks is desirable.
In this paper we demonstrate an alternative opportunity: one can customize the sparse training techniques to deviate from the default dense network training protocols.
Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks.
arXiv Detail & Related papers (2022-06-26T00:37:33Z)
- FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training with Dynamic Sparsity [74.58777701536668]
We introduce the FreeTickets concept, which can boost the performance of sparse convolutional neural networks over their dense network equivalents by a large margin.
We propose two novel, efficient ensemble methods with dynamic sparsity, which yield many diverse and accurate tickets "for free" in one shot during the sparse training process.
arXiv Detail & Related papers (2021-06-28T10:48:20Z)
- Multirate Training of Neural Networks [0.0]
We show that for various transfer learning applications in vision and NLP we can fine-tune deep neural networks in almost half the time.
We propose an additional multirate technique which can learn different features present in the data by training the full network on different time scales simultaneously.
arXiv Detail & Related papers (2021-06-20T22:44:55Z)
- Fast Adaptation with Linearized Neural Networks [35.43406281230279]
We study the inductive biases of linearizations of neural networks, which we show to be surprisingly good summaries of the full network functions.
Inspired by this finding, we propose a technique for embedding these inductive biases into Gaussian processes through a kernel designed from the Jacobian of the network.
In this setting, domain adaptation takes the form of interpretable posterior inference, with accompanying uncertainty estimation.
arXiv Detail & Related papers (2021-03-02T03:23:03Z)
- Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate the models on resource-constrained environments.
In this paper, we are the first to study training from scratch an N:M fine-grained structured sparse network.
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
- SparseDNN: Fast Sparse Deep Learning Inference on CPUs [1.6244541005112747]
We present SparseDNN, a sparse deep learning inference engine targeting CPUs.
We show that our sparse code generator can achieve significant speedups over state-of-the-art sparse and dense libraries.
arXiv Detail & Related papers (2021-01-20T03:27:35Z)
- Generalized Leverage Score Sampling for Neural Networks [82.95180314408205]
Leverage score sampling is a powerful technique that originates from theoretical computer science.
In this work, we generalize the results in [Avron, Kapralov, Musco, Musco, Velingker and Zandieh 17] to a broader class of kernels.
arXiv Detail & Related papers (2020-09-21T14:46:01Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.