Efficient Neural Network Training via Forward and Backward Propagation Sparsification
- URL: http://arxiv.org/abs/2111.05685v1
- Date: Wed, 10 Nov 2021 13:49:47 GMT
- Title: Efficient Neural Network Training via Forward and Backward Propagation Sparsification
- Authors: Xiao Zhou, Weizhong Zhang, Zonghao Chen, Shizhe Diao, Tong Zhang
- Abstract summary: We propose an efficient sparse training method with completely sparse forward and backward passes.
Our algorithm is much more effective in accelerating the training process, up to an order of magnitude faster.
- Score: 26.301103403328312
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sparse training is a natural idea to accelerate the training of deep
neural networks and reduce memory usage, especially since large modern neural
networks are significantly over-parameterized. However, most of the existing
methods cannot achieve this goal in practice, because the chain-rule-based
estimators of the gradient w.r.t. the structure parameters adopted by previous methods
require dense computation at least in the backward propagation step. This paper
solves this problem by proposing an efficient sparse training method with
completely sparse forward and backward passes. We first formulate the training
process as a continuous minimization problem under a global sparsity constraint.
We then separate the optimization process into two steps, corresponding to
weight update and structure parameter update. For the former step, we use the
conventional chain rule, which can be made sparse by exploiting the sparse
network structure. For the latter step, instead of using the chain-rule-based gradient
estimators as in existing methods, we propose a variance reduced policy
gradient estimator, which only requires two forward passes without backward
propagation, thus achieving completely sparse training. We prove that the
variance of our gradient estimator is bounded. Extensive experimental results
on real-world datasets demonstrate that compared to previous methods, our
algorithm is much more effective in accelerating the training process, up to an
order of magnitude faster.
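To make the structure-parameter step concrete, the following is a minimal PyTorch sketch of a two-forward-pass, REINFORCE-style estimator with a paired baseline for variance reduction. It illustrates the general idea rather than the authors' exact estimator; `theta`, `loss_fn`, and the learning rate are placeholders introduced for this example.

```python
import torch

def structure_update(theta, loss_fn, lr=0.1):
    """One structure-parameter update using only two forward passes.

    theta   : logits of the Bernoulli keep-probabilities (one entry per weight).
    loss_fn : callable mapping a binary mask to a scalar loss, i.e. a forward
              pass of the masked network with no backward propagation.
    """
    with torch.no_grad():
        probs = torch.sigmoid(theta)

        # Two independent mask samples; each loss serves as a baseline for the
        # other, a simple way to reduce the variance of the estimator.
        m1, m2 = torch.bernoulli(probs), torch.bernoulli(probs)
        l1, l2 = loss_fn(m1), loss_fn(m2)  # the two forward passes

        # REINFORCE: grad_theta E[L] = E[L * grad_theta log p(m | theta)],
        # where grad_theta log p(m) = m - sigmoid(theta) for Bernoulli logits.
        grad = 0.5 * ((l1 - l2) * (m1 - probs) + (l2 - l1) * (m2 - probs))
        return theta - lr * grad
```

Because only forward losses of sampled masks are used, this step never needs dense gradients with respect to the weights, which is what keeps both passes sparse.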
Related papers
- Gradient-Free Training of Recurrent Neural Networks using Random Perturbations [1.1742364055094265]
Recurrent neural networks (RNNs) hold immense potential for computations due to their Turing completeness and sequential processing capabilities.
Backpropagation through time (BPTT), the prevailing method, extends the backpropagation algorithm by unrolling the RNN over time.
BPTT suffers from significant drawbacks, including the need to interleave forward and backward phases and store exact gradient information.
We present a new approach to perturbation-based learning in RNNs whose performance is competitive with BPTT.
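As a rough illustration of the perturbation idea (not the specific estimator proposed in that paper), a basic weight-perturbation update estimates the gradient from loss differences alone; `loss_fn`, the noise scale, and the learning rate below are assumptions for the sketch.

```python
import torch

def weight_perturbation_step(params, loss_fn, sigma=1e-3, lr=1e-2):
    """Gradient-free update: estimate the gradient of loss_fn at params from
    the loss change under a random Gaussian perturbation of the weights."""
    with torch.no_grad():
        eps = torch.randn_like(params)
        base = loss_fn(params)                     # forward pass at current weights
        perturbed = loss_fn(params + sigma * eps)  # forward pass at perturbed weights
        # ((L(w + sigma*eps) - L(w)) / sigma) * eps has expectation equal to the
        # gradient of L at w (to first order), since eps is standard Gaussian.
        g_hat = (perturbed - base) / sigma * eps
        return params - lr * g_hat
```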
arXiv Detail & Related papers (2024-05-14T21:15:29Z)
- Gradient-free neural topology optimization [0.0]
Gradient-free algorithms require many more iterations to converge than gradient-based algorithms.
This has made them unviable for topology optimization due to the high computational cost per iteration and high dimensionality of these problems.
We propose a pre-trained neural reparameterization strategy that leads to at least one order of magnitude decrease in iteration count when optimizing the designs in latent space.
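One way to picture gradient-free optimization in a latent design space is a simple (1+1) evolution strategy over the code of a pretrained decoder; the `decoder` and `objective` callables below are placeholders, not the paper's actual models or objective.

```python
import torch

def latent_random_search(decoder, objective, dim=64, iters=200, sigma=0.1):
    """Optimize a design through a pretrained decoder without gradients.

    decoder(z)        -> a candidate design (e.g. a density field).
    objective(design) -> scalar to minimize, treated as a black box.
    """
    z = torch.zeros(dim)
    best = objective(decoder(z))
    for _ in range(iters):
        candidate = z + sigma * torch.randn(dim)   # mutate the latent code
        score = objective(decoder(candidate))
        if score < best:                           # keep only improvements
            z, best = candidate, score
    return decoder(z), best
```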
arXiv Detail & Related papers (2024-03-07T23:00:49Z)
- Efficient Training of Deep Equilibrium Models [6.744714965617125]
Deep equilibrium models (DEQs) have proven to be very powerful for learning data representations.
The idea is to replace traditional (explicit) feedforward neural networks with an implicit fixed-point equation.
Backpropagation through DEQ layers still requires solving an expensive Jacobian-based equation.
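For context, a deep equilibrium layer computes its output as a fixed point z* = f(z*, x); the sketch below shows a naive fixed-point forward pass and notes the Jacobian-based linear system that the implicit backward pass has to solve, which is the expensive step that efficient DEQ training tries to cheapen. It is a generic illustration, not the training scheme of the cited paper.

```python
import torch

def deq_forward(f, x, z0, iters=50):
    """Find z with z = f(z, x) by naive fixed-point iteration."""
    z = z0
    for _ in range(iters):
        z = f(z, x)
    return z

# Backpropagating through the fixed point requires solving the linear system
#   (I - J_f(z*))^T v = dL/dz*
# where J_f is the Jacobian of f w.r.t. z at the equilibrium; the parameter
# gradient is then obtained from v with one more vector-Jacobian product.
```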
arXiv Detail & Related papers (2023-04-23T14:20:09Z)
- Towards Memory- and Time-Efficient Backpropagation for Training Spiking Neural Networks [70.75043144299168]
Spiking Neural Networks (SNNs) are promising energy-efficient models for neuromorphic computing.
We propose the Spatial Learning Through Time (SLTT) method that can achieve high performance while greatly improving training efficiency.
Our method achieves state-of-the-art accuracy on ImageNet, while the memory cost and training time are reduced by more than 70% and 50%, respectively, compared with BPTT.
arXiv Detail & Related papers (2023-02-28T05:01:01Z)
- Balance is Essence: Accelerating Sparse Training via Adaptive Gradient Correction [29.61757744974324]
Deep neural networks incur significant memory and computation costs.
Sparse training is one of the most common techniques to reduce these costs.
In this work, we aim to overcome this problem and achieve space-time co-efficiency.
arXiv Detail & Related papers (2023-01-09T18:50:03Z)
- SHINE: SHaring the INverse Estimate from the forward pass for bi-level optimization and implicit models [15.541264326378366]
In recent years, implicit deep learning has emerged as a method to increase the depth of deep neural networks.
The training is performed as a bi-level problem, and its computational complexity is partially driven by the iterative inversion of a huge Jacobian matrix.
We propose a novel strategy to tackle this computational bottleneck from which many bi-level problems suffer.
arXiv Detail & Related papers (2021-06-01T15:07:34Z)
- RNN Training along Locally Optimal Trajectories via Frank-Wolfe Algorithm [50.76576946099215]
We propose a novel and efficient training method for RNNs by iteratively seeking a local minimum on the loss surface within a small region.
Surprisingly, even with the additional per-iteration cost, the overall training cost is empirically observed to be lower than that of back-propagation.
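As a rough illustration of the idea, a single Frank-Wolfe step restricted to a small L-infinity ball around the current weights has a closed-form linear subproblem; the radius and step size below are illustrative assumptions, not the paper's exact construction.

```python
import torch

def frank_wolfe_step(w, grad, r=0.01, gamma=0.1):
    """One Frank-Wolfe step on the L-infinity ball of radius r around w.

    The linear subproblem  argmin_{s : ||s - w||_inf <= r} <grad, s>
    has the closed-form solution  s = w - r * sign(grad).
    """
    s = w - r * torch.sign(grad)
    return w + gamma * (s - w)  # convex combination stays inside the region
```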
arXiv Detail & Related papers (2020-10-12T01:59:18Z)
- Channel-Directed Gradients for Optimization of Convolutional Neural Networks [50.34913837546743]
We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error.
We show that defining the gradients along the output channel direction leads to a performance boost, while other directions can be detrimental.
arXiv Detail & Related papers (2020-08-25T00:44:09Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
- Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step.
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve higher reduction on computation load under the same accuracy.
arXiv Detail & Related papers (2020-04-20T02:40:43Z)
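The low-rank training idea can be sketched by parameterizing each layer as W = U diag(s) V^T and adding a regularizer that keeps U and V close to orthonormal while sparsifying the singular values s; the penalty weights below are illustrative assumptions, not the paper's settings.

```python
import torch

def svd_layer_penalty(U, s, V, ortho_weight=1e-2, sparsity_weight=1e-3):
    """Regularizer for a layer parameterized as W = U @ torch.diag(s) @ V.T.

    The orthogonality terms keep the columns of U and V close to orthonormal,
    so that s behaves like the layer's singular values; the L1 term on s pushes
    many of them toward zero, lowering the effective rank of the layer.
    """
    eye_u = torch.eye(U.shape[1], device=U.device)
    eye_v = torch.eye(V.shape[1], device=V.device)
    ortho = ((U.T @ U - eye_u) ** 2).sum() + ((V.T @ V - eye_v) ** 2).sum()
    sparsity = s.abs().sum()
    return ortho_weight * ortho + sparsity_weight * sparsity

# Total training loss: task loss + svd_layer_penalty summed over all factorized
# layers, with each forward pass using W = U @ torch.diag(s) @ V.T.
```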
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information above and is not responsible for any consequences of its use.