Efficient Deep Learning with Decorrelated Backpropagation
- URL: http://arxiv.org/abs/2405.02385v2
- Date: Fri, 17 May 2024 17:13:30 GMT
- Title: Efficient Deep Learning with Decorrelated Backpropagation
- Authors: Sander Dalm, Joshua Offergeld, Nasir Ahmad, Marcel van Gerven
- Abstract summary: We show for the first time that much more efficient training of very deep neural networks using decorrelated backpropagation is feasible.
We obtain a more than two-fold speed-up and higher test accuracy compared to backpropagation when training an 18-layer deep residual network.
- Score: 1.9731499060686393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The backpropagation algorithm remains the dominant and most successful method for training deep neural networks (DNNs). At the same time, training DNNs at scale comes at a significant computational cost and therefore a high carbon footprint. Converging evidence suggests that input decorrelation may speed up deep learning. However, to date, this has not yet translated into substantial improvements in training efficiency in large-scale DNNs. This is mainly caused by the challenge of enforcing fast and stable network-wide decorrelation. Here, we show for the first time that much more efficient training of very deep neural networks using decorrelated backpropagation is feasible. To achieve this goal we made use of a novel algorithm which induces network-wide input decorrelation using minimal computational overhead. By combining this algorithm with careful optimizations, we obtain a more than two-fold speed-up and higher test accuracy compared to backpropagation when training an 18-layer deep residual network. This demonstrates that decorrelation provides exciting prospects for efficient deep learning at scale.
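The abstract outlines the recipe at a high level: a decorrelating transform is applied to each layer's input and updated with a cheap local rule, while the weights themselves are still trained by ordinary backpropagation. Below is a minimal PyTorch sketch of that idea on a toy fully connected network; the particular update rule for the decorrelation matrix R, the learning rates, and the architecture are illustrative assumptions, not the authors' implementation (which targets an 18-layer residual network).

```python
# Sketch of decorrelated backpropagation: each layer input passes through a
# decorrelating matrix R, updated from batch statistics outside of autograd,
# while the weights are trained with standard backpropagation.
# The update rule for R and all hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn


class DecorrelationLayer(nn.Module):
    """Linearly decorrelates its input before it reaches the next trainable layer."""

    def __init__(self, dim: int, decor_lr: float = 1e-3):
        super().__init__()
        # R starts at the identity and is kept out of the optimizer (it is a buffer).
        self.register_buffer("R", torch.eye(dim))
        self.decor_lr = decor_lr

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            with torch.no_grad():
                x_dec = x @ self.R.T
                cov = x_dec.T @ x_dec / x_dec.shape[0]
                off_diag = cov - torch.diag(torch.diag(cov))
                # Anti-Hebbian-style step: shrink off-diagonal correlations only.
                self.R -= self.decor_lr * off_diag @ self.R
        return x @ self.R.T  # decorrelated activations flow into the next layer


# Weights are optimised with plain backpropagation; R never appears in the optimizer.
model = nn.Sequential(
    DecorrelationLayer(784), nn.Linear(784, 256), nn.ReLU(),
    DecorrelationLayer(256), nn.Linear(256, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

x = torch.randn(64, 784)             # dummy batch standing in for real data
y = torch.randint(0, 10, (64,))
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because R is excluded from the optimizer and updated from batch statistics alone, its overhead per layer is only a couple of matrix multiplications per batch, which is consistent with the "minimal computational overhead" claim in the abstract.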
Related papers
- Estimating Post-Synaptic Effects for Online Training of Feed-Forward SNNs [0.27016900604393124]
Facilitating online learning in spiking neural networks (SNNs) is a key step in developing event-based models.
We propose Online Training with Postsynaptic Estimates (OTPE) for training feed-forward SNNs.
We show improved scaling for multi-layer networks using a novel approximation of temporal effects on the subsequent layer's activity.
arXiv Detail & Related papers (2023-11-07T16:53:39Z)
- Solving Large-scale Spatial Problems with Convolutional Neural Networks [88.31876586547848]
We employ transfer learning to improve training efficiency for large-scale spatial problems.
We propose that a convolutional neural network (CNN) can be trained on small windows of signals, but evaluated on arbitrarily large signals with little to no performance degradation; a minimal sketch of this train-small, evaluate-large pattern is given after this list.
arXiv Detail & Related papers (2023-06-14T01:24:42Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- Navigating Local Minima in Quantized Spiking Neural Networks [3.1351527202068445]
Spiking and Quantized Neural Networks (NNs) are becoming exceedingly important for hyper-efficient implementations of Deep Learning (DL) algorithms.
These networks face challenges when trained using error backpropagation, due to the absence of gradient signals when applying hard thresholds.
This paper presents a systematic evaluation of a cosine-annealed LR schedule coupled with weight-independent adaptive moment estimation.
arXiv Detail & Related papers (2022-02-15T06:42:25Z)
- Efficient Training of Spiking Neural Networks with Temporally-Truncated Local Backpropagation through Time [1.926678651590519]
Training spiking neural networks (SNNs) has remained challenging due to complex neural dynamics and intrinsic non-differentiability in firing functions.
This work proposes an efficient and direct training algorithm for SNNs that integrates a locally-supervised training method with a temporally-truncated BPTT algorithm.
arXiv Detail & Related papers (2021-12-13T07:44:58Z)
- Training Algorithm Matters for the Performance of Neural Network Potential [4.774810604472842]
We compare the performance of two popular training algorithms for neural network potentials (NNPs): the adaptive moment estimation algorithm (Adam) and the extended Kalman filter algorithm (EKF).
It is found that NNPs trained with EKF are more transferable and less sensitive to the learning rate than those trained with Adam.
arXiv Detail & Related papers (2021-09-08T16:48:33Z)
- Analytically Tractable Inference in Deep Neural Networks [0.0]
The Tractable Approximate Gaussian Inference (TAGI) algorithm was shown to be a viable and scalable alternative to backpropagation for shallow fully connected neural networks.
We demonstrate that TAGI matches or exceeds the performance of backpropagation when training classic deep neural network architectures.
arXiv Detail & Related papers (2021-03-09T14:51:34Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization at large scale with deep neural networks.
Our algorithm requires far fewer communication rounds in theory.
Experiments on several datasets demonstrate its effectiveness and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
- Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications.
In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training.
Based on our analysis, we bridge the gap between theory and practice and offer theoretical insights into three popular large-batch training techniques.
arXiv Detail & Related papers (2020-02-04T23:03:12Z)
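As referenced in the entry on "Solving Large-scale Spatial Problems with Convolutional Neural Networks" above, the train-on-small-windows, evaluate-on-large-signals pattern relies on a fully convolutional architecture with no fixed-size dense head. The sketch below illustrates only that pattern; the layer sizes, the per-pixel regression task, and the window sizes are assumptions for illustration, not taken from the paper.

```python
# Sketch of training a fully convolutional network on small crops and evaluating it
# on a much larger signal. All sizes and the dummy data are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

fcn = nn.Sequential(                           # fully convolutional: no nn.Linear layers
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=1),           # per-pixel regression head
)
optimizer = torch.optim.Adam(fcn.parameters(), lr=1e-3)

# Training step on small 32x32 windows cropped from the full signal.
small_windows = torch.randn(8, 1, 32, 32)      # dummy crops standing in for real data
targets = torch.randn(8, 1, 32, 32)
loss = F.mse_loss(fcn(small_windows), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Inference on a much larger signal with the same weights: convolutions are
# agnostic to spatial extent, so no architectural change is needed.
with torch.no_grad():
    large_signal = torch.randn(1, 1, 1024, 1024)
    prediction = fcn(large_signal)             # shape (1, 1, 1024, 1024)
```

The same weights are reused at inference time because convolution is size-agnostic; only memory limits how large the evaluated signal can be.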