Related papers: LayerDropBack: A Universally Applicable Approach for Accelerating Training of Deep Networks

LayerDropBack: A Universally Applicable Approach for Accelerating Training of Deep Networks

URL: http://arxiv.org/abs/2412.18027v1
Date: Mon, 23 Dec 2024 22:39:41 GMT
Title: LayerDropBack: A Universally Applicable Approach for Accelerating Training of Deep Networks
Authors: Evgeny Hershkovitch Neiterman, Gil Ben-Artzi,
Abstract summary: We introduce LayerDropBack (LDB), a simple yet effective method to accelerate training across a wide range of deep networks.<n>LDB introduces randomness only in the backward pass, maintaining the integrity of the forward pass.<n>Our experiments show significant training time reductions of 16.93% to 23.97%, while preserving or even enhancing model accuracy.
Score: 5.00301731167245
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Training very deep convolutional networks is challenging, requiring significant computational resources and time. Existing acceleration methods often depend on specific architectures or require network modifications. We introduce LayerDropBack (LDB), a simple yet effective method to accelerate training across a wide range of deep networks. LDB introduces randomness only in the backward pass, maintaining the integrity of the forward pass, guaranteeing that the same network is used during both training and inference. LDB can be seamlessly integrated into the training process of any model without altering its architecture, making it suitable for various network topologies. Our extensive experiments across multiple architectures (ViT, Swin Transformer, EfficientNet, DLA) and datasets (CIFAR-100, ImageNet) show significant training time reductions of 16.93\% to 23.97\%, while preserving or even enhancing model accuracy. Code is available at \url{https://github.com/neiterman21/LDB}.

Related papers

ChannelDropBack: Forward-Consistent Stochastic Regularization for Deep Networks [5.00301731167245]
Existing techniques often require modifying the architecture of the network by adding specialized layers. We present ChannelDropBack, a simple regularization approach that introduces randomness only into the backward information flow. It allows for seamless integration into the training process of any model and layers without the need to change its architecture.
arXiv Detail & Related papers (2024-11-16T21:24:44Z)
Toward efficient resource utilization at edge nodes in federated learning [0.6990493129893112]
Federated learning enables edge nodes to collaboratively contribute to constructing a global model without sharing their data. computational resource constraints and network communication can become a severe bottleneck for larger model sizes typical for deep learning applications. We propose and evaluate a FL strategy inspired by transfer learning in order to reduce resource utilization on devices.
arXiv Detail & Related papers (2023-09-19T07:04:50Z)
Efficient Implementation of a Multi-Layer Gradient-Free Online-Trainable Spiking Neural Network on FPGA [0.31498833540989407]
ODESA is the first network to have end-to-end multi-layer online local supervised training without using gradients. This research shows that the network architecture and the online training of weights and thresholds can be implemented efficiently on a large scale in hardware.
arXiv Detail & Related papers (2023-05-31T00:34:15Z)
Training Your Sparse Neural Network Better with Any Mask [106.134361318518]
Pruning large neural networks to create high-quality, independently trainable sparse masks is desirable. In this paper we demonstrate an alternative opportunity: one can customize the sparse training techniques to deviate from the default dense network training protocols. Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks.
arXiv Detail & Related papers (2022-06-26T00:37:33Z)
Learning Fast and Slow for Online Time Series Forecasting [76.50127663309604]
Fast and Slow learning Networks (FSNet) is a holistic framework for online time-series forecasting. FSNet balances fast adaptation to recent changes and retrieving similar old knowledge. Our code will be made publicly available.
arXiv Detail & Related papers (2022-02-23T18:23:07Z)
Layer-Parallel Training of Residual Networks with Auxiliary-Variable Networks [28.775355111614484]
auxiliary-variable methods have attracted much interest lately but suffer from significant communication overhead and lack of data augmentation. We present a novel joint learning framework for training realistic ResNets across multiple compute devices. We demonstrate the effectiveness of our methods on ResNets and WideResNets across CIFAR-10, CIFAR-100, and ImageNet datasets.
arXiv Detail & Related papers (2021-12-10T08:45:35Z)
Improving the Accuracy of Early Exits in Multi-Exit Architectures via Curriculum Learning [88.17413955380262]
Multi-exit architectures allow deep neural networks to terminate their execution early in order to adhere to tight deadlines at the cost of accuracy. We introduce a novel method called Multi-Exit Curriculum Learning that utilizes curriculum learning. Our method consistently improves the accuracy of early exits compared to the standard training approach.
arXiv Detail & Related papers (2021-04-21T11:12:35Z)
All at Once Network Quantization via Collaborative Knowledge Transfer [56.95849086170461]
We develop a novel collaborative knowledge transfer approach for efficiently training the all-at-once quantization network. Specifically, we propose an adaptive selection strategy to choose a high-precision enquoteteacher for transferring knowledge to the low-precision student. To effectively transfer knowledge, we develop a dynamic block swapping method by randomly replacing the blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network.
arXiv Detail & Related papers (2021-03-02T03:09:03Z)
Embedded Knowledge Distillation in Depth-level Dynamic Neural Network [8.207403859762044]
We propose an elegant Depth-level Dynamic Neural Network (DDNN) integrated different-depth sub-nets of similar architectures. In this article, we design the Embedded-Knowledge-Distillation (EKD) training mechanism for the DDNN to implement semantic knowledge transfer from the teacher (full) net to multiple sub-nets. Experiments on CIFAR-10, CIFAR-100, and ImageNet datasets demonstrate that sub-nets in DDNN with EKD training achieves better performance than the depth-level pruning or individually training.
arXiv Detail & Related papers (2021-03-01T06:35:31Z)
Go Wide, Then Narrow: Efficient Training of Deep Thin Networks [62.26044348366186]
We propose an efficient method to train a deep thin network with a theoretic guarantee. By training with our method, ResNet50 can outperform ResNet101, and BERT Base can be comparable with BERT Large.
arXiv Detail & Related papers (2020-07-01T23:34:35Z)
Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources. Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize. We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.