Layer-Parallel Training of Residual Networks with Auxiliary-Variable
Networks
- URL: http://arxiv.org/abs/2112.05387v1
- Date: Fri, 10 Dec 2021 08:45:35 GMT
- Title: Layer-Parallel Training of Residual Networks with Auxiliary-Variable
Networks
- Authors: Qi Sun, Hexin Dong, Zewei Chen, Jiacheng Sun, Zhenguo Li and Bin Dong
- Abstract summary: auxiliary-variable methods have attracted much interest lately but suffer from significant communication overhead and lack of data augmentation.
We present a novel joint learning framework for training realistic ResNets across multiple compute devices.
We demonstrate the effectiveness of our methods on ResNets and WideResNets across CIFAR-10, CIFAR-100, and ImageNet datasets.
- Score: 28.775355111614484
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Gradient-based methods for the distributed training of residual networks
(ResNets) typically require a forward pass of the input data, followed by
back-propagating the error gradient to update model parameters, which becomes
time-consuming as the network goes deeper. To break the algorithmic locking and
exploit synchronous module parallelism in both the forward and backward modes,
auxiliary-variable methods have attracted much interest lately but suffer from
significant communication overhead and lack of data augmentation. In this work,
a novel joint learning framework for training realistic ResNets across multiple
compute devices is established by trading off the storage and recomputation of
external auxiliary variables. More specifically, the input data of each
independent processor is generated from its low-capacity auxiliary network
(AuxNet), which permits the use of data augmentation and realizes forward
unlocking. The backward passes are then executed in parallel, each with a local
loss function that originates from the penalty or augmented Lagrangian (AL)
methods. Finally, the proposed AuxNet is employed to reproduce the updated
auxiliary variables through an end-to-end training process. We demonstrate the
effectiveness of our methods on ResNets and WideResNets across CIFAR-10,
CIFAR-100, and ImageNet datasets, achieving a speedup over the traditional
layer-serial training method while maintaining comparable test accuracy.
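To make the layer-parallel decomposition concrete, below is a minimal single-process PyTorch sketch of a penalty-type variant of the idea described above: each residual block reads its input from a small auxiliary network applied to the (augmented) raw batch and is trained against a purely local loss, so the per-block backward passes are mutually independent. The names (ResBlock, AuxNet), the sizes, and the plain MSE penalty are illustrative assumptions rather than the paper's implementation; the augmented Lagrangian option and the end-to-end AuxNet refitting step are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes; all names below are placeholders, not the paper's code.
NUM_BLOCKS, DIM, NUM_CLASSES, BATCH, PENALTY = 3, 64, 10, 32, 1.0


class ResBlock(nn.Module):
    """One residual stage, assigned to one compute device/worker."""

    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.f(x)


class AuxNet(nn.Module):
    """Low-capacity network that regenerates a block's input (the auxiliary
    variable) directly from the augmented raw data, removing the forward lock."""

    def __init__(self, dim):
        super().__init__()
        self.g = nn.Linear(dim, dim)

    def forward(self, x):
        return self.g(x)


blocks = [ResBlock(DIM) for _ in range(NUM_BLOCKS)]
# auxnets[k] reproduces the input of block k + 1 (block 0 reads the raw batch).
auxnets = [AuxNet(DIM) for _ in range(NUM_BLOCKS - 1)]
head = nn.Linear(DIM, NUM_CLASSES)  # classifier after the last block

# Worker k owns block k and the AuxNet that generates block k's input.
params = [list(blocks[0].parameters())] + [
    list(blocks[k].parameters()) + list(auxnets[k - 1].parameters())
    for k in range(1, NUM_BLOCKS)
]
opts = [torch.optim.SGD(p, lr=1e-2) for p in params]
opt_head = torch.optim.SGD(head.parameters(), lr=1e-2)

x = torch.randn(BATCH, DIM)                  # stands in for an augmented mini-batch
y = torch.randint(0, NUM_CLASSES, (BATCH,))  # class labels

# Every block minimizes a *local* loss, so the backward passes are mutually
# independent and could run on separate devices in parallel (serial loop here).
for k in range(NUM_BLOCKS):
    opts[k].zero_grad()
    inp = x if k == 0 else auxnets[k - 1](x)      # block input from its AuxNet
    out = blocks[k](inp)
    if k < NUM_BLOCKS - 1:
        # Penalty coupling: push the block output toward the next block's
        # auxiliary input (detached, since it belongs to another worker).
        target = auxnets[k](x).detach()
        loss = PENALTY * F.mse_loss(out, target)
    else:
        opt_head.zero_grad()
        loss = F.cross_entropy(head(out), y)      # ordinary task loss, last block
    loss.backward()
    opts[k].step()
    if k == NUM_BLOCKS - 1:
        opt_head.step()
```

In a distributed run, each loop iteration would execute on its own device, with the auxiliary inputs recomputed locally by the AuxNets rather than communicated between devices, which is the storage-versus-recomputation trade-off mentioned in the abstract.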
Related papers
- Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs.
We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z) - Efficient Asynchronous Federated Learning with Sparsification and
Quantization [55.6801207905772]
Federated Learning (FL) is attracting more and more attention as a way to collaboratively train a machine learning model without transferring raw data.
FL generally relies on a parameter server and a large number of edge devices throughout model training.
We propose TEASQ-Fed, which lets edge devices participate asynchronously in training by actively applying for tasks.
arXiv Detail & Related papers (2023-12-23T07:47:07Z) - Rewarded meta-pruning: Meta Learning with Rewards for Channel Pruning [19.978542231976636]
This paper proposes a novel method to reduce the number of parameters and FLOPs of deep learning models for computational efficiency.
We introduce accuracy and efficiency coefficients to control the trade-off between the accuracy of the network and its computing efficiency.
arXiv Detail & Related papers (2023-01-26T12:32:01Z) - Receptive Field-based Segmentation for Distributed CNN Inference
Acceleration in Collaborative Edge Computing [93.67044879636093]
We study inference acceleration using distributed convolutional neural networks (CNNs) in a collaborative edge computing network.
We propose a novel collaborative edge computing scheme that uses fused-layer parallelization to partition a CNN model into multiple blocks of convolutional layers.
arXiv Detail & Related papers (2022-07-22T18:38:11Z) - Learning in Feedback-driven Recurrent Spiking Neural Networks using
full-FORCE Training [4.124948554183487]
We propose a supervised training procedure for RSNNs, in which a second network is introduced only during training.
The proposed procedure consists of generating targets for both the recurrent and readout layers.
We demonstrate the improved performance and noise robustness of the proposed full-FORCE training procedure in modeling eight dynamical systems.
arXiv Detail & Related papers (2022-05-26T19:01:19Z) - A Deep Value-network Based Approach for Multi-Driver Order Dispatching [55.36656442934531]
We propose a deep reinforcement learning-based solution for order dispatching.
We conduct large-scale online A/B tests on DiDi's ride-dispatching platform.
Results show that CVNet consistently outperforms other recently proposed dispatching methods.
arXiv Detail & Related papers (2021-06-08T16:27:04Z) - Implicit recurrent networks: A novel approach to stationary input
processing with recurrent neural networks in deep learning [0.0]
In this work, we introduce and test a novel implementation of recurrent neural networks for deep learning.
We provide an algorithm that implements backpropagation for an implicit implementation of recurrent networks.
A single-layer implicit recurrent network is able to solve the XOR problem, whereas a feed-forward network with a monotonically increasing activation function fails at this task.
arXiv Detail & Related papers (2020-10-20T18:55:32Z) - A Practical Layer-Parallel Training Algorithm for Residual Networks [41.267919563145604]
Gradient-based algorithms for training ResNets typically require a forward pass of the input data, followed by back-propagating the objective gradient to update parameters.
We propose a novel serial-parallel hybrid training strategy to enable the use of data augmentation, together with downsampling filters to reduce the communication cost.
arXiv Detail & Related papers (2020-09-03T06:03:30Z) - Coded Computing for Federated Learning at the Edge [3.385874614913973]
Federated Learning (FL) enables training a global model from data generated locally at the client nodes, without moving client data to a centralized server.
Recent work proposes to mitigate stragglers and speed up training for linear regression tasks by assigning redundant computations at the MEC server.
We develop CodedFedL, which addresses the difficult task of extending coded federated learning (CFL) to distributed non-linear regression and classification problems with multi-output labels.
arXiv Detail & Related papers (2020-07-07T08:20:47Z) - Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z) - Large-Scale Gradient-Free Deep Learning with Recursive Local
Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically plausible alternative to backpropagation that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)