Scaling Supervised Local Learning with Augmented Auxiliary Networks
- URL: http://arxiv.org/abs/2402.17318v1
- Date: Tue, 27 Feb 2024 08:50:45 GMT
- Title: Scaling Supervised Local Learning with Augmented Auxiliary Networks
- Authors: Chenxiang Ma, Jibin Wu, Chenyang Si, Kay Chen Tan
- Abstract summary: We present an augmented local learning method dubbed AugLocal for deep neural networks.
We show that AugLocal can effectively scale up to tens of local layers with a comparable accuracy to BP-trained networks.
The proposed AugLocal method opens up a myriad of opportunities for training high-performance deep neural networks on resource-constrained platforms.
- Score: 29.79621064772869
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks are typically trained using global error signals that
backpropagate (BP) end-to-end, which is not only biologically implausible but
also suffers from the update locking problem and requires huge memory
consumption. Local learning, which updates each layer independently with a
gradient-isolated auxiliary network, offers a promising alternative to address
the above problems. However, existing local learning methods are confronted
with a large accuracy gap with the BP counterpart, particularly for large-scale
networks. This is due to the weak coupling between local layers and their
subsequent network layers, as there is no gradient communication across layers.
To tackle this issue, we put forward an augmented local learning method, dubbed
AugLocal. AugLocal constructs each hidden layer's auxiliary network by
uniformly selecting a small subset of layers from its subsequent network layers
to enhance their synergy. We also propose to linearly reduce the depth of
auxiliary networks as the hidden layer goes deeper, ensuring sufficient network
capacity while reducing the computational cost of auxiliary networks. Our
extensive experiments on four image classification datasets (i.e., CIFAR-10,
SVHN, STL-10, and ImageNet) demonstrate that AugLocal can effectively scale up
to tens of local layers with a comparable accuracy to BP-trained networks while
reducing GPU memory usage by around 40%. The proposed AugLocal method,
therefore, opens up a myriad of opportunities for training high-performance
deep neural networks on resource-constrained platforms. Code is available at
https://github.com/ChenxiangMA/AugLocal.
Related papers
- Momentum Auxiliary Network for Supervised Local Learning [7.5717621206854275]
Supervised local learning segments the network into multiple local blocks updated by independent auxiliary networks.
We propose a Momentum Auxiliary Network (MAN) that establishes a dynamic interaction mechanism.
Our method can reduce GPU memory usage by more than 45% on the ImageNet dataset compared to end-to-end training.
arXiv Detail & Related papers (2024-07-08T05:31:51Z)
- Spatial Bias for Attention-free Non-local Neural Networks [11.320414512937946]
We introduce the spatial bias to learn global knowledge without self-attention in convolutional neural networks.
We show that the spatial bias achieves competitive performance, improving classification accuracy by +0.79% and +1.5% on the ImageNet-1K and CIFAR-100 datasets.
arXiv Detail & Related papers (2023-02-24T08:16:16Z)
- Improving the Trainability of Deep Neural Networks through Layerwise Batch-Entropy Regularization [1.3999481573773072]
We introduce and evaluate the batch-entropy, which quantifies the flow of information through each layer of a neural network.
We show that we can train a "vanilla" fully connected network and a convolutional neural network with 500 layers by simply adding the batch-entropy regularization term to the loss function.
arXiv Detail & Related papers (2022-08-01T20:31:58Z)
- Layer Folding: Neural Network Depth Reduction using Activation Linearization [0.0]
Modern devices exhibit a high level of parallelism, but real-time latency still depends heavily on network depth.
We propose a method that learns whether non-linear activations can be removed, allowing consecutive linear layers to be folded into one (a folding sketch follows this list).
We apply our method to networks pre-trained on CIFAR-10 and CIFAR-100 and find that they can all be transformed into shallower forms that share a similar depth.
arXiv Detail & Related papers (2021-06-17T08:22:46Z)
- All at Once Network Quantization via Collaborative Knowledge Transfer [56.95849086170461]
We develop a novel collaborative knowledge transfer approach for efficiently training the all-at-once quantization network.
Specifically, we propose an adaptive selection strategy to choose a high-precision "teacher" for transferring knowledge to the low-precision student.
To effectively transfer knowledge, we develop a dynamic block swapping method by randomly replacing the blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network.
arXiv Detail & Related papers (2021-03-02T03:09:03Z)
- Local Critic Training for Model-Parallel Learning of Deep Neural Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that networks trained by the proposed method can be used for structural optimization.
arXiv Detail & Related papers (2021-02-03T09:30:45Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
- Dense Residual Network: Enhancing Global Dense Feature Flow for Character Recognition [75.4027660840568]
This paper explores how to enhance the local and global dense feature flow by fully exploiting hierarchical features from all the convolution layers.
Technically, we propose an efficient and effective CNN framework, i.e., Fast Dense Residual Network (FDRN) for text recognition.
arXiv Detail & Related papers (2020-01-23T06:55:08Z)
- Convolutional Networks with Dense Connectivity [59.30634544498946]
We introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion.
For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs to all subsequent layers (a minimal dense-block sketch follows this list).
We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks.
arXiv Detail & Related papers (2020-01-08T06:54:53Z)