Stimulative Training++: Go Beyond The Performance Limits of Residual
Networks
- URL: http://arxiv.org/abs/2305.02507v1
- Date: Thu, 4 May 2023 02:38:11 GMT
- Title: Stimulative Training++: Go Beyond The Performance Limits of Residual
Networks
- Authors: Peng Ye, Tong He, Shengji Tang, Baopu Li, Tao Chen, Lei Bai, Wanli
Ouyang
- Abstract summary: Residual networks have shown great success and become indispensable in recent deep neural network models.
Previous research has suggested that residual networks can be considered as ensembles of shallow networks.
We identify a problem that is analogous to social loafing, where subnetworks within a residual network are prone to exert less effort when working as part of a group compared to working alone.
- Score: 91.5381301894899
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Residual networks have shown great success and become indispensable in recent
deep neural network models. In this work, we aim to re-investigate the training
process of residual networks from a novel social psychology perspective of
loafing, and further propose a new training scheme as well as three improved
strategies for boosting residual networks beyond their performance limits.
Previous research has suggested that residual networks can be considered as
ensembles of shallow networks, which implies that the final performance of a
residual network is influenced by a group of subnetworks. We identify a
previously overlooked problem that is analogous to social loafing, where
subnetworks within a residual network are prone to exert less effort when
working as part of a group compared to working alone. We define this problem as
\textit{network loafing}. Similar to the decreased individual productivity and
overall performance as demonstrated in society, network loafing inevitably
causes sub-par performance. Inspired by solutions from social psychology, we
first propose a novel training scheme called stimulative training, which
randomly samples a residual subnetwork and calculates the KL divergence loss
between the sampled subnetwork and the given residual network for extra
supervision. In order to unleash the potential of stimulative training, we
further propose three simple-yet-effective strategies, including a novel KL
loss that only aligns the network logits direction, random smaller inputs for
subnetworks, and inter-stage sampling rules. Comprehensive experiments and
analysis verify the effectiveness of stimulative training as well as its three
improved strategies.
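To make the core scheme concrete, here is a minimal sketch of stimulative training under stated assumptions: a torchvision-style ResNet (attributes `conv1`, `bn1`, `relu`, `maxpool`, `layer1`-`layer4`, `avgpool`, `fc`), block-level subnetwork sampling, and hypothetical `keep_prob`/`kd_weight` values; detaching the full network's output is a design choice here, and the three improved strategies from the abstract are not shown.

```python
import random
import torch
import torch.nn.functional as F

def forward_subnetwork(resnet, x, keep_prob=0.5):
    """Forward pass that randomly skips residual blocks, yielding a sampled subnetwork."""
    x = resnet.relu(resnet.bn1(resnet.conv1(x)))
    x = resnet.maxpool(x)
    for stage in (resnet.layer1, resnet.layer2, resnet.layer3, resnet.layer4):
        for i, block in enumerate(stage):
            # Keep the first block of each stage so stride/channel changes still apply
            # (which blocks may be dropped is an assumption, not the paper's exact rule).
            if i == 0 or random.random() < keep_prob:
                x = block(x)
    x = torch.flatten(resnet.avgpool(x), 1)
    return resnet.fc(x)

def stimulative_step(resnet, images, labels, kd_weight=1.0):
    """Task loss on the full network plus KL-divergence supervision for a sampled subnetwork."""
    full_logits = resnet(images)                     # the given (full) residual network
    sub_logits = forward_subnetwork(resnet, images)  # a randomly sampled subnetwork
    task_loss = F.cross_entropy(full_logits, labels)
    # KL divergence between the subnetwork's and the full network's predictions.
    kd_loss = F.kl_div(
        F.log_softmax(sub_logits, dim=1),
        F.softmax(full_logits.detach(), dim=1),  # detaching the "teacher" is an assumption
        reduction="batchmean",
    )
    return task_loss + kd_weight * kd_loss
```

Because the sampled subnetwork shares weights with the full network, backpropagating this combined loss supervises the subnetwork with the full network's predictions while still training the full network on the task loss.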
Related papers
- Efficient Stagewise Pretraining via Progressive Subnetworks [53.00045381931778]
The prevailing view suggests that stagewise dropping strategies, such as layer dropping, are ineffective when compared to stacking-based approaches.
This paper challenges this notion by demonstrating that, with proper design, dropping strategies can be competitive, if not better, than stacking methods.
We propose an instantiation of this framework - Random Part Training (RAPTR) - that selects and trains only a random subnetwork at each step, progressively increasing the size in stages (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2024-02-08T18:49:09Z) - Adaptive Depth Networks with Skippable Sub-Paths [1.8416014644193066]
We present a practical approach to adaptive depth networks with minimal training effort.
Our approach does not train every target sub-network in an iterative manner.
We provide a formal rationale for why the proposed training method can reduce overall prediction errors.
arXiv Detail & Related papers (2023-12-27T03:43:38Z) - Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural
Networks [49.808194368781095]
We show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks.
This work makes progress towards understanding the provable benefit of three-layer neural networks over two-layer networks in the feature learning regime.
arXiv Detail & Related papers (2023-05-11T17:19:30Z) - Stimulative Training of Residual Networks: A Social Psychology
Perspective of Loafing [86.69698062642055]
Residual networks have shown great success and become indispensable in today's deep models.
We aim to re-investigate the training process of residual networks from a novel social psychology perspective of loafing.
We propose a new training strategy to strengthen the performance of residual networks.
arXiv Detail & Related papers (2022-10-09T03:15:51Z) - Growing Neural Network with Shared Parameter [0.0]
We propose a general method for growing a neural network with shared parameters by matching a trained network to new input.
Our method has shown the ability to improve performance with higher parameter efficiency.
It can also be applied to the trans-task case and realize transfer learning by changing the combination of networks without training on the new task.
arXiv Detail & Related papers (2022-01-17T16:24:17Z) - Sparsity in Deep Learning: Pruning and growth for efficient inference
and training in neural networks [78.47459801017959]
Sparsity can reduce the memory footprint of regular networks to fit mobile devices.
We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice.
arXiv Detail & Related papers (2021-01-31T22:48:50Z) - Activation function impact on Sparse Neural Networks [0.0]
Sparse Evolutionary Training allows for significantly lower computational complexity when compared to fully connected models.
This research provides insights into the relationship between the activation function used and the network performance at various sparsity levels.
arXiv Detail & Related papers (2020-10-12T18:05:04Z) - Fitting the Search Space of Weight-sharing NAS with Graph Convolutional
Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z) - A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable
Optimization Via Overparameterization From Depth [19.866928507243617]
Training deep neural networks with stochastic gradient descent (SGD) can often achieve zero training loss on real-world tasks, even though the optimization landscape is highly non-convex.
We propose a new limit of infinitely deep residual networks, which enjoys a good landscape in the sense that every local minimizer is global.
arXiv Detail & Related papers (2020-03-11T20:14:47Z)
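As a rough illustration of the stagewise random-subnetwork idea mentioned in the RAPTR entry above (not the paper's actual algorithm), the sketch below trains a randomly chosen subset of residual blocks whose size grows from stage to stage; the toy residual-MLP backbone, the `active_blocks` mask, and the stage schedule are all hypothetical.

```python
import random
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Toy residual block; skipping it leaves the identity path, as in a ResNet."""
    def __init__(self, dim):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.ff(x)

class ProgressiveNet(nn.Module):
    def __init__(self, dim=64, depth=8, num_classes=10):
        super().__init__()
        self.blocks = nn.ModuleList(ResidualBlock(dim) for _ in range(depth))
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x, active_blocks=None):
        for i, block in enumerate(self.blocks):
            if active_blocks is None or i in active_blocks:
                x = block(x)
        return self.head(x)

def subnetwork_step(model, optimizer, x, y, stage_fraction):
    """Train a random subnetwork whose depth grows with the stage fraction."""
    depth = len(model.blocks)
    k = max(1, round(stage_fraction * depth))
    active = set(random.sample(range(depth), k))
    loss = F.cross_entropy(model(x, active_blocks=active), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Illustrative schedule: later stages train larger random subnetworks, ending with the full net.
# for stage_fraction in (0.25, 0.5, 0.75, 1.0):
#     for x, y in loader: subnetwork_step(model, optimizer, x, y, stage_fraction)
```

Because each block is residual, dropping it reduces to the identity map, so every sampled subnetwork remains a valid end-to-end predictor.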