Training Thinner and Deeper Neural Networks: Jumpstart Regularization
- URL: http://arxiv.org/abs/2201.12795v1
- Date: Sun, 30 Jan 2022 12:11:24 GMT
- Title: Training Thinner and Deeper Neural Networks: Jumpstart Regularization
- Authors: Carles Riera and Camilo Rey and Thiago Serra and Eloi Puertas and
Oriol Pujol
- Abstract summary: We use regularization to prevent neurons from dying or becoming linear.
In comparison to conventional training, we obtain neural networks that are thinner, deeper, and - most importantly - more parameter-efficient.
- Score: 2.8348950186890467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks are more expressive when they have multiple layers. In turn,
conventional training methods are only successful if the depth does not lead to
numerical issues such as exploding or vanishing gradients, which occur less
frequently when the layers are sufficiently wide. However, increasing width to
attain greater depth entails the use of heavier computational resources and
leads to overparameterized models. These subsequent issues have been partially
addressed by model compression methods such as quantization and pruning, some
of which rely on normalization-based regularization of the loss function to
make the effect of most parameters negligible. In this work, we propose instead
to use regularization for preventing neurons from dying or becoming linear, a
technique which we denote as jumpstart regularization. In comparison to
conventional training, we obtain neural networks that are thinner, deeper, and
- most importantly - more parameter-efficient.
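The abstract does not spell out the penalty's functional form, so the following is only a hedged sketch of the idea of keeping ReLU units from being dead (never active on a batch) or linear (always active); the hinge form, margin, and weighting are assumptions, not the authors' formulation.

    import torch

    def jumpstart_penalty(preacts: torch.Tensor, margin: float = 0.0) -> torch.Tensor:
        # preacts: (batch, units) pre-activations of one ReLU layer.
        # Assumed illustration only, not the paper's exact regularizer.
        max_per_unit = preacts.max(dim=0).values   # most positive value seen per unit
        min_per_unit = preacts.min(dim=0).values   # most negative value seen per unit
        dead = torch.relu(margin - max_per_unit)   # positive if the unit never fires on the batch
        linear = torch.relu(margin + min_per_unit) # positive if the unit always fires on the batch
        return (dead + linear).mean()

    # Usage sketch: total_loss = task_loss + lam * sum(jumpstart_penalty(z) for z in hidden_preacts)

Minimizing such a term encourages every unit's pre-activations to change sign within a batch, which is one way to read "preventing neurons from dying or becoming linear."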
Related papers
- Self-Expanding Neural Networks [24.812671965904727]
We introduce a natural gradient based approach which intuitively expands both the width and depth of a neural network.
We prove an upper bound on the "rate" at which neurons are added, and a computationally cheap lower bound on the expansion score.
We illustrate the benefits of such Self-Expanding Neural Networks with full connectivity and convolutions in both classification and regression problems.
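As a hedged sketch of the control flow only (the natural-gradient expansion score is not reproduced here; the trigger below is a crude stand-in), width expansion might look like:

    import torch
    import torch.nn as nn

    def widen(layer: nn.Linear, extra: int = 1) -> nn.Linear:
        # Return a copy of `layer` with `extra` additional output units, keeping the old weights.
        new = nn.Linear(layer.in_features, layer.out_features + extra)
        with torch.no_grad():
            new.weight[: layer.out_features] = layer.weight
            new.bias[: layer.out_features] = layer.bias
        return new

    # Stand-in trigger (an assumption, not the paper's score): expand when the layer's
    # average gradient magnitude stays above a threshold tau.
    # if layer.weight.grad is not None and layer.weight.grad.abs().mean() > tau:
    #     layer = widen(layer)  # the next layer's input dimension must be widened to match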
arXiv Detail & Related papers (2023-07-10T12:49:59Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
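For context, the weight-decay regularized objective referred to above has the standard form below (generic notation, not necessarily the paper's); the contribution summarized here is recasting it as a convex program when some layer can shatter the data.

    \min_{\theta}\ \sum_{i=1}^{n} \ell\big(f_\theta(x_i), y_i\big) + \beta \sum_{l} \lVert W_l \rVert_F^2,
    \qquad \text{with threshold activations } \sigma(t) = \mathbf{1}\{t > 0\}.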
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning has made significant progress in pre-training large models but struggles with small models.
We introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
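As an illustration of the weight-sharing idea (the layer below is an assumed sketch, not the paper's implementation), a slimmable layer evaluates a sub-network by slicing the leading rows and columns of one shared weight matrix:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SlimmableLinear(nn.Module):
        # One full layer; sub-networks reuse the leading slice of its parameters.
        def __init__(self, in_features: int, out_features: int):
            super().__init__()
            self.full = nn.Linear(in_features, out_features)

        def forward(self, x: torch.Tensor, width_ratio: float = 1.0) -> torch.Tensor:
            in_f = max(1, int(self.full.in_features * width_ratio))
            out_f = max(1, int(self.full.out_features * width_ratio))
            w = self.full.weight[:out_f, :in_f]   # shared parameters, sliced per width
            b = self.full.bias[:out_f]
            return F.linear(x[..., :in_f], w, b)

    # layer = SlimmableLinear(256, 128)
    # y_full = layer(torch.randn(4, 256), width_ratio=1.0)   # full network
    # y_half = layer(torch.randn(4, 256), width_ratio=0.5)   # weight-sharing sub-network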
arXiv Detail & Related papers (2022-09-30T15:15:05Z) - Regularization-based Pruning of Irrelevant Weights in Deep Neural
Architectures [0.0]
We propose a method for learning sparse neural topologies via a regularization technique that identifies non-relevant weights and selectively shrinks their norm.
We tested the proposed technique on different image classification and natural language generation tasks, obtaining results on par with or better than competitors in terms of sparsity and task metrics.
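A hedged sketch of the general idea of selective norm shrinkage; the magnitude-based relevance test used below is an assumption, not necessarily the paper's criterion:

    import torch

    def selective_shrink_penalty(weight: torch.Tensor, threshold: float = 1e-2) -> torch.Tensor:
        # L1-style penalty applied only to weights flagged as non-relevant
        # (here, simply those with small magnitude; the mask carries no gradient).
        mask = (weight.detach().abs() < threshold).float()
        return (mask * weight.abs()).sum()

    # Add lam * selective_shrink_penalty(W) for every weight matrix W to the loss;
    # flagged weights are driven toward zero and can be pruned after training.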
arXiv Detail & Related papers (2022-04-11T09:44:16Z) - Improvements to Gradient Descent Methods for Quantum Tensor Network
Machine Learning [0.0]
We introduce a 'copy node' method that successfully initializes arbitrary tensor networks.
We present numerical results showing that the combination of techniques presented here produces quantum-inspired tensor network models.
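The 'copy node' is only named above; one hedged reading (an assumption) is the standard delta-like copy tensor that is nonzero only when all of its indices coincide:

    import numpy as np

    def copy_node(dim: int, order: int = 3) -> np.ndarray:
        # Copy (delta) tensor: entry 1 exactly when all indices are equal, else 0.
        node = np.zeros((dim,) * order)
        for i in range(dim):
            node[(i,) * order] = 1.0
        return node

    # copy_node(2, 3)[i, j, k] equals 1.0 only when i == j == k.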
arXiv Detail & Related papers (2022-03-03T19:00:40Z) - Subquadratic Overparameterization for Shallow Neural Networks [60.721751363271146]
We provide an analytical framework that allows us to adopt standard neural training strategies.
We achieve the desiderata via Polyak-Lojasiewicz, smoothness, and standard assumptions.
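For reference, the Polyak-Lojasiewicz (PL) condition invoked above states that for some mu > 0,

    \frac{1}{2}\, \lVert \nabla f(w) \rVert^2 \;\ge\; \mu \big( f(w) - f^{*} \big) \quad \text{for all } w,

which is enough to guarantee linear convergence of gradient descent without requiring convexity.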
arXiv Detail & Related papers (2021-11-02T20:24:01Z) - Dynamic Neural Diversification: Path to Computationally Sustainable
Neural Networks [68.8204255655161]
Small neural networks with a constrained number of trainable parameters can be suitable resource-efficient candidates for many simple tasks.
We explore the diversity of the neurons within the hidden layer during the learning process.
We analyze how the diversity of the neurons affects predictions of the model.
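As a hedged illustration of quantifying neuron diversity within a hidden layer (the measure below is an assumption, not reproduced from the paper), one can look at pairwise cosine similarity between the units' incoming weight vectors:

    import torch
    import torch.nn.functional as F

    def mean_neuron_similarity(weight: torch.Tensor) -> torch.Tensor:
        # weight: (units, inputs) matrix of a hidden layer; assumes at least 2 units.
        n = weight.shape[0]
        w = F.normalize(weight, dim=1)                    # unit-norm rows
        sim = w @ w.t()                                   # pairwise cosine similarities
        mask = 1.0 - torch.eye(n, device=weight.device)   # ignore self-similarity
        return (sim.abs() * mask).sum() / (n * (n - 1))

    # Lower values mean more diverse neurons; tracking (or penalizing) this quantity during
    # training is one way to study how diversity affects the model's predictions.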
arXiv Detail & Related papers (2021-09-20T15:12:16Z) - Filter Pre-Pruning for Improved Fine-tuning of Quantized Deep Neural
Networks [0.0]
We propose a new pruning method called Pruning for Quantization (PfQ) which removes the filters that disturb the fine-tuning of the DNN.
Experiments using well-known models and datasets confirmed that the proposed method achieves higher performance with a similar model size.
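Hedged sketch only: the criterion below (dropping filters with negligible norm before quantization-aware fine-tuning) is a generic magnitude-based stand-in, not necessarily the PfQ criterion described in the paper:

    import torch
    import torch.nn as nn

    def filters_to_keep(conv: nn.Conv2d, rel_threshold: float = 1e-3) -> torch.Tensor:
        # One L2 norm per output filter; keep filters whose norm is not negligible.
        norms = conv.weight.detach().flatten(1).norm(dim=1)
        return norms > rel_threshold * norms.max()

    # keep = filters_to_keep(model.conv1)
    # Rebuild conv1 with int(keep.sum()) filters, copy conv1.weight[keep] into it,
    # then run the usual quantization-aware fine-tuning on the pruned model.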
arXiv Detail & Related papers (2020-11-13T04:12:54Z) - Optimization Theory for ReLU Neural Networks Trained with Normalization
Layers [82.61117235807606]
The success of deep neural networks is in part due to the use of normalization layers.
Our analysis shows how the introduction of normalization changes the optimization landscape and can enable faster convergence.
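For context, one normalization scheme analyzed in this line of work is weight normalization (stated here as background; the paper's exact setting may differ), which reparameterizes each weight vector by a direction and a learned scale:

    w = g \, \frac{v}{\lVert v \rVert}, \qquad g \in \mathbb{R},\ v \in \mathbb{R}^{d},

so the loss becomes invariant to the scale of v, which is one way the landscape changes.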
arXiv Detail & Related papers (2020-06-11T23:55:54Z) - Beyond Dropout: Feature Map Distortion to Regularize Deep Neural
Networks [107.77595511218429]
In this paper, we investigate the empirical Rademacher complexity related to intermediate layers of deep neural networks.
We propose a feature distortion method (Disout) for addressing the aforementioned problem.
The superiority of the proposed feature map distortion for producing deep neural networks with higher testing performance is analyzed and demonstrated.
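As a rough sketch of the dropout alternative described above (the perturbation form and scale below are assumptions, not Disout's exact distortion), randomly selected feature-map elements are perturbed rather than zeroed:

    import torch

    def feature_distortion(x: torch.Tensor, p: float = 0.1, alpha: float = 0.5,
                           training: bool = True) -> torch.Tensor:
        # Perturb a random subset of activations instead of dropping them.
        if not training or p <= 0.0:
            return x
        mask = (torch.rand_like(x) < p).float()               # elements to distort
        noise = alpha * x.detach().std() * torch.randn_like(x)
        return x + mask * noise

    # Applied to intermediate feature maps during training only, like dropout.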
arXiv Detail & Related papers (2020-02-23T13:59:13Z)