Reusing Trained Layers of Convolutional Neural Networks to Shorten
Hyperparameters Tuning Time
- URL: http://arxiv.org/abs/2006.09083v2
- Date: Thu, 30 Jul 2020 15:30:27 GMT
- Title: Reusing Trained Layers of Convolutional Neural Networks to Shorten
Hyperparameters Tuning Time
- Authors: Roberto L. Castro, Diego Andrade, Basilio Fraguela
- Abstract summary: This paper describes a proposal to reuse the weights of hidden (convolutional) layers among different trainings to shorten this process.
The experiments compare the training time and the validation loss when reusing and not reusing convolutional layers.
They confirm that this strategy reduces the training time while even increasing the accuracy of the resulting neural network.
- Score: 1.160208922584163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hyperparameter tuning is a time-consuming process, particularly when the
architecture of the neural network is decided as part of it. For
instance, in convolutional neural networks (CNNs), the number and the
characteristics of the hidden (convolutional) layers may have to be selected.
This implies that the search process involves training all of these
candidate network architectures.
This paper describes a proposal to reuse the weights of hidden
(convolutional) layers across different trainings to shorten this process. The
rationale is that if a set of convolutional layers has been trained to solve a
given problem, the weights calculated in this training may be useful when a new
convolutional layer is added to the network architecture.
This idea has been tested on the CIFAR-10 dataset using different CNN
architectures with up to 3 convolutional layers and up to 3 fully connected
layers. The experiments compare the training time and the validation loss when
reusing and when not reusing convolutional layers. They confirm that this
strategy reduces the training time while even increasing the accuracy of the
resulting neural network. This finding opens up the future possibility of
integrating this strategy into existing AutoML methods in order to reduce the
total search time.
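To make the reuse strategy concrete, here is a minimal PyTorch-style sketch (not taken from the paper): a smaller candidate CNN is trained first, and its convolutional weights are copied into the matching layers of a deeper candidate before that candidate is trained, so only the newly added layers start from scratch. The helper names, layer widths, and copying policy are illustrative assumptions.
```python
import torch.nn as nn

def build_cnn(num_conv, num_fc, channels=32, num_classes=10):
    """Simple CIFAR-10-style CNN with `num_conv` conv blocks and
    `num_fc` fully connected layers (sizes are illustrative)."""
    layers, in_ch = [], 3
    for _ in range(num_conv):
        layers += [nn.Conv2d(in_ch, channels, 3, padding=1),
                   nn.ReLU(), nn.MaxPool2d(2)]
        in_ch = channels
    layers.append(nn.Flatten())
    width = channels * (32 // 2 ** num_conv) ** 2
    for _ in range(num_fc - 1):
        layers += [nn.Linear(width, 128), nn.ReLU()]
        width = 128
    layers.append(nn.Linear(width, num_classes))
    return nn.Sequential(*layers)

def reuse_conv_weights(trained, candidate):
    """Copy the weights of the convolutional layers shared with an
    already trained model into the new, deeper candidate."""
    src = [m for m in trained.modules() if isinstance(m, nn.Conv2d)]
    dst = [m for m in candidate.modules() if isinstance(m, nn.Conv2d)]
    for s, d in zip(src, dst):
        if s.weight.shape == d.weight.shape:
            d.load_state_dict(s.state_dict())

# During the architecture search: train the 2-conv candidate first,
# then seed the 3-conv candidate with its convolutional weights.
small = build_cnn(num_conv=2, num_fc=2)
# ... train `small` on CIFAR-10 here ...
large = build_cnn(num_conv=3, num_fc=2)
reuse_conv_weights(small, large)  # only the added layers start untrained
```
Whether the copied layers are subsequently frozen or fine-tuned is a design choice; the paper's experiments compare training time and validation loss with and without this reuse step.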
Related papers
- Time Elastic Neural Networks [2.1756081703276]
We introduce and detail an atypical neural network architecture, called the time elastic neural network (teNN).
The novelty compared to classical neural network architectures is that it explicitly incorporates time warping ability.
We demonstrate that, during the training process, the teNN succeeds in reducing the number of neurons required within each cell.
arXiv Detail & Related papers (2024-05-27T09:01:30Z)
- Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with quadratic loss function, fully connected feedforward architecture, ReLU activations, Gaussian data instances, adversarial labels.
These results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
arXiv Detail & Related papers (2022-12-05T14:47:52Z)
- Auto-tuning of Deep Neural Networks by Conflicting Layer Removal [0.0]
We introduce a novel methodology to identify layers that decrease the test accuracy of trained models.
Conflicting layers are detected as early as the beginning of training.
We show that around 60% of the layers of trained residual networks can be completely removed from the architecture.
arXiv Detail & Related papers (2021-03-07T11:51:55Z)
- Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers [54.47911829539919]
We develop a novel top-down training method which can be viewed as an algorithm for searching for high-quality classifiers.
We tested this method on automatic speech recognition (ASR) tasks and language modelling tasks.
The proposed method consistently improves recurrent neural network ASR models on Wall Street Journal, self-attention ASR models on Switchboard, and AWD-LSTM language models on WikiText-2.
arXiv Detail & Related papers (2021-02-09T08:19:49Z)
- Local Critic Training for Model-Parallel Learning of Deep Neural Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that trained networks by the proposed method can be used for structural optimization.
arXiv Detail & Related papers (2021-02-03T09:30:45Z)
- An End-To-End-Trainable Iterative Network Architecture for Accelerated Radial Multi-Coil 2D Cine MR Image Reconstruction [4.233498905999929]
We propose a CNN-architecture for image reconstruction of accelerated 2D radial cine MRI with multiple receiver coils.
We investigate the proposed training-strategy and compare our method to other well-known reconstruction techniques with learned and non-learned regularization methods.
arXiv Detail & Related papers (2021-02-01T11:42:04Z)
- Weight Update Skipping: Reducing Training Time for Artificial Neural Networks [0.30458514384586394]
We propose a new training methodology for ANNs that exploits the observation that the improvement in accuracy shows temporal variations.
During such time windows, we keep updating the biases, which ensures that the network still trains and avoids overfitting.
Such a training approach achieves virtually the same accuracy with considerably less computational cost and thus a lower training time (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2020-12-05T15:12:10Z)
- Multi-fidelity Neural Architecture Search with Knowledge Distillation [69.09782590880367]
We propose a Bayesian multi-fidelity method for neural architecture search: MF-KD.
Knowledge distillation adds to the loss function a term forcing a network to mimic some teacher network (a sketch of such a loss appears after this list).
We show that training for a few epochs with such a modified loss function leads to a better selection of neural architectures than training for a few epochs with a logistic loss.
arXiv Detail & Related papers (2020-06-15T12:32:38Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
- Backprojection for Training Feedforward Neural Networks in the Input and Feature Spaces [12.323996999894002]
We propose a new algorithm for training feedforward neural networks which is fairly faster than backpropagation.
The proposed algorithm can be used for both input and feature spaces, named as backprojection and kernel backprojection, respectively.
arXiv Detail & Related papers (2020-04-05T20:53:11Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
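Referring back to the Weight Update Skipping entry above, the following is a minimal, hedged PyTorch sketch of the idea as described in that summary: during a window in which accuracy improvement stalls, weight gradients are dropped before the optimizer step so that only the bias parameters keep being updated. The criterion that decides when such a window starts, and all names below, are placeholder assumptions rather than that paper's code.
```python
import torch

def step_with_optional_weight_skipping(model, optimizer, skip_weights):
    """One optimizer step; when `skip_weights` is True, drop the gradients
    of non-bias parameters so that only the biases are updated."""
    if skip_weights:
        for name, param in model.named_parameters():
            if not name.endswith("bias"):
                param.grad = None  # weights keep their current values
    optimizer.step()
    optimizer.zero_grad()
```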
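Likewise, the modified loss mentioned in the Multi-fidelity Neural Architecture Search with Knowledge Distillation entry can be sketched as a standard distillation objective: cross-entropy on the labels plus a term pushing the candidate network's softened outputs towards those of a teacher. The weighting alpha and temperature below are illustrative values, not taken from that paper.
```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      alpha=0.5, temperature=4.0):
    """Cross-entropy on the true labels plus a KL term forcing the
    student to mimic the temperature-softened teacher outputs."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    return (1 - alpha) * hard + alpha * soft
```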