Conflicting Bundles: Adapting Architectures Towards the Improved
Training of Deep Neural Networks
- URL: http://arxiv.org/abs/2011.02956v1
- Date: Thu, 5 Nov 2020 16:41:04 GMT
- Title: Conflicting Bundles: Adapting Architectures Towards the Improved
Training of Deep Neural Networks
- Authors: David Peer, Sebastian Stabinger, Antonio Rodriguez-Sanchez
- Abstract summary: We introduce a novel theory and metric to identify layers that decrease the test accuracy of the trained models.
We identify those layers that worsen the performance because they produce conflicting training bundles.
Based on these findings, a novel algorithm is introduced to remove performance-decreasing layers automatically.
- Score: 1.7188280334580195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Designing neural network architectures is a challenging task, and knowing
which specific layers of a model must be adapted to improve the performance is
almost a mystery. In this paper, we introduce a novel theory and metric to
identify layers that decrease the test accuracy of the trained models; this
identification is done as early as the beginning of training. In the worst
case, such a layer could lead to a network that cannot be trained at all. More
precisely, we identify those layers that worsen the performance because they
produce conflicting training bundles, as we show in our novel theoretical
analysis, complemented by our extensive empirical studies. Based on these
findings, a novel algorithm is introduced to remove performance-decreasing
layers automatically. Architectures found by this algorithm achieve a
competitive accuracy when compared against state-of-the-art architectures.
While keeping such high accuracy, our approach drastically reduces memory
consumption and inference time for different computer vision tasks.
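The central notion can be made concrete with a short sketch. The Python/NumPy snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the tolerance eps, the greedy grouping, and the function name are ours. It bundles together samples whose activations at a given layer are numerically indistinguishable and flags a bundle as conflicting when its members carry different labels.

    import numpy as np

    def conflicting_bundles(activations, labels, eps=1e-6):
        """Group nearly identical layer activations into bundles and flag conflicts.

        activations: (batch, features) array of one layer's outputs for a batch.
        labels:      (batch,) array of integer class labels.
        eps:         tolerance below which two activation vectors count as identical
                     (an illustrative choice, not a value from the paper).
        Returns a list of (member_indices, is_conflicting) pairs.
        """
        unassigned = list(range(activations.shape[0]))
        bundles = []
        while unassigned:
            anchor = unassigned.pop(0)
            # Greedily collect all remaining samples whose activation vector is
            # indistinguishable from the anchor's (distance below eps).
            dists = np.linalg.norm(activations[unassigned] - activations[anchor], axis=1)
            members = [anchor] + [unassigned[i] for i in np.flatnonzero(dists < eps)]
            unassigned = [j for j in unassigned if j not in members]
            # A bundle is conflicting if it mixes labels: the layer maps samples
            # that require different outputs onto (numerically) the same point.
            bundles.append((members, len(np.unique(labels[members])) > 1))
        return bundles

    # Toy usage: two identical activations with different labels form one
    # conflicting bundle, the situation the paper's metric is meant to detect.
    acts = np.array([[0.0, 1.0], [0.0, 1.0], [3.0, 2.0]])
    labs = np.array([0, 1, 0])
    for members, conflict in conflicting_bundles(acts, labs):
        print(members, "conflicting" if conflict else "ok")

Applied layer by layer at the start of training, a count of such conflicting bundles is the kind of early signal the paper uses to decide which layers to adapt or remove.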
Related papers
- Effective Layer Pruning Through Similarity Metric Perspective [0.0]
Deep neural networks have been the predominant paradigm in machine learning for solving cognitive tasks.
Pruning structures from these models is a straightforward approach to reducing network complexity.
Layer pruning often hurts the network predictive ability (i.e., accuracy) at high compression rates.
This work introduces an effective layer-pruning strategy that meets all underlying properties pursued by pruning methods.
arXiv Detail & Related papers (2024-05-27T11:54:51Z)
- RANK-NOSH: Efficient Predictor-Based Architecture Search via Non-Uniform Successive Halving [74.61723678821049]
We propose NOn-uniform Successive Halving (NOSH), a hierarchical scheduling algorithm that terminates the training of underperforming architectures early to avoid wasting budget.
We formulate predictor-based architecture search as learning to rank with pairwise comparisons.
The resulting method, RANK-NOSH, reduces the search budget by 5x while achieving competitive or even better performance than previous state-of-the-art predictor-based methods on various spaces and datasets (a minimal successive-halving sketch appears after this list).
arXiv Detail & Related papers (2021-08-18T07:45:21Z)
- Efficient Neural Architecture Search with Performance Prediction [0.0]
We use neural architecture search (NAS) to find the best network architecture for the task at hand.
Existing NAS algorithms generally evaluate the fitness of a new architecture by fully training from scratch.
An end-to-end offline performance predictor is proposed to accelerate the evaluation of sampled architectures.
arXiv Detail & Related papers (2021-08-04T05:44:16Z)
- The Untapped Potential of Off-the-Shelf Convolutional Neural Networks [29.205446247063673]
We show that existing off-the-shelf models like ResNet-50 are capable of over 95% accuracy on ImageNet.
This level of performance currently exceeds that of models with over 20x more parameters and significantly more complex training procedures.
arXiv Detail & Related papers (2021-03-17T20:04:46Z)
- Auto-tuning of Deep Neural Networks by Conflicting Layer Removal [0.0]
We introduce a novel methodology to identify layers that decrease the test accuracy of trained models.
Conflicting layers are detected as early as the beginning of training.
We will show that around 60% of the layers of trained residual networks can be completely removed from the architecture.
arXiv Detail & Related papers (2021-03-07T11:51:55Z)
- The Heterogeneity Hypothesis: Finding Layer-Wise Differentiated Network Architectures [179.66117325866585]
We investigate a design space that is usually overlooked, i.e. adjusting the channel configurations of predefined networks.
We find that this adjustment can be achieved by shrinking widened baseline networks and leads to superior performance.
Experiments are conducted on various networks and datasets for image classification, visual tracking and image restoration.
arXiv Detail & Related papers (2020-06-29T17:59:26Z)
- Reusing Trained Layers of Convolutional Neural Networks to Shorten Hyperparameters Tuning Time [1.160208922584163]
This paper describes a proposal to reuse the weights of hidden (convolutional) layers across different training runs to shorten the hyperparameter tuning process.
The experiments compare the training time and the validation loss when reusing and not reusing convolutional layers.
They confirm that this strategy reduces the training time while even increasing the accuracy of the resulting neural network.
arXiv Detail & Related papers (2020-06-16T11:39:39Z)
- Multi-fidelity Neural Architecture Search with Knowledge Distillation [69.09782590880367]
We propose a Bayesian multi-fidelity method for neural architecture search: MF-KD.
Knowledge distillation adds to a loss function a term forcing a network to mimic some teacher network.
We show that training for a few epochs with such a modified loss function leads to a better selection of neural architectures than training for a few epochs with a logistic loss.
arXiv Detail & Related papers (2020-06-15T12:32:38Z)
- Learning to Stop While Learning to Predict [85.7136203122784]
Many algorithm-inspired deep models are restricted to a fixed depth for all inputs.
Similar to algorithms, the optimal depth of a deep architecture may be different for different input instances.
In this paper, we tackle this varying depth problem using a steerable architecture.
We show that the learned deep model along with the stopping policy improves the performances on a diverse set of tasks.
arXiv Detail & Related papers (2020-06-09T07:22:01Z)
- A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures.
A graph convolutional neural network is introduced to predict the performance of architectures.
arXiv Detail & Related papers (2020-05-14T09:02:33Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
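For context on the RANK-NOSH entry above, the sketch below shows a generic successive-halving loop in Python: every candidate is trained for a small budget, the worse-scoring half is terminated early so no further budget is spent on it, and the survivors continue with a doubled budget. The train_and_score callback, the uniform doubling schedule, and all names are illustrative placeholders; RANK-NOSH itself combines a non-uniform schedule with a pairwise ranking predictor.

    import random
    from typing import Callable, List

    def successive_halving(candidates: List[str],
                           train_and_score: Callable[[str, int], float],
                           min_epochs: int = 1,
                           rounds: int = 3) -> str:
        """Generic successive halving: repeatedly drop the worse half of candidates.

        candidates:      identifiers of the architectures to evaluate.
        train_and_score: callback that (partially) trains a candidate for the given
                         number of epochs and returns a validation score (higher is
                         better); a placeholder for a real training routine.
        """
        budget = min_epochs
        survivors = list(candidates)
        for _ in range(rounds):
            if len(survivors) <= 1:
                break
            scores = {c: train_and_score(c, budget) for c in survivors}
            # Keep the better-scoring half; underperforming architectures are
            # terminated early instead of being trained to convergence.
            survivors = sorted(survivors, key=scores.get, reverse=True)
            survivors = survivors[:max(1, len(survivors) // 2)]
            budget *= 2  # uniform doubling here; NOSH's schedule is non-uniform
        return survivors[0]

    # Toy usage with a random proxy score standing in for real partial training.
    archs = [f"arch_{i}" for i in range(8)]
    best = successive_halving(archs, lambda name, epochs: random.random() + 0.01 * epochs)
    print("selected:", best)

Swapping the random proxy for a real partial-training routine turns this loop into a simple budget-aware baseline against which predictor-based methods such as RANK-NOSH are compared.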