Conflicting Bundles: Adapting Architectures Towards the Improved
Training of Deep Neural Networks
- URL: http://arxiv.org/abs/2011.02956v1
- Date: Thu, 5 Nov 2020 16:41:04 GMT
- Title: Conflicting Bundles: Adapting Architectures Towards the Improved
Training of Deep Neural Networks
- Authors: David Peer, Sebastian Stabinger, Antonio Rodriguez-Sanchez
- Abstract summary: We introduce a novel theory and metric to identify layers that decrease the test accuracy of the trained models.
We identify those layers that worsen the performance because they produce conflicting training bundles.
Based on these findings, a novel algorithm is introduced to remove performance-decreasing layers automatically.
- Score: 1.7188280334580195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Designing neural network architectures is a challenging task, and knowing
which specific layers of a model must be adapted to improve the performance is
almost a mystery. In this paper, we introduce a novel theory and metric to
identify layers that decrease the test accuracy of the trained models; this
identification is done as early as the beginning of training. In the worst
case, such a layer could lead to a network that cannot be trained at all. More
precisely, we identify those layers that worsen the performance because they
produce conflicting training bundles, as we show in our novel theoretical
analysis, complemented by our extensive empirical studies. Based on these
findings, a novel algorithm is introduced to remove performance-decreasing
layers automatically. Architectures found by this algorithm achieve a
competitive accuracy when compared against state-of-the-art architectures.
While keeping such high accuracy, our approach drastically reduces memory
consumption and inference time for different computer vision tasks.
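The central notion can be made concrete with a short sketch. The Python/NumPy snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the tolerance eps, the greedy grouping, and the function name are ours. It bundles together samples whose activations at a given layer are numerically indistinguishable and flags a bundle as conflicting when its members carry different labels.

    import numpy as np

    def conflicting_bundles(activations, labels, eps=1e-6):
        """Group nearly identical layer activations into bundles and flag conflicts.

        activations: (batch, features) array of one layer's outputs for a batch.
        labels:      (batch,) array of integer class labels.
        eps:         tolerance below which two activation vectors count as identical
                     (an illustrative choice, not a value from the paper).
        Returns a list of (member_indices, is_conflicting) pairs.
        """
        unassigned = list(range(activations.shape[0]))
        bundles = []
        while unassigned:
            anchor = unassigned.pop(0)
            # Greedily collect all remaining samples whose activation vector is
            # indistinguishable from the anchor's (distance below eps).
            dists = np.linalg.norm(activations[unassigned] - activations[anchor], axis=1)
            members = [anchor] + [unassigned[i] for i in np.flatnonzero(dists < eps)]
            unassigned = [j for j in unassigned if j not in members]
            # A bundle is conflicting if it mixes labels: the layer maps samples
            # that require different outputs onto (numerically) the same point.
            bundles.append((members, len(np.unique(labels[members])) > 1))
        return bundles

    # Toy usage: two identical activations with different labels form one
    # conflicting bundle, the situation the paper's metric is meant to detect.
    acts = np.array([[0.0, 1.0], [0.0, 1.0], [3.0, 2.0]])
    labs = np.array([0, 1, 0])
    for members, conflict in conflicting_bundles(acts, labs):
        print(members, "conflicting" if conflict else "ok")

Applied layer by layer at the start of training, a count of such conflicting bundles is the kind of early signal the paper uses to decide which layers to adapt or remove.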
Related papers
- Effective Layer Pruning Through Similarity Metric Perspective [0.0]
Deep neural networks have been the predominant paradigm in machine learning for solving cognitive tasks.
Pruning structures from these models is a straightforward approach to reducing network complexity.
Layer pruning often hurts the network predictive ability (i.e., accuracy) at high compression rates.
This work introduces an effective layer-pruning strategy that meets all underlying properties pursued by pruning methods.
arXiv Detail & Related papers (2024-05-27T11:54:51Z)
- RANK-NOSH: Efficient Predictor-Based Architecture Search via Non-Uniform Successive Halving [74.61723678821049]
We propose NOn-uniform Successive Halving (NOSH), a hierarchical scheduling algorithm that terminates the training of underperforming architectures early to avoid wasting budget.
We formulate predictor-based architecture search as learning to rank with pairwise comparisons.
The resulting method, RANK-NOSH, reduces the search budget by 5x while achieving competitive or even better performance than previous state-of-the-art predictor-based methods on various spaces and datasets (a minimal successive-halving sketch appears after this list).
arXiv Detail & Related papers (2021-08-18T07:45:21Z)
- Efficient Neural Architecture Search with Performance Prediction [0.0]
We use neural architecture search (NAS) to find the best network architecture for the task at hand.
Existing NAS algorithms generally evaluate the fitness of a new architecture by fully training from scratch.
An end-to-end offline performance predictor is proposed to accelerate the evaluation of sampled architectures.
arXiv Detail & Related papers (2021-08-04T05:44:16Z)
- The Untapped Potential of Off-the-Shelf Convolutional Neural Networks [29.205446247063673]
We show that existing off-the-shelf models like ResNet-50 are capable of over 95% accuracy on ImageNet.
This level of performance currently exceeds that of models with over 20x more parameters and significantly more complex training procedures.
arXiv Detail & Related papers (2021-03-17T20:04:46Z)
- Auto-tuning of Deep Neural Networks by Conflicting Layer Removal [0.0]
We introduce a novel methodology to identify layers that decrease the test accuracy of trained models.
Conflicting layers are detected as early as the beginning of training.
We will show that around 60% of the layers of trained residual networks can be completely removed from the architecture.
arXiv Detail & Related papers (2021-03-07T11:51:55Z)
- The Heterogeneity Hypothesis: Finding Layer-Wise Differentiated Network Architectures [179.66117325866585]
We investigate a design space that is usually overlooked, i.e. adjusting the channel configurations of predefined networks.
We find that this adjustment can be achieved by shrinking widened baseline networks and leads to superior performance.
Experiments are conducted on various networks and datasets for image classification, visual tracking and image restoration.
arXiv Detail & Related papers (2020-06-29T17:59:26Z)
- Reusing Trained Layers of Convolutional Neural Networks to Shorten Hyperparameters Tuning Time [1.160208922584163]
This paper describes a proposal to reuse the weights of hidden (convolutional) layers across different training runs to shorten the hyperparameter tuning process.
The experiments compare the training time and the validation loss when reusing and not reusing convolutional layers.
They confirm that this strategy reduces the training time while even increasing the accuracy of the resulting neural network.
arXiv Detail & Related papers (2020-06-16T11:39:39Z)
- Multi-fidelity Neural Architecture Search with Knowledge Distillation [69.09782590880367]
We propose a Bayesian multi-fidelity method for neural architecture search: MF-KD.
Knowledge distillation adds to a loss function a term forcing a network to mimic some teacher network.
We show that training for a few epochs with such a modified loss function leads to a better selection of neural architectures than training for a few epochs with a logistic loss.
arXiv Detail & Related papers (2020-06-15T12:32:38Z)
- Learning to Stop While Learning to Predict [85.7136203122784]
Many algorithm-inspired deep models are restricted to a fixed depth for all inputs.
Similar to algorithms, the optimal depth of a deep architecture may be different for different input instances.
In this paper, we tackle this varying depth problem using a steerable architecture.
We show that the learned deep model along with the stopping policy improves the performances on a diverse set of tasks.
arXiv Detail & Related papers (2020-06-09T07:22:01Z)
- A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures.
A graph convolutional neural network is introduced to predict the performance of architectures.
arXiv Detail & Related papers (2020-05-14T09:02:33Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
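For context on the RANK-NOSH entry above, the sketch below shows a generic successive-halving loop in Python: every candidate is trained for a small budget, the worse-scoring half is terminated early so no further budget is spent on it, and the survivors continue with a doubled budget. The train_and_score callback, the uniform doubling schedule, and all names are illustrative placeholders; RANK-NOSH itself combines a non-uniform schedule with a pairwise ranking predictor.

    import random
    from typing import Callable, List

    def successive_halving(candidates: List[str],
                           train_and_score: Callable[[str, int], float],
                           min_epochs: int = 1,
                           rounds: int = 3) -> str:
        """Generic successive halving: repeatedly drop the worse half of candidates.

        candidates:      identifiers of the architectures to evaluate.
        train_and_score: callback that (partially) trains a candidate for the given
                         number of epochs and returns a validation score (higher is
                         better); a placeholder for a real training routine.
        """
        budget = min_epochs
        survivors = list(candidates)
        for _ in range(rounds):
            if len(survivors) <= 1:
                break
            scores = {c: train_and_score(c, budget) for c in survivors}
            # Keep the better-scoring half; underperforming architectures are
            # terminated early instead of being trained to convergence.
            survivors = sorted(survivors, key=scores.get, reverse=True)
            survivors = survivors[:max(1, len(survivors) // 2)]
            budget *= 2  # uniform doubling here; NOSH's schedule is non-uniform
        return survivors[0]

    # Toy usage with a random proxy score standing in for real partial training.
    archs = [f"arch_{i}" for i in range(8)]
    best = successive_halving(archs, lambda name, epochs: random.random() + 0.01 * epochs)
    print("selected:", best)

Swapping the random proxy for a real partial-training routine turns this loop into a simple budget-aware baseline against which predictor-based methods such as RANK-NOSH are compared.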