Auto-tuning of Deep Neural Networks by Conflicting Layer Removal
- URL: http://arxiv.org/abs/2103.04331v1
- Date: Sun, 7 Mar 2021 11:51:55 GMT
- Title: Auto-tuning of Deep Neural Networks by Conflicting Layer Removal
- Authors: David Peer, Sebastian Stabinger, Antonio Rodriguez-Sanchez
- Abstract summary: We introduce a novel methodology to identify layers that decrease the test accuracy of trained models.
Conflicting layers are detected as early as the beginning of training.
We will show that around 60% of the layers of trained residual networks can be completely removed from the architecture.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Designing neural network architectures is a challenging task and knowing
which specific layers of a model must be adapted to improve the performance is
almost a mystery. In this paper, we introduce a novel methodology to identify
layers that decrease the test accuracy of trained models. Conflicting layers
are detected as early as the beginning of training. In the worst-case scenario,
we prove that such a layer could lead to a network that cannot be trained at
all. A theoretical analysis is provided on what is the origin of those layers
that result in a lower overall network performance, which is complemented by
our extensive empirical evaluation. More precisely, we identified those layers
that worsen the performance because they would produce what we name conflicting
training bundles. We will show that around 60% of the layers of trained
residual networks can be completely removed from the architecture with no
significant increase in the test error. We further present a novel
neural-architecture-search (NAS) algorithm that identifies conflicting layers
at the beginning of training. Architectures found by our auto-tuning
algorithm achieve competitive accuracy values when compared against more
complex state-of-the-art architectures, while drastically reducing memory
consumption and inference time for different computer vision tasks. The source
code is available at https://github.com/peerdavid/conflicting-bundles.
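As an illustration of the idea summarised above, here is a minimal sketch under our own assumptions (not the authors' reference implementation from the linked repository): two samples form a bundle at a layer if the layer maps them to numerically indistinguishable activations, and a bundle conflicts if its members carry different labels, so their gradients pull the shared representation towards different targets. The function name and the eps threshold are our own.
```python
# Minimal, illustrative sketch of conflicting-bundle detection
# (our own example, not the authors' reference implementation).
import torch

def count_conflicting_bundles(activations: torch.Tensor,
                              labels: torch.Tensor,
                              eps: float = 1e-6) -> int:
    """Count bundles at one layer whose members have different labels.

    activations: (batch, ...) output of the layer for one batch
    labels:      (batch,) integer class labels
    eps:         distance below which two activations are treated as equal
    """
    batch = activations.shape[0]
    flat = activations.reshape(batch, -1)
    assigned = torch.zeros(batch, dtype=torch.bool)
    conflicts = 0
    for i in range(batch):
        if assigned[i]:
            continue
        # Samples mapped within eps of sample i form one bundle.
        bundle = torch.norm(flat - flat[i], dim=1) < eps
        assigned |= bundle
        # A bundle conflicts if it mixes at least two different labels.
        if labels[bundle].unique().numel() > 1:
            conflicts += 1
    return conflicts
```
A layer whose batches frequently contain such conflicting bundles early in training is a candidate for removal by the auto-tuning procedure described in the abstract.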
Related papers
- Layer-wise Linear Mode Connectivity [52.6945036534469]
Averaging neural network parameters is an intuitive method for fusing the knowledge of two independent models.
It is most prominently used in federated learning.
We analyse the performance of the models that result from averaging single layers, or groups of layers.
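A minimal sketch of such layer-wise averaging (our own illustration, not code from the paper; the layer-selection argument and alpha are assumptions):
```python
# Illustrative sketch of layer-wise parameter averaging between two trained
# models (our own example, not code from the paper).
import copy
import torch

def average_layers(model_a: torch.nn.Module,
                   model_b: torch.nn.Module,
                   layers=None,
                   alpha: float = 0.5) -> torch.nn.Module:
    """Return a copy of model_a with selected parameters interpolated as
    w = (1 - alpha) * w_a + alpha * w_b.

    layers: iterable of parameter-name prefixes to average; None averages all.
    """
    merged = copy.deepcopy(model_a)
    state_a, state_b = model_a.state_dict(), model_b.state_dict()
    new_state = {}
    for name, w_a in state_a.items():
        w_b = state_b[name]
        selected = layers is None or any(name.startswith(p) for p in layers)
        if selected and w_a.is_floating_point():
            new_state[name] = (1 - alpha) * w_a + alpha * w_b
        else:
            new_state[name] = w_a.clone()  # keep buffers / unselected layers
    merged.load_state_dict(new_state)
    return merged
```
Averaging every layer (layers=None) recovers the plain model average used in federated learning, while restricting the averaging to a subset of layers makes the interpolation layer-wise.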
arXiv Detail & Related papers (2023-07-13T09:39:10Z) - Semantic-Based Neural Network Repair [4.092001692194709]
We propose an approach to automatically repair erroneous neural networks.
Our approach is based on an executable semantics of deep learning layers.
We evaluate our approach for two usage scenarios, i.e., repairing automatically generated neural networks and manually written ones suffering from common model bugs.
arXiv Detail & Related papers (2023-06-12T16:18:32Z) - Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with a quadratic loss function, fully connected feedforward architecture, ReLU activations, Gaussian data instances, adversarial labels.
These results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
arXiv Detail & Related papers (2022-12-05T14:47:52Z) - Stacked unsupervised learning with a network architecture found by
supervised meta-learning [4.209801809583906]
Stacked unsupervised learning (SUL) seems more biologically plausible than backpropagation, but SUL has fallen far short of backpropagation in practical applications.
We show an SUL algorithm that can perform completely unsupervised clustering of MNIST digits.
arXiv Detail & Related papers (2022-06-06T16:17:20Z) - Efficient Neural Architecture Search with Performance Prediction [0.0]
We use neural architecture search (NAS) to find the best network architecture for the task at hand.
Existing NAS algorithms generally evaluate the fitness of a new architecture by fully training it from scratch.
An end-to-end offline performance predictor is proposed to accelerate the evaluation of sampled architectures.
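As a minimal sketch of the general idea of an offline performance predictor (our own illustration with made-up encodings and accuracies, not the model proposed in the paper):
```python
# Illustrative sketch of an offline performance predictor for NAS
# (our own example, not the predictor proposed in the paper).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Each architecture is encoded as a fixed-length feature vector, e.g.
# [depth, width, kernel_size]; accuracies come from past full trainings.
encodings = np.array([[20, 64, 3], [32, 64, 3], [20, 128, 5], [44, 32, 3]])
accuracies = np.array([0.91, 0.925, 0.93, 0.90])

predictor = GradientBoostingRegressor().fit(encodings, accuracies)

# Rank new candidates by predicted accuracy instead of training each one.
candidates = np.array([[26, 96, 3], [38, 64, 5]])
ranked = sorted(zip(predictor.predict(candidates), candidates.tolist()),
                reverse=True)
print(ranked)
```
Only the most promising candidates under the predictor then need to be trained in full.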
arXiv Detail & Related papers (2021-08-04T05:44:16Z) - Conflicting Bundles: Adapting Architectures Towards the Improved
Training of Deep Neural Networks [1.7188280334580195]
We introduce a novel theory and metric to identify layers that decrease the test accuracy of the trained models.
We identify those layers that worsen the performance because they produce conflicting training bundles.
Based on these findings, a novel algorithm is introduced to remove performance decreasing layers automatically.
arXiv Detail & Related papers (2020-11-05T16:41:04Z) - LoCo: Local Contrastive Representation Learning [93.98029899866866]
We show that by overlapping local blocks that are stacked on top of each other, we effectively increase the decoder depth and allow upper blocks to implicitly send feedback to lower blocks.
This simple design closes the performance gap between local learning and end-to-end contrastive learning algorithms for the first time.
arXiv Detail & Related papers (2020-08-04T05:41:29Z) - The Heterogeneity Hypothesis: Finding Layer-Wise Differentiated Network
Architectures [179.66117325866585]
We investigate a design space that is usually overlooked, i.e. adjusting the channel configurations of predefined networks.
We find that this adjustment can be achieved by shrinking widened baseline networks and leads to superior performance.
Experiments are conducted on various networks and datasets for image classification, visual tracking and image restoration.
arXiv Detail & Related papers (2020-06-29T17:59:26Z) - Reusing Trained Layers of Convolutional Neural Networks to Shorten
Hyperparameters Tuning Time [1.160208922584163]
This paper describes a proposal to reuse the weights of hidden (convolutional) layers across different training runs to shorten the hyperparameter tuning process.
The experiments compare the training time and the validation loss when reusing and not reusing convolutional layers.
They confirm that this strategy reduces training time while even increasing the accuracy of the resulting neural network.
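A minimal sketch of this reuse strategy (our own illustration with a hypothetical two-part model; whether the reused layers are frozen or merely used as initialisation is a design choice not stated in the summary):
```python
# Illustrative sketch: reuse the convolutional layers of an already trained
# model for the next training run (our own example, not code from the paper).
import torch
import torch.nn as nn

def build_model() -> nn.Sequential:
    features = nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )
    classifier = nn.Linear(64, 10)
    return nn.Sequential(features, classifier)

trained = build_model()   # assume this model was trained in a previous run
new_run = build_model()   # fresh model for the next hyperparameter trial

# Copy the convolutional feature extractor into the new model; optionally
# freeze it so only the classifier is optimised in the new run.
new_run[0].load_state_dict(trained[0].state_dict())
for p in new_run[0].parameters():
    p.requires_grad = False

optimizer = torch.optim.SGD(
    [p for p in new_run.parameters() if p.requires_grad], lr=0.01)
```
Reusing (and optionally freezing) the feature extractor means each new hyperparameter trial only has to fit the remaining layers.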
arXiv Detail & Related papers (2020-06-16T11:39:39Z) - Multi-fidelity Neural Architecture Search with Knowledge Distillation [69.09782590880367]
We propose a Bayesian multi-fidelity method for neural architecture search: MF-KD.
Knowledge distillation adds to a loss function a term forcing a network to mimic some teacher network.
We show that training for a few epochs with such a modified loss function leads to a better selection of neural architectures than training for a few epochs with a logistic loss.
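Since the distillation term is only described in words above, here is a minimal sketch of a standard loss of that kind (our own illustration; the exact loss used in MF-KD may differ, and temperature and alpha are assumptions):
```python
# Illustrative sketch of a knowledge-distillation loss: cross-entropy on the
# labels plus a term forcing the student to mimic the teacher's soft outputs
# (a standard formulation; the exact MF-KD loss may differ).
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            labels: torch.Tensor,
            temperature: float = 4.0,
            alpha: float = 0.5) -> torch.Tensor:
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    return (1 - alpha) * ce + alpha * kl
```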
arXiv Detail & Related papers (2020-06-15T12:32:38Z) - A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures.
A graph convolutional neural network is introduced to predict the performance of architectures.
arXiv Detail & Related papers (2020-05-14T09:02:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.