How Do Training Methods Influence the Utilization of Vision Models?
- URL: http://arxiv.org/abs/2410.14470v1
- Date: Fri, 18 Oct 2024 13:54:46 GMT
- Title: How Do Training Methods Influence the Utilization of Vision Models?
- Authors: Paul Gavrikov, Shashank Agnihotri, Margret Keuper, Janis Keuper
- Abstract summary: Not all learnable parameters contribute equally to a neural network's decision function.
We revisit earlier studies that examined how architecture and task complexity influence this phenomenon.
Our findings reveal that the training method strongly influences which layers become critical to the decision function for a given task.
- Score: 23.41975772383921
- Abstract: Not all learnable parameters (e.g., weights) contribute equally to a neural network's decision function. In fact, entire layers' parameters can sometimes be reset to random values with little to no impact on the model's decisions. We revisit earlier studies that examined how architecture and task complexity influence this phenomenon and ask: is this phenomenon also affected by how we train the model? We conducted experimental evaluations on a diverse set of ImageNet-1k classification models to explore this, keeping the architecture and training data constant but varying the training pipeline. Our findings reveal that the training method strongly influences which layers become critical to the decision function for a given task. For example, improved training regimes and self-supervised training increase the importance of early layers while significantly under-utilizing deeper layers. In contrast, methods such as adversarial training display an opposite trend. Our preliminary results extend previous findings, offering a more nuanced understanding of the inner mechanics of neural networks. Code: https://github.com/paulgavrikov/layer_criticality
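The probe the abstract describes is easy to sketch: re-initialize one layer at a time in a trained network and measure how much the decision function degrades. Below is a minimal PyTorch sketch of that idea, not the authors' released code (their repository is linked above); `evaluate` is a hypothetical helper that returns top-1 accuracy on an ImageNet-1k validation loader.

```python
# Layer-criticality probe: reset one layer to a fresh random init and
# measure the accuracy drop relative to the trained model.
import copy
import torch
from torchvision.models import resnet50, ResNet50_Weights

def reset_layer(model: torch.nn.Module, layer_name: str) -> None:
    """Re-initialize every parameterized module under the named submodule."""
    layer = dict(model.named_modules())[layer_name]
    for m in layer.modules():
        if hasattr(m, "reset_parameters"):
            m.reset_parameters()

model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1).eval()
baseline = evaluate(model)  # hypothetical helper: top-1 val accuracy

for name in ["layer1", "layer2", "layer3", "layer4"]:
    probe = copy.deepcopy(model)
    reset_layer(probe, name)
    # Small drop: the layer is barely used. Large drop: the layer is critical.
    print(f"{name}: accuracy drop = {baseline - evaluate(probe):.3f}")
```

Comparing these per-layer drops across training pipelines (improved recipes, self-supervised, adversarial) is the kind of experiment the abstract reports.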
Related papers
- Simple and Effective Transfer Learning for Neuro-Symbolic Integration [50.592338727912946]
A potential solution to the limitations of purely neural approaches is Neuro-Symbolic Integration (NeSy), where neural methods are combined with symbolic reasoning.
Most of these methods exploit a neural network to map perceptions to symbols and a logical reasoner to predict the output of the downstream task.
They suffer from several issues, including slow convergence, learning difficulties with complex perception tasks, and convergence to local minima.
This paper proposes a simple yet effective method to ameliorate these problems.
arXiv Detail & Related papers (2024-02-21T15:51:01Z)
- A Dynamical Model of Neural Scaling Laws [79.59705237659547]
We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization.
Our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
arXiv Detail & Related papers (2024-02-02T01:41:38Z)
- Latent State Models of Training Dynamics [51.88132043461152]
We train models with different random seeds and compute a variety of metrics throughout training.
We then fit a hidden Markov model (HMM) over the resulting sequences of metrics.
We use the HMM representation to study phase transitions and identify latent "detour" states that slow down convergence.
arXiv Detail & Related papers (2023-08-18T13:20:08Z)
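The pipeline sketched in this abstract (log metrics per step, fit an HMM over the resulting sequences) maps onto off-the-shelf tooling. The snippet below is a generic reconstruction using `hmmlearn`, which is an assumed library choice rather than the authors' code; the synthetic `runs` stand in for real per-step metrics such as loss, gradient norm, and weight norm.

```python
# Fit a Gaussian HMM over sequences of training metrics and read off a
# latent "phase" label for every training step.
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)
# Stand-in for real logs: 5 runs (random seeds) x 200 steps x 3 metrics.
runs = [rng.standard_normal((200, 3)).cumsum(axis=0) for _ in range(5)]

X = np.concatenate(runs)          # hmmlearn expects stacked sequences...
lengths = [len(r) for r in runs]  # ...plus the length of each sequence

model = hmm.GaussianHMM(n_components=4, covariance_type="diag", n_iter=100)
model.fit(X, lengths)

states = model.predict(runs[0])   # latent phase assigned to each step
print(np.unique(states))
```

States that many runs enter and exit slowly would be candidates for the "detour" states mentioned above.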
- Layer-wise Linear Mode Connectivity [52.6945036534469]
Averaging neural network parameters is an intuitive method for fusing the knowledge of two independent models.
It is most prominently used in federated learning.
We analyse the performance of the models that result from averaging single layers, or groups of layers.
arXiv Detail & Related papers (2023-07-13T09:39:10Z)
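The averaging analysed in this paper reduces to an element-wise mean over matching `state_dict` entries. Here is a minimal PyTorch sketch, assuming two models with identical architecture; the `prefixes` argument restricts the mean to single layers or groups of layers, and all names are generic rather than taken from the paper.

```python
# Layer-wise parameter averaging: mean the weights under selected name
# prefixes, keep everything else from model_a.
import copy
from torchvision.models import resnet18

def average_layers(model_a, model_b, prefixes):
    merged = {k: v.clone() for k, v in model_a.state_dict().items()}
    sd_b = model_b.state_dict()
    for k, v in merged.items():
        # Skip integer buffers such as BatchNorm's num_batches_tracked.
        if v.is_floating_point() and any(k.startswith(p) for p in prefixes):
            merged[k] = (v + sd_b[k]) / 2.0
    out = copy.deepcopy(model_a)
    out.load_state_dict(merged)
    return out

model_a, model_b = resnet18(), resnet18()
# Average only the first residual stage; the rest stays from model_a.
avg = average_layers(model_a, model_b, prefixes=["layer1."])
```

Whether such a partially averaged model retains low loss is the layer-wise linear mode connectivity question the paper studies.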
- Scaling Laws For Deep Learning Based Image Reconstruction [26.808569077500128]
We study whether major performance gains are expected from scaling up the training set size.
An initially steep power-law scaling slows significantly even at moderate training set sizes.
We analytically characterize the performance of a linear estimator learned with early stopped gradient descent.
arXiv Detail & Related papers (2022-09-27T14:44:57Z)
- Auto-tuning of Deep Neural Networks by Conflicting Layer Removal [0.0]
We introduce a novel methodology to identify layers that decrease the test accuracy of trained models.
Conflicting layers are detected as early as the beginning of training.
We will show that around 60% of the layers of trained residual networks can be completely removed from the architecture.
arXiv Detail & Related papers (2021-03-07T11:51:55Z)
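Mechanically, removing a layer from a trained residual network is straightforward, because a non-downsampling residual block preserves tensor shape and can be swapped for an identity map. The snippet below illustrates this on a torchvision ResNet-50; the block chosen here is arbitrary, whereas the paper identifies conflicting layers from training signals.

```python
# Drop one residual block from a trained ResNet-50 by replacing it with
# an identity map; the network still runs end to end.
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1).eval()

# layer3 holds 6 bottleneck blocks; block 0 downsamples, blocks 1-5
# preserve shape, so any of those can be replaced by an identity.
model.layer3[3] = nn.Identity()

with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```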
- Efficient Estimation of Influence of a Training Instance [56.29080605123304]
We propose an efficient method for estimating the influence of a training instance on a neural network model.
Our method is inspired by dropout, which zero-masks a sub-network and prevents the sub-network from learning each training instance.
We demonstrate that the proposed method can capture training influences, enhance the interpretability of error predictions, and cleanse the training dataset for improving generalization.
arXiv Detail & Related papers (2020-12-08T04:31:38Z)
- Using Cross-Loss Influence Functions to Explain Deep Network Representations [1.7778609937758327]
We show that influence functions can be extended to handle mismatched training and testing settings.
Our result enables us to compute the influence of unsupervised and self-supervised training examples with respect to a supervised test objective.
arXiv Detail & Related papers (2020-12-03T03:43:26Z)
- Learning to Rank Learning Curves [15.976034696758148]
We present a new method that saves computational budget by terminating poor configurations early in training.
We show that our model is able to effectively rank learning curves without having to observe many or very long learning curves.
arXiv Detail & Related papers (2020-06-05T10:49:52Z)
- The large learning rate phase of deep learning: the catapult mechanism [50.23041928811575]
We present a class of neural networks with solvable training dynamics.
We find good agreement between our model's predictions and training dynamics in realistic deep learning settings.
We believe our results shed light on characteristics of models trained at different learning rates.
arXiv Detail & Related papers (2020-03-04T17:52:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.