How Do Training Methods Influence the Utilization of Vision Models?
- URL: http://arxiv.org/abs/2410.14470v1
- Date: Fri, 18 Oct 2024 13:54:46 GMT
- Title: How Do Training Methods Influence the Utilization of Vision Models?
- Authors: Paul Gavrikov, Shashank Agnihotri, Margret Keuper, Janis Keuper
- Abstract summary: Not all learnable parameters contribute equally to a neural network's decision function.
We revisit earlier studies that examined how architecture and task complexity influence this phenomenon.
Our findings reveal that the training method strongly influences which layers become critical to the decision function for a given task.
- Score: 23.41975772383921
- Abstract: Not all learnable parameters (e.g., weights) contribute equally to a neural network's decision function. In fact, entire layers' parameters can sometimes be reset to random values with little to no impact on the model's decisions. We revisit earlier studies that examined how architecture and task complexity influence this phenomenon and ask: is this phenomenon also affected by how we train the model? We conducted experimental evaluations on a diverse set of ImageNet-1k classification models to explore this, keeping the architecture and training data constant but varying the training pipeline. Our findings reveal that the training method strongly influences which layers become critical to the decision function for a given task. For example, improved training regimes and self-supervised training increase the importance of early layers while significantly under-utilizing deeper layers. In contrast, methods such as adversarial training display an opposite trend. Our preliminary results extend previous findings, offering a more nuanced understanding of the inner mechanics of neural networks. Code: https://github.com/paulgavrikov/layer_criticality
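The probe the abstract describes is easy to sketch: re-initialize one layer at a time in a trained network and measure how much the decision function degrades. Below is a minimal PyTorch sketch of that idea, not the authors' released code (their repository is linked above); `evaluate` is a hypothetical helper that returns top-1 accuracy on an ImageNet-1k validation loader.

```python
# Layer-criticality probe: reset one layer to a fresh random init and
# measure the accuracy drop relative to the trained model.
import copy
import torch
from torchvision.models import resnet50, ResNet50_Weights

def reset_layer(model: torch.nn.Module, layer_name: str) -> None:
    """Re-initialize every parameterized module under the named submodule."""
    layer = dict(model.named_modules())[layer_name]
    for m in layer.modules():
        if hasattr(m, "reset_parameters"):
            m.reset_parameters()

model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1).eval()
baseline = evaluate(model)  # hypothetical helper: top-1 val accuracy

for name in ["layer1", "layer2", "layer3", "layer4"]:
    probe = copy.deepcopy(model)
    reset_layer(probe, name)
    # Small drop: the layer is barely used. Large drop: the layer is critical.
    print(f"{name}: accuracy drop = {baseline - evaluate(probe):.3f}")
```

Comparing these per-layer drops across training pipelines (improved recipes, self-supervised, adversarial) is the kind of experiment the abstract reports.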
Related papers
- Simple and Effective Transfer Learning for Neuro-Symbolic Integration [50.592338727912946]
A potential solution to the limitations of purely neural approaches is Neuro-Symbolic Integration (NeSy), where neural methods are combined with symbolic reasoning.
Most of these methods exploit a neural network to map perceptions to symbols and a logical reasoner to predict the output of the downstream task.
They suffer from several issues, including slow convergence, learning difficulties with complex perception tasks, and convergence to local minima.
This paper proposes a simple yet effective method to ameliorate these problems.
arXiv Detail & Related papers (2024-02-21T15:51:01Z)
- A Dynamical Model of Neural Scaling Laws [79.59705237659547]
We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization.
Our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
arXiv Detail & Related papers (2024-02-02T01:41:38Z)
- Latent State Models of Training Dynamics [51.88132043461152]
We train models with different random seeds and compute a variety of metrics throughout training.
We then fit a hidden Markov model (HMM) over the resulting sequences of metrics.
We use the HMM representation to study phase transitions and identify latent "detour" states that slow down convergence.
arXiv Detail & Related papers (2023-08-18T13:20:08Z)
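The pipeline sketched in this abstract (log metrics per step, fit an HMM over the resulting sequences) maps onto off-the-shelf tooling. The snippet below is a generic reconstruction using `hmmlearn`, which is an assumed library choice rather than the authors' code; the synthetic `runs` stand in for real per-step metrics such as loss, gradient norm, and weight norm.

```python
# Fit a Gaussian HMM over sequences of training metrics and read off a
# latent "phase" label for every training step.
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)
# Stand-in for real logs: 5 runs (random seeds) x 200 steps x 3 metrics.
runs = [rng.standard_normal((200, 3)).cumsum(axis=0) for _ in range(5)]

X = np.concatenate(runs)          # hmmlearn expects stacked sequences...
lengths = [len(r) for r in runs]  # ...plus the length of each sequence

model = hmm.GaussianHMM(n_components=4, covariance_type="diag", n_iter=100)
model.fit(X, lengths)

states = model.predict(runs[0])   # latent phase assigned to each step
print(np.unique(states))
```

States that many runs enter and exit slowly would be candidates for the "detour" states mentioned above.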
- Layer-wise Linear Mode Connectivity [52.6945036534469]
Averaging neural network parameters is an intuitive method for fusing the knowledge of two independent models.
It is most prominently used in federated learning.
We analyse the performance of the models that result from averaging single layers, or groups of layers.
arXiv Detail & Related papers (2023-07-13T09:39:10Z)
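The averaging analysed in this paper reduces to an element-wise mean over matching `state_dict` entries. Here is a minimal PyTorch sketch, assuming two models with identical architecture; the `prefixes` argument restricts the mean to single layers or groups of layers, and all names are generic rather than taken from the paper.

```python
# Layer-wise parameter averaging: mean the weights under selected name
# prefixes, keep everything else from model_a.
import copy
from torchvision.models import resnet18

def average_layers(model_a, model_b, prefixes):
    merged = {k: v.clone() for k, v in model_a.state_dict().items()}
    sd_b = model_b.state_dict()
    for k, v in merged.items():
        # Skip integer buffers such as BatchNorm's num_batches_tracked.
        if v.is_floating_point() and any(k.startswith(p) for p in prefixes):
            merged[k] = (v + sd_b[k]) / 2.0
    out = copy.deepcopy(model_a)
    out.load_state_dict(merged)
    return out

model_a, model_b = resnet18(), resnet18()
# Average only the first residual stage; the rest stays from model_a.
avg = average_layers(model_a, model_b, prefixes=["layer1."])
```

Whether such a partially averaged model retains low loss is the layer-wise linear mode connectivity question the paper studies.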
- Scaling Laws For Deep Learning Based Image Reconstruction [26.808569077500128]
We study whether major performance gains are expected from scaling up the training set size.
An initially steep power-law scaling slows significantly even at moderate training set sizes.
We analytically characterize the performance of a linear estimator learned with early stopped gradient descent.
arXiv Detail & Related papers (2022-09-27T14:44:57Z)
- Auto-tuning of Deep Neural Networks by Conflicting Layer Removal [0.0]
We introduce a novel methodology to identify layers that decrease the test accuracy of trained models.
Conflicting layers are detected as early as the beginning of training.
We will show that around 60% of the layers of trained residual networks can be completely removed from the architecture.
arXiv Detail & Related papers (2021-03-07T11:51:55Z)
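Mechanically, removing a layer from a trained residual network is straightforward, because a non-downsampling residual block preserves tensor shape and can be swapped for an identity map. The snippet below illustrates this on a torchvision ResNet-50; the block chosen here is arbitrary, whereas the paper identifies conflicting layers from training signals.

```python
# Drop one residual block from a trained ResNet-50 by replacing it with
# an identity map; the network still runs end to end.
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1).eval()

# layer3 holds 6 bottleneck blocks; block 0 downsamples, blocks 1-5
# preserve shape, so any of those can be replaced by an identity.
model.layer3[3] = nn.Identity()

with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```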
- Efficient Estimation of Influence of a Training Instance [56.29080605123304]
We propose an efficient method for estimating the influence of a training instance on a neural network model.
Our method is inspired by dropout, which zero-masks a sub-network and prevents the sub-network from learning each training instance.
We demonstrate that the proposed method can capture training influences, enhance the interpretability of error predictions, and cleanse the training dataset for improving generalization.
arXiv Detail & Related papers (2020-12-08T04:31:38Z)
- Using Cross-Loss Influence Functions to Explain Deep Network Representations [1.7778609937758327]
We show that influence functions can be extended to handle mismatched training and testing settings.
Our result enables us to compute the influence of unsupervised and self-supervised training examples with respect to a supervised test objective.
arXiv Detail & Related papers (2020-12-03T03:43:26Z)
- Learning to Rank Learning Curves [15.976034696758148]
We present a new method that saves computational budget by terminating poor configurations early in training.
We show that our model is able to effectively rank learning curves without having to observe many or very long learning curves.
arXiv Detail & Related papers (2020-06-05T10:49:52Z)
- The large learning rate phase of deep learning: the catapult mechanism [50.23041928811575]
We present a class of neural networks with solvable training dynamics.
We find good agreement between our model's predictions and training dynamics in realistic deep learning settings.
We believe our results shed light on characteristics of models trained at different learning rates.
arXiv Detail & Related papers (2020-03-04T17:52:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.