What Variables Affect Out-of-Distribution Generalization in Pretrained Models?
- URL: http://arxiv.org/abs/2405.15018v3
- Date: Fri, 25 Oct 2024 14:14:39 GMT
- Title: What Variables Affect Out-of-Distribution Generalization in Pretrained Models?
- Authors: Md Yousuf Harun, Kyungbok Lee, Jhair Gallardo, Giri Krishnan, Christopher Kanan
- Abstract summary: Embeddings produced by pre-trained deep neural networks (DNNs) are widely used, but their efficacy for downstream tasks can vary widely.
We study the factors influencing transferability and out-of-distribution generalization of pre-trained DNN embeddings.
- Score: 15.047920317548128
- Abstract: Embeddings produced by pre-trained deep neural networks (DNNs) are widely used; however, their efficacy for downstream tasks can vary widely. We study the factors influencing transferability and out-of-distribution (OOD) generalization of pre-trained DNN embeddings through the lens of the tunnel effect hypothesis, which is closely related to intermediate neural collapse. This hypothesis suggests that deeper DNN layers compress representations and hinder OOD generalization. Contrary to earlier work, our experiments show this is not a universal phenomenon. We comprehensively investigate the impact of DNN architecture, training data, image resolution, and augmentations on transferability. We identify that training with high-resolution datasets containing many classes greatly reduces representation compression and improves transferability. Our results emphasize the danger of generalizing findings from toy datasets to broader contexts.
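The tunnel effect is typically probed by training linear classifiers on embeddings drawn from successive layers and comparing in-distribution (ID) and OOD probe accuracy. Below is a minimal sketch of that protocol; the ResNet-18 backbone, hook layers, pooling choice, and use of scikit-learn probes are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch: probe OOD transferability of embeddings layer by layer.
# Model, layers, and data here are illustrative stand-ins.
import torch
import torchvision.models as models
from sklearn.linear_model import LogisticRegression

model = models.resnet18(weights="IMAGENET1K_V1").eval()
layers = {"layer2": model.layer2, "layer3": model.layer3, "layer4": model.layer4}

feats = {}
def hook(name):
    def fn(module, inp, out):
        # Global-average-pool the feature map into a flat embedding.
        feats[name] = out.mean(dim=(2, 3)).detach()
    return fn

for name, layer in layers.items():
    layer.register_forward_hook(hook(name))

def embed(x):
    with torch.no_grad():
        model(x)
    return {k: v.clone() for k, v in feats.items()}

def probe_accuracy(train_x, train_y, test_x, test_y):
    # A linear probe: logistic regression on frozen embeddings.
    clf = LogisticRegression(max_iter=1000).fit(train_x, train_y)
    return clf.score(test_x, test_y)

x = torch.randn(8, 3, 224, 224)  # stand-in for a real image batch
print({k: tuple(v.shape) for k, v in embed(x).items()})
# With (images, labels) from an ID split and an OOD dataset, compare
# probe_accuracy per layer: the tunnel effect predicts OOD accuracy
# collapsing in the deepest layers while ID accuracy stays high.
```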
Related papers
- Causal inference through multi-stage learning and doubly robust deep neural networks [10.021381302215062]
Deep neural networks (DNNs) have demonstrated remarkable empirical performance in large-scale supervised learning problems.
This study delves into the application of DNNs across a wide spectrum of intricate causal inference tasks.
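For context, the doubly robust idea combines an outcome model with a propensity model so the treatment-effect estimate remains consistent if either nuisance model is correct. Below is a toy numpy sketch of the standard AIPW estimator; the linear nuisance plug-ins are stand-ins for the DNN estimators this paper studies.

```python
# Toy sketch of the doubly robust (AIPW) average-treatment-effect estimator.
# The linear nuisance plug-ins stand in for fitted DNN nuisance models.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))
propensity = 1 / (1 + np.exp(-X[:, 0]))          # true treatment probability
T = rng.binomial(1, propensity)
Y = 2.0 * T + X @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=n)  # true ATE = 2

# Nuisance estimates (oracle-ish here; in practice, fitted DNNs).
e_hat = propensity                                # propensity model e(X)
m1_hat = 2.0 + X @ np.array([1.0, -0.5, 0.25])    # outcome model E[Y|T=1,X]
m0_hat = X @ np.array([1.0, -0.5, 0.25])          # outcome model E[Y|T=0,X]

# AIPW: outcome-model prediction plus an inverse-propensity-weighted residual
# correction; consistent if either the outcome or propensity model is right.
ate_dr = np.mean(
    m1_hat - m0_hat
    + T * (Y - m1_hat) / e_hat
    - (1 - T) * (Y - m0_hat) / (1 - e_hat)
)
print(f"doubly robust ATE estimate: {ate_dr:.3f}")  # close to 2.0
```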
arXiv Detail & Related papers (2024-07-11T14:47:44Z)
- Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Conventional wisdom holds that neural network predictions are unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
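This is straightforward to check empirically: as inputs drift off-distribution, per-example softmax outputs should collapse toward a single fixed vector. A minimal sketch, assuming a pretrained torchvision ResNet-18 and additive Gaussian noise as an increasingly OOD input family:

```python
# Sketch: do predictions drift toward a constant as inputs become more OOD?
# Pretrained ResNet-18 and noise-corrupted inputs; purely illustrative.
import torch
import torchvision.models as models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
x = torch.randn(16, 3, 224, 224)  # stand-in for real in-distribution images

with torch.no_grad():
    for sigma in [0.0, 1.0, 4.0, 16.0]:
        probs = torch.softmax(model(x + sigma * torch.randn_like(x)), dim=-1)
        mean_probs = probs.mean(dim=0)
        # Dispersion of per-example outputs around the batch mean: if the
        # paper's observation holds, this shrinks as sigma grows, i.e.
        # predictions collapse toward one constant vector.
        spread = (probs - mean_probs).abs().sum(dim=-1).mean()
        print(f"sigma={sigma:5.1f}  spread={spread:.4f}")
```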
arXiv Detail & Related papers (2023-10-02T03:25:32Z)
- Common Knowledge Learning for Generating Transferable Adversarial Examples [60.1287733223249]
This paper focuses on an important type of black-box attacks, where the adversary generates adversarial examples by a substitute (source) model.
Existing methods tend to give unsatisfactory adversarial transferability when the source and target models are from different types of DNN architectures.
We propose a common knowledge learning (CKL) framework to learn better network weights to generate adversarial examples.
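The substitute-model threat setting can be made concrete with a simple transfer attack: craft perturbations with full gradient access to a source model, then measure how often they fool an unseen target of a different architecture. The FGSM attack below is a generic stand-in, not the paper's CKL procedure:

```python
# Sketch of a black-box transfer attack: perturbations are crafted on a
# substitute (source) model with FGSM and evaluated on a different target
# architecture. FGSM stands in for the paper's CKL-trained source weights.
import torch
import torchvision.models as models

source = models.resnet18(weights="IMAGENET1K_V1").eval()
target = models.vgg16(weights="IMAGENET1K_V1").eval()

def fgsm(model, x, y, eps=4 / 255):
    x = x.clone().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()

x = torch.rand(4, 3, 224, 224)                 # stand-in for real images
with torch.no_grad():
    y_src = source(x).argmax(dim=-1)           # source's clean predictions
    y_tgt = target(x).argmax(dim=-1)           # target's clean predictions

x_adv = fgsm(source, x, y_src)                 # attack uses only the source
with torch.no_grad():
    fooled = (target(x_adv).argmax(dim=-1) != y_tgt).float().mean()
print(f"transfer fooling rate: {fooled:.2f}")
```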
arXiv Detail & Related papers (2023-07-01T09:07:12Z)
- Solving Large-scale Spatial Problems with Convolutional Neural Networks [88.31876586547848]
We employ transfer learning to improve training efficiency for large-scale spatial problems.
We propose that a convolutional neural network (CNN) can be trained on small windows of signals, but evaluated on arbitrarily large signals with little to no performance degradation.
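This works because a fully convolutional network contains no fixed-size layers, so weights fitted on small crops apply unchanged to inputs of any length. A minimal sketch of the shape mechanics (the tiny 1D architecture is assumed for illustration, not taken from the paper):

```python
# Sketch: a fully convolutional network is size-agnostic, so it can be
# trained on small signal windows and run on much larger signals as-is.
import torch
import torch.nn as nn

fcn = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=9, padding=4), nn.ReLU(),
    nn.Conv1d(16, 16, kernel_size=9, padding=4), nn.ReLU(),
    nn.Conv1d(16, 1, kernel_size=1),             # per-position prediction head
)

small = torch.randn(32, 1, 64)      # training-sized windows
large = torch.randn(1, 1, 100_000)  # deployment-sized signal

print(fcn(small).shape)  # torch.Size([32, 1, 64])
print(fcn(large).shape)  # torch.Size([1, 1, 100000]); same weights, no retraining
```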
arXiv Detail & Related papers (2023-06-14T01:24:42Z)
- Transferability of coVariance Neural Networks and Application to Interpretable Brain Age Prediction using Anatomical Features [119.45320143101381]
Graph convolutional networks (GCN) leverage topology-driven graph convolutional operations to combine information across the graph for inference tasks.
We have studied GCNs with covariance matrices as graphs in the form of coVariance neural networks (VNNs).
VNNs inherit the scale-free data processing architecture from GCNs and here, we show that VNNs exhibit transferability of performance over datasets whose covariance matrices converge to a limit object.
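Concretely, a coVariance filter replaces a GCN's graph shift operator with the sample covariance matrix C, applying a polynomial filter z = sum_k h_k C^k x. The numpy sketch below assumes that standard layer form from the coVariance-filter literature; the filter taps and data are illustrative:

```python
# Sketch of a coVariance filter, the building block of a VNN: a polynomial
# in the sample covariance matrix C plays the role of a GCN's graph filter.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 10))       # 500 samples, 10 features
C = np.cov(data, rowvar=False)          # covariance matrix acts as the "graph"

def covariance_filter(x, h):
    # z = sum_k h[k] * C^k @ x  (K-tap polynomial filter in C)
    z, Ck_x = np.zeros_like(x), x.copy()
    for hk in h:
        z += hk * Ck_x
        Ck_x = C @ Ck_x
    return z

x = rng.normal(size=10)                 # one input signal over the features
h = np.array([0.5, 0.3, 0.1])           # learnable filter taps (fixed here)
out = np.tanh(covariance_filter(x, h))  # filter + pointwise nonlinearity
print(out.round(3))
```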
arXiv Detail & Related papers (2023-05-02T22:15:54Z)
- Frozen Overparameterization: A Double Descent Perspective on Transfer Learning of Deep Neural Networks [27.17697714584768]
We study the generalization behavior of transfer learning of deep neural networks (DNNs).
We show that the evolution of the test error during target training exhibits a more pronounced double descent effect when the target training dataset is sufficiently large.
Also, we show that the double descent phenomenon may make a transfer from a less related source task better than a transfer from a more related source task.
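For readers unfamiliar with double descent, the non-monotone test-error curve is easy to reproduce in a toy random-features regression: error typically peaks near the interpolation threshold (number of features roughly equal to the number of training samples) and falls again in the overparameterized regime. The sketch below shows only this base phenomenon, not the paper's transfer-learning setup:

```python
# Toy double descent: test error of minimum-norm random-features regression
# peaks near the interpolation threshold (p ~ n_train), then drops again.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 100, 1000, 20
w_true = rng.normal(size=d)

def make(n):
    X = rng.normal(size=(n, d))
    return X, X @ w_true + 0.5 * rng.normal(size=n)

Xtr, ytr = make(n_train)
Xte, yte = make(n_test)

for p in [10, 50, 90, 100, 110, 200, 1000]:
    W = rng.normal(size=(d, p)) / np.sqrt(d)   # random feature projection
    Ftr, Fte = np.tanh(Xtr @ W), np.tanh(Xte @ W)
    beta = np.linalg.pinv(Ftr) @ ytr           # minimum-norm least squares
    err = np.mean((Fte @ beta - yte) ** 2)
    print(f"p={p:5d}  test MSE={err:8.3f}")
```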
arXiv Detail & Related papers (2022-11-20T20:26:23Z)
- Do Deep Neural Networks Always Perform Better When Eating More Data? [82.6459747000664]
We design experiments under both independent and identically distributed (IID) and out-of-distribution (OOD) conditions.
Under the IID condition, the amount of information determines the effectiveness of each sample, while the contribution of samples and the differences between classes determine the amount of class information.
Under the OOD condition, the cross-domain degree of samples determines their contributions, and bias-fitting caused by irrelevant elements is a significant factor in cross-domain performance.
arXiv Detail & Related papers (2022-05-30T15:40:33Z)
- Improving deep neural network generalization and robustness to background bias via layer-wise relevance propagation optimization [0.0]
Features in images' backgrounds can spuriously correlate with the images' classes, representing background bias.
This can produce deep neural networks (DNNs) that perform well on standard evaluation datasets but generalize poorly to real-world data.
We show that optimizing Layer-wise Relevance Propagation (LRP) heatmaps can minimize the influence of background bias on deep classifiers.
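The mechanism can be sketched with a simpler attribution proxy: compute an input heatmap, penalize the attribution mass falling on a known background mask, and add that penalty to the classification loss. The gradient-times-input heatmap below is a stand-in for full LRP, which requires layer-specific propagation rules:

```python
# Sketch of heatmap-optimized training: penalize attribution mass on the
# background so the classifier stops relying on background-correlated cues.
# gradient * input is a simple stand-in for full LRP relevance propagation.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

def background_relevance_loss(x, y, bg_mask):
    # bg_mask: 1 where the pixel is background, 0 on the object.
    x = x.clone().requires_grad_(True)
    logits = model(x)
    score = logits.gather(1, y[:, None]).sum()
    (grad,) = torch.autograd.grad(score, x, create_graph=True)
    heatmap = (grad * x).abs()                   # attribution proxy
    penalty = (heatmap * bg_mask).sum() / heatmap.sum().clamp_min(1e-8)
    return nn.functional.cross_entropy(logits, y) + 1.0 * penalty

x = torch.rand(8, 3, 32, 32)                     # stand-in images
y = torch.randint(0, 10, (8,))
bg = torch.zeros_like(x); bg[..., :16] = 1.0     # toy mask: left half = background
loss = background_relevance_loss(x, y, bg)
loss.backward()                                  # trains against background reliance
print(float(loss))
```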
arXiv Detail & Related papers (2022-02-01T05:58:01Z)
- With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning.
Early layers generally learn representations relevant to performance on both training data and testing data.
Deeper layers only minimize the training risk and fail to generalize well on testing or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z)
- Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
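One simple way to surface such groups is to cluster last-hidden-layer neurons by the correlation of their activations across a batch of inputs: redundant neurons show near-perfect pairwise correlation. The numpy sketch below uses synthetic activations as a stand-in for a real network's features:

```python
# Sketch: detect redundant neuron groups by correlating last-hidden-layer
# activations across inputs. Synthetic activations stand in for a real CNN:
# three "signals" are each copied to several neurons plus independent noise.
import numpy as np

rng = np.random.default_rng(0)
n_inputs, copies = 1000, 4
signals = rng.normal(size=(n_inputs, 3))
# 12 neurons = 3 groups of 4 duplicates, differing only by independent noise.
acts = np.repeat(signals, copies, axis=1) + 0.1 * rng.normal(size=(n_inputs, 12))

corr = np.corrcoef(acts, rowvar=False)         # neuron-by-neuron correlation
groups = np.abs(corr) > 0.9                    # adjacency of "same info" pairs
for i in range(corr.shape[0]):
    partners = np.flatnonzero(groups[i])
    print(f"neuron {i:2d} carries the same information as {list(partners)}")
```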
arXiv Detail & Related papers (2021-06-07T10:18:54Z)
- Examining the causal structures of deep neural networks using information theory [0.0]
Deep Neural Networks (DNNs) are often examined at the level of their response to input, such as analyzing the mutual information between nodes and data sets.
DNNs can also be examined at the level of causation, exploring "what does what" within the layers of the network itself.
Here, we introduce a suite of metrics based on information theory to quantify and track changes in the causal structure of DNNs during training.
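One metric in this family is effective information (EI): the mutual information between a mechanism's inputs and outputs when the inputs are forced to maximum entropy. Whether this matches the paper's exact estimator is an assumption here; for a tiny deterministic binary layer, EI can be computed exactly by enumeration:

```python
# Sketch: effective information (EI) of a tiny deterministic binary layer,
# computed by exact enumeration. EI = I(X;Y) with X forced to be uniform
# (maximum entropy); for a deterministic layer this reduces to H(Y).
import itertools
import numpy as np

def layer(x):
    # Toy 2-in/2-out threshold layer (weights are illustrative).
    w = np.array([[1.0, -1.0], [0.5, 0.5]])
    return tuple((w @ x > 0.0).astype(int))

inputs = list(itertools.product([0, 1], repeat=2))   # uniform over all inputs
outputs = [layer(np.array(x)) for x in inputs]

# Output distribution under the uniform (max-entropy) input distribution.
p_y = {}
for y in outputs:
    p_y[y] = p_y.get(y, 0) + 1 / len(inputs)

ei = -sum(p * np.log2(p) for p in p_y.values())      # EI = H(Y) here
print(f"effective information: {ei:.3f} bits")
```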
arXiv Detail & Related papers (2020-10-26T19:53:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.