Visual Representation Learning Does Not Generalize Strongly Within the
Same Domain
- URL: http://arxiv.org/abs/2107.08221v1
- Date: Sat, 17 Jul 2021 11:24:18 GMT
- Title: Visual Representation Learning Does Not Generalize Strongly Within the
Same Domain
- Authors: Lukas Schott, Julius von Kügelgen, Frederik Träuble, Peter Gehler,
Chris Russell, Matthias Bethge, Bernhard Schölkopf, Francesco Locatello,
Wieland Brendel
- Abstract summary: We test whether 17 unsupervised, weakly supervised, and fully supervised representation learning approaches correctly infer the generative factors of variation in simple datasets.
We train and test 2000+ models and observe that all of them struggle to learn the underlying mechanism regardless of supervision signal and architectural bias.
- Score: 41.66817277929783
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An important component for generalization in machine learning is to uncover
underlying latent factors of variation as well as the mechanism through which
each factor acts in the world. In this paper, we test whether 17 unsupervised,
weakly supervised, and fully supervised representation learning approaches
correctly infer the generative factors of variation in simple datasets
(dSprites, Shapes3D, MPI3D). In contrast to prior robustness work that
introduces novel factors of variation during test time, such as blur or other
(un)structured noise, we here recompose, interpolate, or extrapolate only
existing factors of variation from the training data set (e.g., small and
medium-sized objects during training and large objects during testing). Models
that learn the correct mechanism should be able to generalize to this
benchmark. In total, we train and test 2000+ models and observe that all of
them struggle to learn the underlying mechanism regardless of supervision
signal and architectural bias. Moreover, the generalization capabilities of all
tested models drop significantly as we move from artificial datasets towards
more realistic real-world datasets. Despite their inability to identify the
correct mechanism, the models are quite modular as their ability to infer other
in-distribution factors remains fairly stable, provided that only a single factor
is out-of-distribution. These results point to an important yet understudied
problem of learning mechanistic models of observations that can facilitate
generalization.
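The benchmark described in the abstract adds no new noise at test time; it re-splits the known generative factors so that certain factor values or combinations are seen only at test time. Below is a minimal sketch of such extrapolation and interpolation splits on a toy, factor-labeled dataset; the factor names, value ranges, and thresholds are illustrative assumptions and do not reproduce the paper's actual dSprites/Shapes3D/MPI3D protocol.

```python
# Sketch only: illustrative factors and thresholds, not the paper's exact splits.
import numpy as np

rng = np.random.default_rng(0)

# Toy "dataset": every sample is described by a few discrete generative factors.
n_samples = 10_000
factors = {
    "shape": rng.integers(0, 3, n_samples),   # e.g. square / ellipse / heart
    "scale": rng.integers(0, 6, n_samples),   # six ordered scale levels
    "pos_x": rng.integers(0, 32, n_samples),
    "pos_y": rng.integers(0, 32, n_samples),
}

def extrapolation_split(factors, factor="scale", threshold=4):
    """Train on small/medium values of one factor; test only on the largest values."""
    values = factors[factor]
    return np.where(values < threshold)[0], np.where(values >= threshold)[0]

def interpolation_split(factors, factor="scale", held_out=(2, 3)):
    """Hold out intermediate values of one factor; the test range is bracketed by training values."""
    mask = np.isin(factors[factor], held_out)
    return np.where(~mask)[0], np.where(mask)[0]

train_idx, test_idx = extrapolation_split(factors)
print(f"extrapolation split: {len(train_idx)} train / {len(test_idx)} test")
print("train scale levels:", np.unique(factors["scale"][train_idx]))
print("test scale levels: ", np.unique(factors["scale"][test_idx]))
```

A model that has learned the actual mechanism behind the held-out factor should still infer it correctly on such test indices; this is the property the paper evaluates across its 2000+ models.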
Related papers
- Robust Computer Vision in an Ever-Changing World: A Survey of Techniques for Tackling Distribution Shifts [20.17397328893533]
AI applications are becoming increasingly visible to the general public.
There is a notable gap between the theoretical assumptions researchers make about computer vision models and the reality those models face when deployed in the real world.
One of the critical reasons for this gap is a challenging problem known as distribution shift.
arXiv Detail & Related papers (2023-12-03T23:40:12Z) - Interpreting and generalizing deep learning in physics-based problems with functional linear models [1.1440052544554358]
Interpretability is crucial and often desired in modeling physical systems.
We present test cases in solid mechanics, fluid mechanics, and transport.
Our study underscores the significance of interpretable representation in scientific machine learning.
arXiv Detail & Related papers (2023-07-10T14:01:29Z) - Leveraging sparse and shared feature activations for disentangled
representation learning [112.22699167017471]
We propose to leverage knowledge extracted from a diversified set of supervised tasks to learn a common disentangled representation.
We validate our approach on six real world distribution shift benchmarks, and different data modalities.
arXiv Detail & Related papers (2023-04-17T01:33:24Z) - Towards Understanding How Data Augmentation Works with Imbalanced Data [17.478900028887537]
We study the effect of data augmentation (DA) on three different classifiers: convolutional neural networks, support vector machines, and logistic regression models.
Our research indicates that DA, when applied to imbalanced data, produces substantial changes in model weights, support vectors, and feature selection.
We hypothesize that DA works by increasing variance in the data, so that machine learning models can associate changes in the data with labels.
arXiv Detail & Related papers (2023-04-12T15:01:22Z) - Feature diversity in self-supervised learning [0.0]
We investigate how feature diversity relates to overall generalization performance in the context of self-supervised learning with CNN models.
We found that the last layer is the most diversified throughout the training.
While the model's test error decreases with increasing epochs, its diversity drops.
arXiv Detail & Related papers (2022-09-02T21:34:11Z) - On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery proposes to factorize the data-generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that modular neural causal models outperform other models on both zero- and few-shot adaptation in low-data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z) - Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z) - Equivariance Allows Handling Multiple Nuisance Variables When Analyzing
Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how combining recent results on equivariant representation learning over structured spaces with classical results from causal inference yields an effective practical solution.
We demonstrate that, under some assumptions, our model can handle more than one nuisance variable and enables analysis of pooled scientific datasets in scenarios that would otherwise require discarding a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z) - Nonlinear Invariant Risk Minimization: A Causal Approach [5.63479133344366]
We propose a learning paradigm that enables out-of-distribution generalization in the nonlinear setting.
We show identifiability of the data representation up to very simple transformations.
Extensive experiments on both synthetic and real-world datasets show that our approach significantly outperforms a variety of baseline methods.
arXiv Detail & Related papers (2021-02-24T15:38:41Z) - Learning What Makes a Difference from Counterfactual Examples and
Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets; a rough sketch of such a pairing objective appears after this list.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)