Feature diversity in self-supervised learning
- URL: http://arxiv.org/abs/2209.01275v1
- Date: Fri, 2 Sep 2022 21:34:11 GMT
- Title: Feature diversity in self-supervised learning
- Authors: Pranshu Malviya, Arjun Vaithilingam Sudhakar
- Abstract summary: We investigate how these factors may affect overall generalization performance in the context of self-supervised learning with CNN models.
We found that the last layer is the most diversified throughout the training.
While the model's test error decreases with increasing epochs, its diversity drops.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many studies on scaling laws consider basic factors such as model size, model
shape, dataset size, and compute power. These factors are easily tunable and
represent the fundamental elements of any machine learning setup. But
researchers have also employed more complex factors to estimate the test error
and generalization performance with high predictability. These factors are
generally specific to the domain or application. For example, feature diversity
was primarily used for promoting syn-to-real transfer by Chen et al. (2021).
With numerous scaling factors defined in previous works, it would be
interesting to investigate how these factors may affect overall generalization
performance in the context of self-supervised learning with CNN models. How do
individual factors, such as varying depth, width, or the number of training
epochs with early stopping, promote generalization? For example, does the link
between higher feature diversity and higher accuracy hold in complex settings
other than syn-to-real transfer? How do these factors depend on each other? We found
that the last layer is the most diversified throughout the training. However,
while the model's test error decreases with increasing epochs, its diversity
drops. We also discovered that diversity is directly related to model width.
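The abstract tracks how diversified each layer becomes over training but does not define the diversity metric here. Below is a minimal sketch, assuming one plausible proxy (average pairwise cosine dissimilarity between a layer's channel activations); the function name and the metric choice are illustrative assumptions, not the paper's definition.

```python
import torch
import torch.nn.functional as F


def layer_feature_diversity(features: torch.Tensor) -> torch.Tensor:
    """features: (batch, channels, H, W) activations from one CNN layer.

    Returns the average pairwise cosine dissimilarity between channels
    (a hypothetical proxy for the paper's feature-diversity measure).
    """
    c = features.shape[1]
    # Treat each channel as one long vector over the batch and spatial dims.
    vecs = features.permute(1, 0, 2, 3).reshape(c, -1)   # (channels, batch*H*W)
    vecs = F.normalize(vecs, dim=1)                       # unit-length rows
    sim = vecs @ vecs.t()                                 # (channels, channels) cosine similarities
    off_diag_sum = sim.sum() - sim.diagonal().sum()       # drop self-similarity terms
    # Higher value = channels point in more different directions = more diverse.
    return 1.0 - off_diag_sum / (c * (c - 1))
```

In practice one would attach forward hooks to each convolutional layer and log this quantity per epoch to reproduce the layer-wise and width-wise comparisons the abstract describes.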
Related papers
- C-Disentanglement: Discovering Causally-Independent Generative Factors under an Inductive Bias of Confounder [35.09708249850816]
We introduce Confounded-Disentanglement (C-Disentanglement), the first framework that explicitly introduces the inductive bias of a confounder.
We conduct extensive experiments on both synthetic and real-world datasets.
arXiv Detail & Related papers (2023-10-26T11:44:42Z)
- On the Strong Correlation Between Model Invariance and Generalization [54.812786542023325]
Generalization captures a model's ability to classify unseen data.
Invariance measures consistency of model predictions on transformations of the data.
From a dataset-centric view, we find that a given model's accuracy and invariance are linearly correlated across different test sets.
arXiv Detail & Related papers (2022-07-14T17:08:25Z)
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
- Causality Inspired Representation Learning for Domain Generalization [47.574964496891404]
We introduce a general structural causal model to formalize the domain generalization problem.
Our goal is to extract the causal factors from inputs and then reconstruct the invariant causal mechanisms.
We highlight that ideal causal factors should meet three basic properties: separated from the non-causal ones, jointly independent, and causally sufficient for the classification.
arXiv Detail & Related papers (2022-03-27T08:08:33Z)
- Visual Representation Learning Does Not Generalize Strongly Within the Same Domain [41.66817277929783]
We test whether 17 unsupervised, weakly supervised, and fully supervised representation learning approaches correctly infer the generative factors of variation in simple datasets.
We train and test 2000+ models and observe that all of them struggle to learn the underlying mechanism regardless of supervision signal and architectural bias.
arXiv Detail & Related papers (2021-07-17T11:24:18Z)
- Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning [76.00395335702572]
A central goal for AI and causality is the joint discovery of abstract representations and causal structure.
Existing environments for studying causal induction are poorly suited for this objective because they have complicated task-specific causal graphs.
In this work, our goal is to facilitate research in learning representations of high-level variables as well as causal structures among them.
arXiv Detail & Related papers (2021-07-02T05:44:56Z)
- Counterfactual Invariance to Spurious Correlations: Why and How to Pass Stress Tests [87.60900567941428]
A 'spurious correlation' is the dependence of a model on some aspect of the input data that an analyst thinks shouldn't matter.
In machine learning, these have a know-it-when-you-see-it character.
We study stress testing using the tools of causal inference.
arXiv Detail & Related papers (2021-05-31T14:39:38Z)
- What Should Not Be Contrastive in Contrastive Learning [110.14159883496859]
We introduce a contrastive learning framework which does not require prior knowledge of specific, task-dependent invariances.
Our model learns to capture varying and invariant factors for visual representations by constructing separate embedding spaces.
We use a multi-head network with a shared backbone which captures information across each augmentation and alone outperforms all baselines on downstream tasks.
arXiv Detail & Related papers (2020-08-13T03:02:32Z)
- Be Like Water: Robustness to Extraneous Variables Via Adaptive Feature Normalization [17.829013101192295]
Extraneous variables are variables that are irrelevant for a certain task, but heavily affect the distribution of the available data.
We show that the presence of such variables can degrade the performance of deep-learning models.
We show that estimating the feature statistics adaptively during inference, as in instance normalization, addresses this issue (see the sketch after this entry).
arXiv Detail & Related papers (2020-02-10T18:47:08Z)
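The Be Like Water entry above describes computing feature statistics per sample at inference, as in instance normalization, rather than reusing statistics frozen from training. A minimal sketch of that idea, with hypothetical names and not the paper's exact method, might look like:

```python
import torch


def adaptive_feature_norm(features: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """features: (batch, channels, H, W). Normalize with statistics from the input itself."""
    # Per-sample, per-channel mean/variance over spatial dimensions (instance-norm style),
    # instead of running statistics collected on the training distribution.
    mean = features.mean(dim=(2, 3), keepdim=True)
    var = features.var(dim=(2, 3), keepdim=True, unbiased=False)
    return (features - mean) / torch.sqrt(var + eps)
```

The contrast is with a BatchNorm layer in eval mode, whose running statistics can break when an extraneous variable shifts the test-time feature distribution.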