A Theoretical Study of Inductive Biases in Contrastive Learning
- URL: http://arxiv.org/abs/2211.14699v2
- Date: Sat, 8 Apr 2023 05:10:00 GMT
- Title: A Theoretical Study of Inductive Biases in Contrastive Learning
- Authors: Jeff Z. HaoChen, Tengyu Ma
- Abstract summary: We provide the first theoretical analysis of self-supervised learning that incorporates the effect of inductive biases originating from the model class.
We show that when the model has limited capacity, contrastive representations would recover certain special clustering structures that are compatible with the model architecture.
- Score: 32.98250585760665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding self-supervised learning is important but challenging. Previous
theoretical works study the role of pretraining losses, and view neural
networks as general black boxes. However, the recent work of Saunshi et al.
argues that the model architecture -- a component largely ignored by previous
works -- also has a significant influence on the downstream performance of
self-supervised learning. In this work, we provide the first theoretical
analysis of self-supervised learning that incorporates the effect of inductive
biases originating from the model class. In particular, we focus on contrastive
learning -- a popular self-supervised learning method that is widely used in
the vision domain. We show that when the model has limited capacity,
contrastive representations would recover certain special clustering structures
that are compatible with the model architecture, but ignore many other
clustering structures in the data distribution. As a result, our theory can
capture the more realistic setting where contrastive representations have much
lower dimensionality than the number of clusters in the data distribution. We
instantiate our theory on several synthetic data distributions, and provide
empirical evidence to support the theory.
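As background for the contrastive objectives the abstract analyzes, here is a minimal NumPy sketch of an InfoNCE-style contrastive loss over paired embeddings. This is an illustrative sketch of the general technique, not the paper's specific construction; all names and the toy data are assumptions.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.5):
    """InfoNCE-style contrastive loss.

    z1, z2: (n, d) arrays of representations of two views of the same
    n examples; row i of z1 and row i of z2 form a positive pair, and
    all other rows act as negatives.
    """
    # L2-normalize rows so dot products are cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # (n, n) similarity matrix
    # Cross-entropy with the diagonal (positive pairs) as targets.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))
aligned = info_nce_loss(z, z)                         # identical views: easy positives
shuffled = info_nce_loss(z, rng.normal(size=(8, 4)))  # unrelated views
```

Representations whose positive pairs agree (here, identical views) give a lower loss than unrelated pairs; minimizing this loss is what pulls augmentations of the same example together and pushes different examples apart.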
Related papers
- Cross-Entropy Is All You Need To Invert the Data Generating Process [29.94396019742267]
Empirical phenomena suggest that supervised models can learn interpretable factors of variation in a linear fashion.
Recent advances in self-supervised learning have shown that these methods can recover latent structures by inverting the data generating process.
We prove that even in standard classification tasks, models learn representations of ground-truth factors of variation up to a linear transformation.
arXiv Detail & Related papers (2024-10-29T09:03:57Z)
- An Effective Theory of Bias Amplification [18.648588509429167]
Machine learning models may capture and amplify biases present in data, leading to disparate test performance across social groups.
We propose a precise analytical theory of bias amplification in the context of ridge regression, which models neural networks in a simplified regime.
Our theory offers a unified and rigorous explanation of machine learning bias, providing insights into phenomena such as bias amplification and minority-group bias.
arXiv Detail & Related papers (2024-10-07T08:43:22Z)
- Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases model performance.
Specifically, our framework introduces a new metric, i.e., explanation consistency, to reweight the training samples adaptively in model learning.
Our framework then promotes model learning by paying closer attention to training samples with a large difference in explanations.
arXiv Detail & Related papers (2024-08-08T17:20:08Z)
- Learning Discrete Concepts in Latent Hierarchical Models [73.01229236386148]
Learning concepts from natural high-dimensional data holds potential in building human-aligned and interpretable machine learning models.
We formalize concepts as discrete latent causal variables that are related via a hierarchical causal model.
We substantiate our theoretical claims with synthetic data experiments.
arXiv Detail & Related papers (2024-06-01T18:01:03Z)
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- Learn to Accumulate Evidence from All Training Samples: Theory and Practice [7.257751371276488]
Evidential deep learning offers a principled and computationally efficient way to make a deterministic neural network uncertainty-aware.
Existing evidential activation functions create zero-evidence regions, which prevent the model from learning from training samples that fall into such regions.
A deeper analysis of evidential activation functions based on our theoretical underpinning inspires the design of a novel regularizer.
arXiv Detail & Related papers (2023-06-19T18:27:12Z)
- On the Joint Interaction of Models, Data, and Features [82.60073661644435]
We introduce a new tool, the interaction tensor, for empirically analyzing the interaction between data and model through features.
Based on these observations, we propose a conceptual framework for feature learning.
Under this framework, the expected accuracy for a single hypothesis and agreement for a pair of hypotheses can both be derived in closed-form.
arXiv Detail & Related papers (2023-06-07T21:35:26Z)
- Causal Reasoning Meets Visual Representation Learning: A Prospective Study [117.08431221482638]
Lack of interpretability, robustness, and out-of-distribution generalization are becoming key challenges for existing visual models.
Inspired by the strong inference ability of human-level agents, recent years have witnessed great effort in developing causal reasoning paradigms.
This paper aims to provide a comprehensive overview of this emerging field, attract attention, encourage discussions, and bring to the forefront the urgency of developing novel causal reasoning methods.
arXiv Detail & Related papers (2022-04-26T02:22:28Z)
- The Power of Contrast for Feature Learning: A Theoretical Analysis [42.20116348668721]
We show that contrastive learning outperforms standard autoencoders and generative adversarial networks.
We also illustrate the impact of labeled data in supervised contrastive learning.
arXiv Detail & Related papers (2021-10-06T03:10:28Z)
- Contrastive Learning Inverts the Data Generating Process [36.30995987986073]
We prove that feedforward models trained with objectives belonging to the commonly used InfoNCE family learn to implicitly invert the underlying generative model of the observed data.
Our theory highlights a fundamental connection between contrastive learning, generative modeling, and nonlinear independent component analysis.
arXiv Detail & Related papers (2021-02-17T16:21:54Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework preserves the relations between samples well.
By embedding samples into a subspace, we show that our method can address large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.