A Theoretical Study of Inductive Biases in Contrastive Learning
- URL: http://arxiv.org/abs/2211.14699v2
- Date: Sat, 8 Apr 2023 05:10:00 GMT
- Title: A Theoretical Study of Inductive Biases in Contrastive Learning
- Authors: Jeff Z. HaoChen, Tengyu Ma
- Abstract summary: We provide the first theoretical analysis of self-supervised learning that incorporates the effect of inductive biases originating from the model class.
We show that when the model has limited capacity, contrastive representations would recover certain special clustering structures that are compatible with the model architecture.
- Score: 32.98250585760665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding self-supervised learning is important but challenging. Previous
theoretical works study the role of pretraining losses, and view neural
networks as general black boxes. However, the recent work of Saunshi et al.
argues that the model architecture -- a component largely ignored by previous
works -- also has significant influences on the downstream performance of
self-supervised learning. In this work, we provide the first theoretical
analysis of self-supervised learning that incorporates the effect of inductive
biases originating from the model class. In particular, we focus on contrastive
learning -- a popular self-supervised learning method that is widely used in
the vision domain. We show that when the model has limited capacity,
contrastive representations would recover certain special clustering structures
that are compatible with the model architecture, but ignore many other
clustering structures in the data distribution. As a result, our theory can
capture the more realistic setting where contrastive representations have much
lower dimensionality than the number of clusters in the data distribution. We
instantiate our theory on several synthetic data distributions, and provide
empirical evidence to support the theory.
Related papers
- Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective [60.64922606733441]
We introduce a mathematical model that formalizes relational learning as hypergraph recovery to study pre-training of Foundation Models (FMs).
In our framework, the world is represented as a hypergraph, with data abstracted as random samples from hyperedges. We theoretically examine the feasibility of a Pre-Trained Model (PTM) to recover this hypergraph and analyze the data efficiency in a minimax near-optimal style.
arXiv Detail & Related papers (2024-06-17T06:20:39Z)
- Learning Discrete Concepts in Latent Hierarchical Models [73.01229236386148]
Learning concepts from natural high-dimensional data holds potential in building human-aligned and interpretable machine learning models.
We formalize concepts as discrete latent causal variables that are related via a hierarchical causal model.
We substantiate our theoretical claims with synthetic data experiments.
arXiv Detail & Related papers (2024-06-01T18:01:03Z)
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- Specify Robust Causal Representation from Mixed Observations [35.387451486213344]
Learning representations purely from observations concerns the problem of learning a low-dimensional, compact representation which is beneficial to prediction models.
We develop a learning method to learn such representation from observational data by regularizing the learning procedure with mutual information measures.
We theoretically and empirically show that the models trained with the learned causal representations are more robust under adversarial attacks and distribution shifts.
arXiv Detail & Related papers (2023-10-21T02:18:35Z)
- Learn to Accumulate Evidence from All Training Samples: Theory and Practice [7.257751371276488]
Evidential deep learning offers a principled and computationally efficient way to make a deterministic neural network uncertainty-aware.
Existing evidential activation functions create zero-evidence regions, which prevent the model from learning from training samples that fall into such regions.
A deeper analysis of evidential activation functions based on our theoretical underpinning inspires the design of a novel regularizer.
arXiv Detail & Related papers (2023-06-19T18:27:12Z)
- On the Joint Interaction of Models, Data, and Features [82.60073661644435]
We introduce a new tool, the interaction tensor, for empirically analyzing the interaction between data and model through features.
Based on these observations, we propose a conceptual framework for feature learning.
Under this framework, the expected accuracy for a single hypothesis and agreement for a pair of hypotheses can both be derived in closed-form.
arXiv Detail & Related papers (2023-06-07T21:35:26Z)
- Theory on Forgetting and Generalization of Continual Learning [41.85538120246877]
Continual learning (CL) aims to learn a sequence of tasks.
There is a lack of understanding of which factors are important and how they affect "catastrophic forgetting" and generalization performance.
We show that our results not only explain some interesting empirical observations in recent studies, but also motivate better practical algorithm designs of CL.
arXiv Detail & Related papers (2023-02-12T02:14:14Z)
- Causal Reasoning Meets Visual Representation Learning: A Prospective Study [117.08431221482638]
Lack of interpretability, robustness, and out-of-distribution generalization are becoming challenges for existing visual models.
Inspired by the strong inference ability of human-level agents, recent years have witnessed great effort in developing causal reasoning paradigms.
This paper aims to provide a comprehensive overview of this emerging field, attract attention, encourage discussions, and bring to the forefront the urgency of developing novel causal reasoning methods.
arXiv Detail & Related papers (2022-04-26T02:22:28Z)
- The Power of Contrast for Feature Learning: A Theoretical Analysis [42.20116348668721]
We show that contrastive learning outperforms the standard autoencoders and generative adversarial networks.
We also illustrate the impact of labeled data in supervised contrastive learning.
arXiv Detail & Related papers (2021-10-06T03:10:28Z)
- Contrastive Learning Inverts the Data Generating Process [36.30995987986073]
We prove that feedforward models trained with objectives belonging to the commonly used InfoNCE family learn to implicitly invert the underlying generative model of the observed data.
Our theory highlights a fundamental connection between contrastive learning, generative modeling, and nonlinear independent component analysis.
arXiv Detail & Related papers (2021-02-17T16:21:54Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework preserves the relations between samples well.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.