Towards the Generalization of Contrastive Self-Supervised Learning
- URL: http://arxiv.org/abs/2111.00743v1
- Date: Mon, 1 Nov 2021 07:39:38 GMT
- Title: Towards the Generalization of Contrastive Self-Supervised Learning
- Authors: Weiran Huang and Mingyang Yi and Xuyang Zhao
- Abstract summary: We present a theoretical explanation of how contrastive self-supervised pre-trained models generalize to downstream tasks.
We further explore SimCLR and Barlow Twins, which are two canonical contrastive self-supervised methods.
- Score: 11.889992921445849
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning has recently attracted great attention because it requires only unlabeled data for training. Contrastive learning is a popular approach to self-supervised learning and performs well empirically, yet the theoretical understanding of its generalization to downstream tasks remains limited. To this end, we present a theoretical explanation of how contrastive self-supervised pre-trained models generalize to downstream tasks. Concretely, we quantitatively show that a self-supervised model generalizes to downstream classification tasks if it embeds the input data into a feature space with well-separated class centers and closely clustered intra-class samples. Building on this conclusion, we further examine SimCLR and Barlow Twins, two canonical contrastive self-supervised methods, and prove that either of them yields such a feature space, thereby explaining their success in generalizing to downstream classification tasks. Finally, we also conduct various experiments to verify our theoretical findings.
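Since the abstract singles out SimCLR and Barlow Twins without restating their objectives, the NumPy sketch below shows the two pre-training losses for orientation. It is not the authors' code; the embedding shapes, temperature, and off-diagonal weight are illustrative assumptions.

import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR's NT-Xent (contrastive) loss for two augmented views z1, z2 of shape (N, D)."""
    N = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)              # stack both views: (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalize so dot products are cosine similarities
    sim = z @ z.T / temperature                       # pairwise similarities: (2N, 2N)
    np.fill_diagonal(sim, -np.inf)                    # exclude each sample paired with itself
    pos = np.concatenate([np.arange(N, 2 * N), np.arange(N)])  # index of each sample's positive (other view)
    log_prob = sim[np.arange(2 * N), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Barlow Twins loss: drive the cross-correlation of the two views toward the identity matrix."""
    N = z1.shape[0]
    z1 = (z1 - z1.mean(axis=0)) / (z1.std(axis=0) + 1e-8)  # standardize each feature dimension over the batch
    z2 = (z2 - z2.mean(axis=0)) / (z2.std(axis=0) + 1e-8)
    c = z1.T @ z2 / N                                       # cross-correlation matrix: (D, D)
    on_diag = ((1.0 - np.diag(c)) ** 2).sum()               # invariance term: diagonal entries toward 1
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()     # redundancy-reduction term: off-diagonal toward 0
    return on_diag + lam * off_diag

# Toy usage with random embeddings standing in for encoder outputs of two augmented views.
rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=(128, 64)), rng.normal(size=(128, 64))
print(nt_xent_loss(z1, z2), barlow_twins_loss(z1, z2))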
Related papers
- Zero-Shot Generalization during Instruction Tuning: Insights from Similarity and Granularity [84.12126298229866]
We show that zero-shot generalization during instruction tuning happens very early.
We also show that encountering highly similar and fine-grained training data earlier during instruction tuning, without the constraints of defined "tasks", enables better generalization.
For the first time, we show that zero-shot generalization during instruction tuning is a form of similarity-based generalization between training and test data at the instance level.
arXiv Detail & Related papers (2024-06-17T16:40:21Z) - Memory Consistency Guided Divide-and-Conquer Learning for Generalized Category Discovery [56.172872410834664]
Generalized category discovery (GCD) aims at addressing a more realistic and challenging setting of semi-supervised learning.
We propose a Memory Consistency guided Divide-and-conquer Learning framework (MCDL).
Our method outperforms state-of-the-art models by a large margin on both seen and unseen classes of generic image recognition.
arXiv Detail & Related papers (2024-01-24T09:39:45Z) - Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality [55.88910947643436]
Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice.
HiDe-Prompt is an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics.
Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
arXiv Detail & Related papers (2023-10-11T06:51:46Z) - Towards Distribution-Agnostic Generalized Category Discovery [51.52673017664908]
Data imbalance and open-ended distribution are intrinsic characteristics of the real visual world.
We propose a Self-Balanced Co-Advice contrastive framework (BaCon).
BaCon consists of a contrastive-learning branch and a pseudo-labeling branch, working collaboratively to provide interactive supervision to resolve the DA-GCD task.
arXiv Detail & Related papers (2023-10-02T17:39:58Z) - Semi-supervised learning made simple with self-supervised clustering [65.98152950607707]
Self-supervised learning models have been shown to learn rich visual representations without requiring human annotations.
We propose a conceptually simple yet empirically powerful approach to turn clustering-based self-supervised methods into semi-supervised learners.
arXiv Detail & Related papers (2023-06-13T01:09:18Z) - A Theoretical Study of Inductive Biases in Contrastive Learning [32.98250585760665]
We provide the first theoretical analysis of self-supervised learning that incorporates the effect of inductive biases originating from the model class.
We show that when the model has limited capacity, contrastive representations would recover certain special clustering structures that are compatible with the model architecture.
arXiv Detail & Related papers (2022-11-27T01:53:29Z) - Simple Control Baselines for Evaluating Transfer Learning [1.0499611180329802]
We share an evaluation standard that aims to quantify and communicate transfer learning performance.
We provide an example empirical study investigating a few basic questions about self-supervised learning.
arXiv Detail & Related papers (2022-02-07T17:26:26Z) - Disambiguation of weak supervision with exponential convergence rates [88.99819200562784]
In weakly supervised learning, data are annotated with incomplete yet discriminative information.
In this paper, we focus on partial labelling, an instance of weak supervision where, from a given input, we are given a set of potential targets.
We propose an empirical disambiguation algorithm to recover full supervision from weak supervision.
arXiv Detail & Related papers (2021-02-04T18:14:32Z) - Bias-Awareness for Zero-Shot Learning the Seen and Unseen [47.09887661463657]
Generalized zero-shot learning recognizes inputs from both seen and unseen classes.
We propose a bias-aware learner to map inputs to a semantic embedding space for generalized zero-shot learning.
arXiv Detail & Related papers (2020-08-25T17:38:40Z) - Functional Regularization for Representation Learning: A Unified Theoretical Perspective [27.93916012334704]
Unsupervised and self-supervised learning approaches have become a crucial tool to learn representations for downstream prediction tasks.
We present a unifying perspective where several such approaches can be viewed as imposing a regularization on the representation via a learnable function using unlabeled data.
We propose a discriminative theoretical framework for analyzing the sample complexity of these approaches, which generalizes the framework of Balcan and Blum (2010) to allow learnable regularization functions; a schematic form of this objective is sketched after this list.
arXiv Detail & Related papers (2020-08-06T04:06:04Z)
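The unifying view in the last entry above can be read, loosely, as training a representation against a regularizer computed from unlabeled data. The schematic objective below is a sketch of that reading rather than the paper's exact formulation; the loss \ell, regularizer R, and weight \lambda are illustrative placeholders.

\min_{h,\,\phi}\ \frac{1}{n}\sum_{i=1}^{n} \ell\big(h(\phi(x_i)),\, y_i\big) \;+\; \lambda\, R(\phi;\ x_{n+1},\dots,x_{n+m})

Here (x_1, y_1), ..., (x_n, y_n) are labeled examples, x_{n+1}, ..., x_{n+m} are unlabeled examples, and R might be, for instance, a reconstruction or contrastive objective applied to the representation \phi.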
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.