The Trade-off between Universality and Label Efficiency of
Representations from Contrastive Learning
- URL: http://arxiv.org/abs/2303.00106v1
- Date: Tue, 28 Feb 2023 22:14:33 GMT
- Title: The Trade-off between Universality and Label Efficiency of
Representations from Contrastive Learning
- Authors: Zhenmei Shi, Jiefeng Chen, Kunyang Li, Jayaram Raghuram, Xi Wu, Yingyu
Liang, Somesh Jha
- Abstract summary: We show that there exists a trade-off between the two desiderata so that one may not be able to achieve both simultaneously.
We provide analysis using a theoretical data model and show that, while more diverse pre-training data result in more diverse features for different tasks, they put less emphasis on task-specific features.
- Score: 32.15608637930748
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-training representations (a.k.a. foundation models) has recently become a
prevalent learning paradigm, where one first pre-trains a representation using
large-scale unlabeled data, and then learns simple predictors on top of the
representation using small labeled data from the downstream tasks. There are
two key desiderata for the representation: label efficiency (the ability to
learn an accurate classifier on top of the representation with a small amount
of labeled data) and universality (usefulness across a wide range of downstream
tasks). In this paper, we focus on one of the most popular instantiations of
this paradigm: contrastive learning with linear probing, i.e., learning a
linear predictor on the representation pre-trained by contrastive learning. We
show that there exists a trade-off between the two desiderata so that one may
not be able to achieve both simultaneously. Specifically, we provide analysis
using a theoretical data model and show that, while more diverse pre-training
data result in more diverse features for different tasks (improving
universality), they put less emphasis on task-specific features, giving rise to
a larger sample complexity for downstream supervised tasks and thus worse
prediction performance. Guided by this analysis, we propose a contrastive
regularization method to improve the trade-off. We validate our analysis and
method empirically with systematic experiments using real-world datasets and
foundation models.
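To make the setting above concrete, here is a minimal sketch of the two-stage pipeline the abstract describes: linear probing on a frozen, contrastively pre-trained encoder, plus an InfoNCE-style contrastive term used as a stand-in regularizer when fine-tuning on labeled downstream data. The helper names (encoder, head, lambda_reg) and the choice of InfoNCE are illustrative assumptions, not the paper's exact formulation.

# Illustrative sketch only (not the authors' exact method): linear probing on a
# frozen, contrastively pre-trained encoder, and a fine-tuning step that adds an
# InfoNCE-style contrastive term as a regularizer to the supervised loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Standard InfoNCE loss: matching rows of z1 and z2 are positive pairs."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                    # (B, B) similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)  # positives on the diagonal
    return F.cross_entropy(logits, targets)

def linear_probe_step(encoder, head, optimizer, x, y):
    """Linear probing: the encoder stays frozen, only the linear head is trained."""
    with torch.no_grad():
        feats = encoder(x)
    loss = F.cross_entropy(head(feats), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def regularized_finetune_step(encoder, head, optimizer, x1, x2, y, lambda_reg=0.1):
    """Fine-tuning with a contrastive regularizer: supervised cross-entropy on one
    augmented view plus an InfoNCE term between two views (illustrative stand-in)."""
    z1, z2 = encoder(x1), encoder(x2)
    loss = F.cross_entropy(head(z1), y) + lambda_reg * info_nce(z1, z2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Toy usage with a random MLP encoder and random data.
    encoder = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 8))
    head = nn.Linear(8, 4)
    opt = torch.optim.SGD(list(encoder.parameters()) + list(head.parameters()), lr=1e-2)
    x1, x2 = torch.randn(64, 32), torch.randn(64, 32)      # two "augmented views"
    y = torch.randint(0, 4, (64,))
    print(linear_probe_step(encoder, head, opt, x1, y))
    print(regularized_finetune_step(encoder, head, opt, x1, x2, y))

In this reading, the contrastive term keeps the fine-tuned representation close to the behavior encouraged by contrastive pre-training, while the cross-entropy term fits the task-specific predictor.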
Related papers
- On the Trade-off of Intra-/Inter-class Diversity for Supervised
Pre-training [72.8087629914444]
We study the impact of the trade-off between the intra-class diversity (the number of samples per class) and the inter-class diversity (the number of classes) of a supervised pre-training dataset.
With the size of the pre-training dataset fixed, the best downstream performance comes with a balance between intra- and inter-class diversity.
arXiv Detail & Related papers (2023-05-20T16:23:50Z)
- Multi-Task Self-Supervised Time-Series Representation Learning [3.31490164885582]
Time-series representation learning can extract representations from data with temporal dynamics and sparse labels.
We propose a new time-series representation learning method by combining the advantages of self-supervised tasks.
We evaluate the proposed framework on three downstream tasks: time-series classification, forecasting, and anomaly detection.
arXiv Detail & Related papers (2023-03-02T07:44:06Z)
- MixSiam: A Mixture-based Approach to Self-supervised Representation Learning [33.52892899982186]
Recently, contrastive learning has shown significant progress in learning visual representations from unlabeled data.
We propose MixSiam, a mixture-based approach built upon the traditional Siamese network.
arXiv Detail & Related papers (2021-11-04T08:12:47Z)
- Prototypical Representation Learning for Relation Extraction [56.501332067073065]
This paper aims to learn predictive, interpretable, and robust relation representations from distantly-labeled data.
We learn prototypes for each relation from contextual information to best explore the intrinsic semantics of relations.
Results on several relation learning tasks show that our model significantly outperforms the previous state-of-the-art relational models.
arXiv Detail & Related papers (2021-03-22T08:11:43Z)
- Predicting What You Already Know Helps: Provable Self-Supervised Learning [60.27658820909876]
Self-supervised representation learning solves auxiliary prediction tasks (known as pretext tasks) without requiring labeled data.
We show a mechanism exploiting the statistical connections between certain reconstruction-based pretext tasks that guarantees learning a good representation.
We prove that the linear layer yields a small approximation error even for complex ground-truth function classes.
arXiv Detail & Related papers (2020-08-03T17:56:13Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To reduce the cost of training on the enlarged dataset, we apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
- Conditional Mutual information-based Contrastive Loss for Financial Time Series Forecasting [12.0855096102517]
We present a representation learning framework for financial time series forecasting.
In this paper, we propose to first learn compact representations from time series data, then use the learned representations to train a simpler model for predicting time series movements.
arXiv Detail & Related papers (2020-02-18T15:24:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.