Not All Semantics are Created Equal: Contrastive Self-supervised
Learning with Automatic Temperature Individualization
- URL: http://arxiv.org/abs/2305.11965v1
- Date: Fri, 19 May 2023 19:25:56 GMT
- Title: Not All Semantics are Created Equal: Contrastive Self-supervised
Learning with Automatic Temperature Individualization
- Authors: Zi-Hao Qiu, Quanqi Hu, Zhuoning Yuan, Denny Zhou, Lijun Zhang, Tianbao
Yang
- Abstract summary: We propose a new robust contrastive loss inspired by distributionally robust optimization (DRO).
We show that our algorithm automatically learns a suitable $\tau$ for each sample.
Our method outperforms prior strong baselines on unimodal and bimodal datasets.
- Score: 51.41175648612714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we aim to optimize a contrastive loss with individualized
temperatures in a principled and systematic manner for self-supervised
learning. The common practice of using a global temperature parameter $\tau$
ignores the fact that "not all semantics are created equal", meaning that
different anchor data may have different numbers of samples with similar
semantics, especially when data exhibits long-tails. First, we propose a new
robust contrastive loss inspired by distributionally robust optimization (DRO),
which provides an intuition about the effect of $\tau$ and a mechanism for
automatic temperature individualization. Then, we propose an efficient
stochastic algorithm for optimizing the robust contrastive loss with a provable
convergence guarantee without using large mini-batch sizes. Theoretical and
experimental results show that our algorithm automatically learns a suitable
$\tau$ for each sample. Specifically, samples with frequent semantics use large
temperatures to keep local semantic structures, while samples with rare
semantics use small temperatures to induce more separable features. Our method
not only outperforms prior strong baselines (e.g., SimCLR, CLIP) on unimodal
and bimodal datasets, with larger improvements on imbalanced data, but is also
less sensitive to hyper-parameters. To the best of our knowledge, this is the first
methodical approach to optimizing a contrastive loss with individualized
temperatures.
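Below is a minimal PyTorch-style sketch of what a DRO-inspired contrastive loss with one learnable temperature per anchor could look like. It only illustrates the idea described in the abstract and is not the paper's exact algorithm: the function name, the constraint radius `rho`, and the clamping range for the temperatures are assumptions chosen for the example, and the actual method additionally relies on moving-average estimators so that small mini-batches suffice.

```python
import math
import torch
import torch.nn.functional as F

def dro_contrastive_loss(z1, z2, log_tau, rho=0.3, tau_min=0.05, tau_max=1.0):
    """Illustrative per-anchor-temperature contrastive loss (assumed form).

    For each anchor i the loss is roughly
        tau_i * log( mean_{j != i} exp((s_ij - s_ii) / tau_i) ) + rho * tau_i,
    where s is the cosine-similarity matrix between two augmented views and
    tau_i is a learnable temperature for anchor i (parameterized via log_tau).
    `rho`, `tau_min`, and `tau_max` are placeholder hyper-parameters,
    not values taken from the paper.
    """
    z1 = F.normalize(z1, dim=1)                       # (B, d)
    z2 = F.normalize(z2, dim=1)                       # (B, d)
    sim = z1 @ z2.t()                                 # (B, B); diagonal = positive pairs
    pos = sim.diag().unsqueeze(1)                     # (B, 1)

    tau = torch.exp(log_tau).clamp(tau_min, tau_max)  # (B,) per-anchor temperatures
    diff = (sim - pos) / tau.unsqueeze(1)             # (s_ij - s_ii) / tau_i

    # Exclude the positive (diagonal) term from the log-mean-exp over negatives.
    eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    diff = diff.masked_fill(eye, float('-inf'))
    log_mean_exp = torch.logsumexp(diff, dim=1) - math.log(sim.size(0) - 1)

    loss = tau * log_mean_exp + rho * tau             # larger tau_i -> flatter weighting
    return loss.mean()
```

Because each $\tau_i$ multiplies a log-mean-exp over that anchor's negatives plus a $\rho\tau_i$ penalty, minimizing over $\tau_i$ tends to push anchors with many semantically similar negatives toward larger temperatures (flatter weighting, preserved local structure) and anchors with rare semantics toward smaller ones (sharper weighting, more separable features), which mirrors the behavior described in the abstract.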
Related papers
- CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective [48.99488315273868]
We present a contrastive knowledge distillation approach, which can be formulated as a sample-wise alignment problem with intra- and inter-sample constraints.
Our method minimizes logit differences within the same sample by considering their numerical values.
We conduct comprehensive experiments on three datasets including CIFAR-100, ImageNet-1K, and MS COCO.
arXiv Detail & Related papers (2024-04-22T11:52:40Z)
- Fine-Tuning Adaptive Stochastic Optimizers: Determining the Optimal Hyperparameter $\epsilon$ via Gradient Magnitude Histogram Analysis [0.7366405857677226]
We introduce a new framework based on the empirical probability density function of the loss's magnitude, termed the "gradient magnitude histogram".
We propose a novel algorithm using gradient magnitude histograms to automatically estimate a refined and accurate search space for the optimal safeguard.
arXiv Detail & Related papers (2023-11-20T04:34:19Z)
- Self-Supervised Dataset Distillation for Transfer Learning [77.4714995131992]
We propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL).
We first prove that a gradient of synthetic samples with respect to an SSL objective in naive bilevel optimization is biased due to randomness originating from data augmentations or masking.
We empirically validate the effectiveness of our method on various applications involving transfer learning.
arXiv Detail & Related papers (2023-10-10T10:48:52Z)
- Dynamically Scaled Temperature in Self-Supervised Contrastive Learning [11.133502139934437]
We focus on improving the performance of InfoNCE loss in self-supervised learning by proposing a novel cosine similarity dependent temperature scaling function.
Experimental evidence shows that the proposed framework outperforms contrastive-loss-based SSL algorithms.
arXiv Detail & Related papers (2023-08-02T13:31:41Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- Infinite Recommendation Networks: A Data-Centric Approach [8.044430277912936]
We leverage the Neural Tangent Kernel to train infinitely-wide neural networks to devise $\infty$-AE: an autoencoder with infinitely-wide bottleneck layers.
We also develop Distill-CF for synthesizing tiny, high-fidelity data summaries.
We observe 96-105% of $\infty$-AE's performance on the full dataset with as little as 0.1% of the original dataset size.
arXiv Detail & Related papers (2022-06-03T00:34:13Z)
- AutoSimulate: (Quickly) Learning Synthetic Data Generation [70.82315853981838]
We propose an efficient alternative for optimal synthetic data generation based on a novel differentiable approximation of the objective.
We demonstrate that the proposed method finds the optimal data distribution faster (up to $50\times$), with significantly reduced training data generation (up to $30\times$) and better accuracy ($+8.7\%$) on real-world test datasets than previous methods.
arXiv Detail & Related papers (2020-08-16T11:36:11Z)
- Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information theoretic minimax lower bounds for this problem in terms of $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay (a popular reinforcement learning technique) that achieves a significantly better error rate.
arXiv Detail & Related papers (2020-06-16T04:26:50Z)