Temperature Schedules for Self-Supervised Contrastive Methods on
Long-Tail Data
- URL: http://arxiv.org/abs/2303.13664v1
- Date: Thu, 23 Mar 2023 20:37:25 GMT
- Title: Temperature Schedules for Self-Supervised Contrastive Methods on
Long-Tail Data
- Authors: Anna Kukleva, Moritz Böhle, Bernt Schiele, Hilde Kuehne, Christian
Rupprecht
- Abstract summary: In this paper, we analyse the behaviour of one of the most popular variants of self-supervised learning (SSL) on long-tail data.
We find that a large $\tau$ emphasises group-wise discrimination, whereas a small $\tau$ leads to a higher degree of instance discrimination.
We propose to employ a dynamic $\tau$ and show that a simple cosine schedule can yield significant improvements in the learnt representations.
- Score: 87.77128754860983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most approaches for self-supervised learning (SSL) are optimised on curated
balanced datasets, e.g. ImageNet, despite the fact that natural data usually
exhibits long-tail distributions. In this paper, we analyse the behaviour of
one of the most popular variants of SSL, i.e. contrastive methods, on long-tail
data. In particular, we investigate the role of the temperature parameter
$\tau$ in the contrastive loss, by analysing the loss through the lens of
average distance maximisation, and find that a large $\tau$ emphasises
group-wise discrimination, whereas a small $\tau$ leads to a higher degree of
instance discrimination. While $\tau$ has thus far been treated exclusively as
a constant hyperparameter, in this work, we propose to employ a dynamic $\tau$
and show that a simple cosine schedule can yield significant improvements in
the learnt representations. Such a schedule results in a constant 'task
switching' between an emphasis on instance discrimination and group-wise
discrimination and thereby ensures that the model learns both group-wise
features, as well as instance-specific details. Since frequent classes benefit
from the former, while infrequent classes require the latter, we find this
method to consistently improve separation between the classes in long-tail data
without any additional computational cost.
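As a concrete illustration of the proposed schedule, the sketch below pairs a SimCLR-style InfoNCE loss with a temperature that follows a cosine wave between two endpoints. The endpoints (tau_min, tau_max), the period, and the precise functional form of the schedule are illustrative assumptions, not values taken from the paper.
```python
import math
import torch
import torch.nn.functional as F

def cosine_tau(step: int, period: int, tau_min: float = 0.1, tau_max: float = 1.0) -> float:
    # Oscillate tau between tau_min (instance discrimination) and
    # tau_max (group-wise discrimination); the exact values are assumptions.
    return tau_min + 0.5 * (tau_max - tau_min) * (1 + math.cos(2 * math.pi * step / period))

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float) -> torch.Tensor:
    # SimCLR-style InfoNCE over a batch of positive pairs (z1[i], z2[i]).
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                            # cosine similarity / tau
    targets = torch.arange(z1.size(0), device=z1.device)  # positives on the diagonal
    return F.cross_entropy(logits, targets)

# Inside a training loop, tau is re-computed every step instead of staying fixed:
#   tau = cosine_tau(step, period=total_steps // n_cycles)
#   loss = info_nce(encoder(view1), encoder(view2), tau)
```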
Related papers
- Data curation via joint example selection further accelerates multimodal learning [3.329535792151987]
We show that jointly selecting batches of data is more effective for learning than selecting examples independently.
We derive a simple and tractable algorithm for selecting such batches, which significantly accelerates training beyond individually prioritized data points.
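To make the batch-level (as opposed to example-level) selection concrete, here is a hypothetical sketch, not the paper's algorithm: candidate sub-batches are scored jointly by a learnability criterion (loss under the current learner minus loss under a reference model), and the best-scoring sub-batch is kept. All function and parameter names are illustrative.
```python
import torch

def batch_learnability(batch, learner, reference, loss_fn):
    # Joint score for a whole candidate batch: still hard for the learner
    # (high learner loss) but learnable (low reference loss).
    with torch.no_grad():
        return loss_fn(learner, batch) - loss_fn(reference, batch)

def select_batch(super_batch, learner, reference, loss_fn,
                 n_candidates=16, batch_size=256):
    # Draw random sub-batches from a larger pool and keep the one whose
    # *joint* learnability score is highest.
    best, best_score = None, -float("inf")
    for _ in range(n_candidates):
        idx = torch.randperm(len(super_batch))[:batch_size]
        candidate = super_batch[idx]
        score = batch_learnability(candidate, learner, reference, loss_fn)
        if score > best_score:
            best, best_score = candidate, score
    return best
```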
arXiv Detail & Related papers (2024-06-25T16:52:37Z) - $\nabla τ$: Gradient-based and Task-Agnostic machine Unlearning [7.04736023670375]
We introduce Gradient-based and Task-Agnostic machine Unlearning ($\nabla\tau$).
$\nabla\tau$ applies adaptive gradient ascent to the data to be forgotten while using standard gradient descent for the remaining data.
We evaluate our framework's effectiveness using a set of well-established Membership Inference Attack metrics.
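The ascent/descent split can be sketched directly; the version below uses a fixed ascent rate where the paper describes an adaptive one, so treat it as a minimal illustration rather than the authors' update rule.
```python
import torch

def unlearning_step(model, optimizer, loss_fn, forget_batch, retain_batch, ascent_lr=1e-4):
    xf, yf = forget_batch
    xr, yr = retain_batch

    # Gradient *ascent* on the forget set: step against the usual direction.
    model.zero_grad()
    loss_fn(model(xf), yf).backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p.add_(ascent_lr * p.grad)  # move uphill to erase the forget data

    # Standard gradient *descent* on the retain set.
    optimizer.zero_grad()
    loss_fn(model(xr), yr).backward()
    optimizer.step()
```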
arXiv Detail & Related papers (2024-03-21T12:11:26Z) - Non-contrastive representation learning for intervals from well logs [58.70164460091879]
The representation learning problem in the oil & gas industry aims to construct a model that provides a representation based on logging data for a well interval.
One possible approach is self-supervised learning (SSL).
We are the first to introduce non-contrastive SSL for well-logging data.
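The summary does not spell out which non-contrastive objective is used; as a generic stand-in, the sketch below shows the standard BYOL-style loss (negative cosine similarity against a stop-gradient target), which is what "non-contrastive SSL" typically denotes.
```python
import torch.nn.functional as F

def non_contrastive_loss(online_pred, target_proj):
    # Negative cosine similarity between the online network's prediction and
    # the target network's projection; the stop-gradient (detach) is what
    # prevents collapse without any negative pairs.
    p = F.normalize(online_pred, dim=-1)
    z = F.normalize(target_proj.detach(), dim=-1)
    return 2 - 2 * (p * z).sum(dim=-1).mean()
```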
arXiv Detail & Related papers (2022-09-28T13:27:10Z) - Constructing Balance from Imbalance for Long-tailed Image Recognition [50.6210415377178]
The imbalance between majority (head) classes and minority (tail) classes severely skews data-driven deep neural networks.
Previous methods tackle data imbalance from the viewpoints of data distribution, feature space, and model design.
We propose a concise paradigm that progressively adjusts the label space and divides the head and tail classes.
Our proposed model also provides a feature evaluation method and paves the way for long-tailed feature learning.
arXiv Detail & Related papers (2022-08-04T10:22:24Z) - AdAUC: End-to-end Adversarial AUC Optimization Against Long-tail
Problems [102.95119281306893]
We present an early attempt at exploring adversarial training methods to optimize AUC.
We reformulate the AUC optimization problem as a saddle-point problem, where the objective becomes an instance-wise function.
Our analysis differs from existing studies since the algorithm must generate adversarial examples by calculating the gradient of a min-max problem.
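As a loose illustration of the instance-wise, gradient-driven view (not the paper's exact saddle-point formulation), the sketch below scores all positive/negative pairs with a squared-hinge AUC surrogate and perturbs inputs along the gradient of that loss, FGSM-style.
```python
import torch

def auc_surrogate(scores_pos, scores_neg, margin=1.0):
    # Squared hinge over all (positive, negative) pairs: penalise negatives
    # scored within `margin` of (or above) positives.
    diff = scores_pos.unsqueeze(1) - scores_neg.unsqueeze(0)
    return torch.clamp(margin - diff, min=0).pow(2).mean()

def adversarial_examples(model, x, is_pos, eps=8 / 255):
    # One FGSM-style step along the gradient of the AUC surrogate.
    x = x.detach().clone().requires_grad_(True)
    scores = model(x).squeeze(-1)
    auc_surrogate(scores[is_pos], scores[~is_pos]).backward()
    return (x + eps * x.grad.sign()).detach()
```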
arXiv Detail & Related papers (2022-06-24T09:13:39Z) - Relieving Long-tailed Instance Segmentation via Pairwise Class Balance [85.53585498649252]
Long-tailed instance segmentation is a challenging task due to the extreme imbalance of training samples among classes.
This causes severe bias toward the head classes (which hold the majority of samples) at the expense of the tail ones.
We propose a novel Pairwise Class Balance (PCB) method, built upon a confusion matrix which is updated during training to accumulate the ongoing prediction preferences.
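The confusion-matrix bookkeeping can be sketched on its own; how PCB turns the matrix into a pairwise correction is more involved than shown here, so the sketch below covers only the accumulation step, with an EMA momentum chosen arbitrarily.
```python
import torch

class RunningConfusion:
    # EMA confusion matrix of the model's ongoing predictions: row c holds the
    # average predicted distribution for samples whose true class is c.
    def __init__(self, num_classes, momentum=0.99):
        self.mat = torch.full((num_classes, num_classes), 1.0 / num_classes)
        self.momentum = momentum

    def update(self, probs, labels):
        # probs: (N, C) softmax outputs; labels: (N,) ground-truth class ids.
        for c in labels.unique():
            mean_pred = probs[labels == c].mean(dim=0)
            self.mat[c] = self.momentum * self.mat[c] + (1 - self.momentum) * mean_pred
```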
arXiv Detail & Related papers (2022-01-08T07:48:36Z) - Long-tailed Recognition by Routing Diverse Distribution-Aware Experts [64.71102030006422]
We propose a new long-tailed classifier called RoutIng Diverse Experts (RIDE).
It reduces model variance with multiple experts, reduces model bias with a distribution-aware diversity loss, and reduces computational cost with a dynamic expert routing module.
RIDE outperforms the state-of-the-art by 5% to 7% on CIFAR100-LT, ImageNet-LT and iNaturalist 2018 benchmarks.
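A minimal sketch of the multi-expert part follows: several lightweight heads share one backbone, and their logits are averaged. The diversity loss and the dynamic router that prunes experts at test time are omitted, and the linear head design here is an assumption.
```python
import torch
import torch.nn as nn

class MultiExpertHead(nn.Module):
    # Several expert classifiers over a shared feature extractor; the
    # prediction is the average of the expert logits.
    def __init__(self, feat_dim, num_classes, num_experts=3):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Linear(feat_dim, num_classes) for _ in range(num_experts)
        )

    def forward(self, feats):
        logits = torch.stack([e(feats) for e in self.experts])  # (E, N, C)
        # Per-expert logits are returned too: during training they feed the
        # distribution-aware diversity loss (omitted here).
        return logits.mean(dim=0), logits
```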
arXiv Detail & Related papers (2020-10-05T06:53:44Z)