Contrast to Divide: Self-Supervised Pre-Training for Learning with Noisy Labels
- URL: http://arxiv.org/abs/2103.13646v1
- Date: Thu, 25 Mar 2021 07:40:51 GMT
- Title: Contrast to Divide: Self-Supervised Pre-Training for Learning with Noisy Labels
- Authors: Evgenii Zheltonozhskii, Chaim Baskin, Avi Mendelson, Alex M. Bronstein, Or Litany
- Abstract summary: "Contrast to Divide" (C2D) is a framework that pre-trains the feature extractor in a self-supervised fashion.
Using self-supervised pre-training boosts the performance of existing LNL approaches by drastically reducing the warm-up stage's susceptibility to noise level.
In real-life noise settings, C2D trained on mini-WebVision outperforms previous works on both the WebVision and ImageNet validation sets by 3% top-1 accuracy.
- Score: 12.181548895121685
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The success of learning with noisy labels (LNL) methods relies heavily on the
success of a warm-up stage where standard supervised training is performed
using the full (noisy) training set. In this paper, we identify a "warm-up
obstacle": the inability of standard warm-up stages to train high quality
feature extractors and avert memorization of noisy labels. We propose "Contrast
to Divide" (C2D), a simple framework that solves this problem by pre-training
the feature extractor in a self-supervised fashion. Using self-supervised
pre-training boosts the performance of existing LNL approaches by drastically
reducing the warm-up stage's susceptibility to noise level, shortening its
duration, and increasing extracted feature quality. C2D works out of the box
with existing methods and demonstrates markedly improved performance,
especially in the high noise regime, where we get a boost of more than 27% for
CIFAR-100 with 90% noise over the previous state of the art. In real-life noise
settings, C2D trained on mini-WebVision outperforms previous works on both the
WebVision and ImageNet validation sets by 3% top-1 accuracy. We perform an
in-depth analysis of the framework, including investigating the performance of
different pre-training approaches and estimating the effective upper bound of
the LNL performance with semi-supervised learning. Code for reproducing our
experiments is available at https://github.com/ContrastToDivide/C2D
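The abstract describes a two-stage recipe: self-supervised pre-training of the feature extractor, followed by a short supervised warm-up on the full noisy training set before an existing LNL method takes over. The sketch below illustrates that pipeline. It is a minimal, hedged example: the SimCLR-style contrastive loss, the ResNet-18 backbone, the data-loader format (two augmented views per image), and all hyperparameters are illustrative assumptions rather than the paper's exact configuration; see the linked repository for the authors' implementation.

```python
# Hedged sketch of the C2D idea: (1) self-supervised pre-training of the
# feature extractor, then (2) a short supervised warm-up on the full noisy set
# before handing the model to an existing LNL method. The SimCLR-style NT-Xent
# loss, ResNet-18 backbone, and hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F
import torchvision


def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style contrastive loss over two augmented views (assumption)."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)       # (2B, D)
    sim = z @ z.t() / temperature                             # pairwise similarities
    sim.fill_diagonal_(float("-inf"))                         # mask self-similarity
    b = z1.size(0)
    # positives: view i pairs with view i + B, and vice versa
    targets = torch.cat([torch.arange(b) + b, torch.arange(b)]).to(z.device)
    return F.cross_entropy(sim, targets)


def pretrain_self_supervised(loader, epochs=10, device="cpu"):
    """Stage 1: pre-train the encoder on the images alone; noisy labels are ignored."""
    encoder = torchvision.models.resnet18(num_classes=128)    # final fc acts as projection head
    encoder.to(device).train()
    opt = torch.optim.SGD(encoder.parameters(), lr=0.03, momentum=0.9)
    for _ in range(epochs):
        for (view1, view2), _noisy_labels in loader:          # two augmentations per image
            loss = nt_xent_loss(encoder(view1.to(device)), encoder(view2.to(device)))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return encoder


def warm_up_with_noisy_labels(encoder, loader, num_classes, epochs=1, device="cpu"):
    """Stage 2: short supervised warm-up on the full noisy set, starting from the
    self-supervised features, before the LNL method proper (e.g., sample division)."""
    encoder.fc = torch.nn.Linear(encoder.fc.in_features, num_classes)  # replace projection head
    encoder.to(device).train()
    opt = torch.optim.SGD(encoder.parameters(), lr=0.02, momentum=0.9)
    for _ in range(epochs):
        for images, noisy_labels in loader:
            loss = F.cross_entropy(encoder(images.to(device)), noisy_labels.to(device))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return encoder
```

The ordering is the point of the sketch: the encoder never sees the noisy labels in stage 1, so the warm-up in stage 2 starts from features learned without memorizing label noise, which is what the abstract credits for the reduced susceptibility to noise level and the shorter warm-up duration.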
Related papers
- Noise-Robust Target-Speaker Voice Activity Detection Through Self-Supervised Pretraining [21.26555178371168]
Target-Speaker Voice Activity Detection (TS-VAD) is the task of detecting the presence of speech from a known target-speaker in an audio frame.
Deep neural network-based models have shown good performance in this task.
We propose a causal, Self-Supervised Learning (SSL) pretraining framework to enhance TS-VAD performance in noisy conditions.
arXiv Detail & Related papers (2025-01-06T18:00:14Z)
- Adaptive Retention & Correction: Test-Time Training for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.
We name our approach Adaptive Retention & Correction (ARC).
ARC achieves an average performance increase of 2.7% and 2.6% on the CIFAR-100 and ImageNet-R datasets, respectively.
arXiv Detail & Related papers (2024-05-23T08:43:09Z)
- Refining Pre-Trained Motion Models [56.18044168821188]
We take on the challenge of improving state-of-the-art supervised models with self-supervised training.
We focus on obtaining a "clean" training signal from real-world unlabelled video.
We show that our method yields reliable gains over fully-supervised methods in real videos.
arXiv Detail & Related papers (2024-01-01T18:59:33Z)
- Audio-Visual Efficient Conformer for Robust Speech Recognition [91.3755431537592]
We propose to improve the noise robustness of the recently proposed Efficient Conformer Connectionist Temporal Classification architecture by processing both audio and visual modalities.
Our experiments show that using audio and visual modalities makes it possible to better recognize speech in the presence of environmental noise and significantly accelerates training, reaching a lower WER with 4 times fewer training steps.
arXiv Detail & Related papers (2023-01-04T05:36:56Z)
- CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention [31.84299688413136]
Contrastive Language-Image Pre-training has been shown to learn visual representations with great transferability.
Existing works propose additional learnable modules on top of CLIP and fine-tune them on few-shot training sets.
We introduce a free-lunch enhancement method, CALIP, to boost CLIP's zero-shot performance via a parameter-free Attention module.
arXiv Detail & Related papers (2022-09-28T15:22:11Z)
- Simultaneous Double Q-learning with Conservative Advantage Learning for Actor-Critic Methods [133.85604983925282]
We propose Simultaneous Double Q-learning with Conservative Advantage Learning (SDQ-CAL).
Our algorithm realizes less biased value estimation and achieves state-of-the-art performance in a range of continuous control benchmark tasks.
arXiv Detail & Related papers (2022-05-08T09:17:16Z)
- CNLL: A Semi-supervised Approach For Continual Noisy Label Learning [12.341250124228859]
We propose a simple purification technique that effectively cleanses the online data stream and is both cost-effective and more accurate.
After purification, we perform fine-tuning in a semi-supervised fashion that ensures the participation of all available samples.
We achieve a 24.8% performance gain for CIFAR10 with 20% noise over previous SOTA methods.
arXiv Detail & Related papers (2022-04-21T05:01:10Z)
- Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection [54.92703325989853]
We propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues.
No human annotations are involved in our framework during the whole training process.
Our framework achieves significant performance improvements compared with existing USOD methods.
arXiv Detail & Related papers (2021-12-07T11:54:06Z)
- Investigating Tradeoffs in Real-World Video Super-Resolution [90.81396836308085]
Real-world video super-resolution (VSR) models are often trained with diverse degradations to improve generalizability.
To alleviate the first tradeoff, we propose a degradation scheme that reduces up to 40% of training time without sacrificing performance.
To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences.
arXiv Detail & Related papers (2021-11-24T18:58:21Z)