MixMOOD: A systematic approach to class distribution mismatch in
semi-supervised learning using deep dataset dissimilarity measures
- URL: http://arxiv.org/abs/2006.07767v1
- Date: Sun, 14 Jun 2020 01:52:29 GMT
- Title: MixMOOD: A systematic approach to class distribution mismatch in
semi-supervised learning using deep dataset dissimilarity measures
- Authors: Saul Calderon-Ramirez, Luis Oala, Jordina Torrents-Barrena, Shengxiang
Yang, Armaghan Moemeni, Wojciech Samek, Miguel A. Molina-Cabello
- Abstract summary: MixMOOD is a systematic approach to mitigate the effect of class distribution mismatch in semi-supervised deep learning (SSDL) with MixMatch.
In the first part, we analyze the sensitivity of MixMatch accuracy under 90 different distribution mismatch scenarios.
In the second part, we propose an efficient and effective method, called deep dataset dissimilarity measures (DeDiMs) to compare labelled and unlabelled datasets.
- Score: 13.823764245165792
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we propose MixMOOD - a systematic approach to mitigate
the effect of class distribution mismatch in semi-supervised deep learning (SSDL) with
MixMatch. This work is divided into two components: (i) an extensive out of
distribution (OOD) ablation test bed for SSDL and (ii) a quantitative
unlabelled dataset selection heuristic referred to as MixMOOD. In the first
part, we analyze the sensitivity of MixMatch accuracy under 90 different
distribution mismatch scenarios across three multi-class classification tasks.
These are designed to systematically understand how OOD unlabelled data affects
MixMatch performance. In the second part, we propose an efficient and effective
method, called deep dataset dissimilarity measures (DeDiMs), to compare
labelled and unlabelled datasets. The proposed DeDiMs are quick to evaluate and
model-agnostic. They use the feature space of a generic Wide-ResNet and can be
applied prior to learning. Our test results reveal that supposed semantic
similarity between labelled and unlabelled data is not a good heuristic for
unlabelled data selection. In contrast, the strong correlation between MixMatch
accuracy and the proposed DeDiMs allows us to quantitatively rank different
unlabelled datasets ante hoc according to expected MixMatch accuracy. This is
what we call MixMOOD. Furthermore, we argue that the MixMOOD approach can help
standardize the evaluation of different semi-supervised learning techniques
under real world scenarios involving out of distribution data.
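
To make the DeDiM idea concrete, below is a minimal sketch that scores a candidate unlabelled dataset against the labelled set in the feature space of a generic backbone. The torchvision wide_resnet50_2 backbone and the Euclidean distance between mean feature vectors are illustrative assumptions, not necessarily the Wide-ResNet variant or the specific dissimilarity measures evaluated in the paper.

```python
# Sketch of a deep dataset dissimilarity measure (DeDiM): embed both
# datasets with a generic, fixed backbone and compare the resulting
# feature distributions. The mean-feature Euclidean distance used here
# is an illustrative choice, not necessarily one of the paper's DeDiMs.
import torch
import torchvision

def extract_features(backbone, loader, device="cpu"):
    """Embed every batch in `loader` with the backbone's penultimate layer."""
    backbone.eval().to(device)
    feats = []
    with torch.no_grad():
        for x, *_ in loader:
            feats.append(backbone(x.to(device)).cpu())  # (B, D) feature vectors
    return torch.cat(feats)

def dedim_score(labelled_loader, unlabelled_loader):
    """Score one candidate unlabelled dataset against the labelled set."""
    # Generic backbone with the classifier head removed (stand-in for the
    # paper's "generic Wide-ResNet").
    backbone = torchvision.models.wide_resnet50_2(weights="IMAGENET1K_V1")
    backbone.fc = torch.nn.Identity()  # expose the feature space
    f_lab = extract_features(backbone, labelled_loader)
    f_unl = extract_features(backbone, unlabelled_loader)
    return torch.dist(f_lab.mean(dim=0), f_unl.mean(dim=0)).item()
```

Under MixMOOD, candidate unlabelled datasets would be ranked by such a score ante hoc, with lower dissimilarity predicting higher MixMatch accuracy.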
Related papers
- Dual-Decoupling Learning and Metric-Adaptive Thresholding for Semi-Supervised Multi-Label Learning [81.83013974171364]
Semi-supervised multi-label learning (SSMLL) is a powerful framework for leveraging unlabeled data to reduce the high cost of collecting precise multi-label annotations.
Unlike in standard semi-supervised learning, one cannot simply select the most probable label as the pseudo-label in SSMLL, because an instance can contain multiple semantics.
We propose a dual-perspective method to generate high-quality pseudo-labels.
arXiv Detail & Related papers (2024-07-26T09:33:53Z)
- SUMix: Mixup with Semantic and Uncertain Information [41.99721365685618]
Mixup data augmentation approaches have been applied to various deep learning tasks.
We propose a novel approach named SUMix to learn the mixing ratio as well as the uncertainty for the mixed samples during the training process.
arXiv Detail & Related papers (2024-07-10T16:25:26Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- JointMatch: A Unified Approach for Diverse and Collaborative Pseudo-Labeling to Semi-Supervised Text Classification [65.268245109828]
Semi-supervised text classification (SSTC) has gained increasing attention due to its ability to leverage unlabeled data.
Existing approaches based on pseudo-labeling suffer from the issues of pseudo-label bias and error accumulation.
We propose JointMatch, a holistic approach for SSTC that addresses these challenges by unifying ideas from recent semi-supervised learning research.
arXiv Detail & Related papers (2023-10-23T05:43:35Z)
- Manifold DivideMix: A Semi-Supervised Contrastive Learning Framework for Severe Label Noise [4.90148689564172]
Real-world datasets contain noisy label samples that have no semantic relevance to any class in the dataset.
Most state-of-the-art methods leverage in-distribution (ID) labeled noisy samples as unlabeled data for semi-supervised learning.
We propose incorporating the information from all the training data by leveraging the benefits of self-supervised training.
arXiv Detail & Related papers (2023-08-13T23:33:33Z)
- Exploring the Boundaries of Semi-Supervised Facial Expression Recognition using In-Distribution, Out-of-Distribution, and Unconstrained Data [23.4909421082857]
We present a study of 11 of the most recent semi-supervised methods in the context of facial expression recognition (FER).
Our investigation covers semi-supervised learning from in-distribution, out-of-distribution, unconstrained, and very small unlabelled data.
With an equal number of labelled samples, semi-supervised learning delivers a considerable improvement over supervised learning.
arXiv Detail & Related papers (2023-06-02T01:40:08Z)
- Harnessing Hard Mixed Samples with Decoupled Regularizer [69.98746081734441]
Mixup is an efficient data augmentation approach that improves the generalization of neural networks by smoothing the decision boundary with mixed data (a minimal mixup sketch follows this list).
In this paper, we propose an efficient mixup objective function with a decoupled regularizer, named Decoupled Mixup (DM).
DM can adaptively utilize hard mixed samples to mine discriminative features without losing the original smoothness of mixup.
arXiv Detail & Related papers (2022-03-21T07:12:18Z)
- DivideMix: Learning with Noisy Labels as Semi-supervised Learning [111.03364864022261]
We propose DivideMix, a framework for learning with noisy labels.
Experiments on multiple benchmark datasets demonstrate substantial improvements over state-of-the-art methods.
arXiv Detail & Related papers (2020-02-18T06:20:06Z)
- Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
arXiv Detail & Related papers (2020-02-11T21:08:06Z)
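
As referenced in the Decoupled Mixup entry above, the sketch below shows standard mixup, the base operation that both MixMatch and DM build on. It is not the DM objective itself; the decoupled regularizer is not detailed in the summary.

```python
# Standard mixup: convex-combine pairs of inputs and their one-hot
# labels with a ratio drawn from a Beta(alpha, alpha) distribution,
# which smooths the decision boundary between classes.
import torch

def mixup(x, y, alpha=1.0):
    """Mix a batch `x` with a shuffled copy of itself; `y` holds one-hot labels."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix
```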
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.