Identifying noisy labels with a transductive semi-supervised
leave-one-out filter
- URL: http://arxiv.org/abs/2009.11811v1
- Date: Thu, 24 Sep 2020 16:50:06 GMT
- Title: Identifying noisy labels with a transductive semi-supervised
leave-one-out filter
- Authors: Bruno Klaus de Aquino Afonso, Lilian Berton
- Abstract summary: We introduce the LGC_LVOF, a leave-one-out filtering approach based on the Local and Global Consistency (LGC) algorithm.
Our approach is best suited to datasets with a large amount of unlabeled data but not many labels.
- Score: 2.4366811507669124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Obtaining data with meaningful labels is often costly and error-prone. In
this situation, semi-supervised learning (SSL) approaches are interesting, as
they leverage assumptions about the unlabeled data to make up for the limited
amount of labels. However, in real-world situations, we cannot assume that the
labeling process is infallible, and the accuracy of many SSL classifiers
decreases significantly in the presence of label noise. In this work, we
introduce the LGC_LVOF, a leave-one-out filtering approach based on the Local
and Global Consistency (LGC) algorithm. Our method aims to detect and remove
wrong labels, and thus can be used as a preprocessing step to any SSL
classifier. Given the propagation matrix, detecting noisy labels takes $O(cl)$
per step, with $c$ the number of classes and $l$ the number of labels. Moreover,
one does not need to compute the whole propagation matrix, but only an $l$ by
$l$ submatrix corresponding to interactions between labeled instances. As a
result, our approach is best suited to datasets with a large amount of
unlabeled data but not many labels. Results are provided for a number of
datasets, including MNIST and ISOLET. LGC_LVOF appears to be equally or more
precise than the adapted gradient-based filter. We show that the best-case
accuracy of the embedding of LGC_LVOF into LGC yields performance comparable to
the best-case of $\ell_1$-based classifiers designed to be robust to label
noise. We provide a heuristic to choose the number of removed instances.
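The filtering idea in the abstract can be sketched in a few lines. This is a hedged toy illustration, not the authors' released code: the RBF graph over labeled points, the value of alpha, and the subtraction of each point's diagonal self-vote as a leave-one-out approximation are all illustrative assumptions. It does reflect the abstract's key point that only the $l$ by $l$ propagation block over labeled instances is needed.

```python
import numpy as np

def lgc_loo_filter(X_labeled, Y, alpha=0.9, sigma=1.0, n_remove=1):
    # Hedged sketch of an LGC-style leave-one-out noisy-label filter.
    # X_labeled: (l, d) labeled points; Y: (l, c) one-hot labels.
    l = X_labeled.shape[0]
    # RBF affinity over labeled points (toy stand-in for the full graph).
    d2 = ((X_labeled[:, None, :] - X_labeled[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    D = W.sum(axis=1)
    S = W / np.sqrt(np.outer(D, D))            # symmetric normalization
    P = np.linalg.inv(np.eye(l) - alpha * S)   # l x l propagation block only
    F = P @ Y                                  # propagated label scores
    F_loo = F - np.diag(P)[:, None] * Y        # drop each point's own vote
    scores = (F_loo * Y).sum(axis=1)           # support for assigned label
    return np.argsort(scores)[:n_remove]       # lowest support = suspicious
```

A point whose assigned label receives little propagated support from the other labeled points is flagged as likely noisy and can be removed before running any downstream SSL classifier.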
Related papers
- Inaccurate Label Distribution Learning with Dependency Noise [52.08553913094809]
We introduce the Dependent Noise-based Inaccurate Label Distribution Learning (DN-ILDL) framework to tackle the challenges posed by noise in label distribution learning.
We show that DN-ILDL effectively addresses the ILDL problem and outperforms existing LDL methods.
arXiv Detail & Related papers (2024-05-26T07:58:07Z)
- Prompt-based Pseudo-labeling Strategy for Sample-Efficient Semi-Supervised Extractive Summarization [12.582774521907227]
Semi-supervised learning (SSL) is a widely used technique in scenarios where labeled data is scarce and unlabeled data is abundant.
Standard SSL methods follow a teacher-student paradigm to first train a classification model and then use the classifier's confidence values to select pseudo-labels.
We propose a prompt-based pseudo-labeling strategy with LLMs that picks unlabeled examples with more accurate pseudo-labels.
arXiv Detail & Related papers (2023-11-16T04:29:41Z)
- FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness for Semi-Supervised Learning [73.13448439554497]
Semi-Supervised Learning (SSL) has been an effective way to leverage abundant unlabeled data with extremely scarce labeled data.
Most SSL methods are commonly based on instance-wise consistency between different data transformations.
We propose FlatMatch which minimizes a cross-sharpness measure to ensure consistent learning performance between the two datasets.
arXiv Detail & Related papers (2023-10-25T06:57:59Z)
- Positive Label Is All You Need for Multi-Label Classification [3.354528906571718]
Multi-label classification (MLC) faces challenges from label noise in training data.
Our paper addresses label noise in MLC by introducing a positive and unlabeled multi-label classification (PU-MLC) method.
PU-MLC employs positive-unlabeled learning, training the model with only positive labels and unlabeled data.
arXiv Detail & Related papers (2023-06-28T08:44:00Z)
- Complementary to Multiple Labels: A Correlation-Aware Correction Approach [65.59584909436259]
We show theoretically how the estimated transition matrix in multi-class CLL could be distorted in multi-labeled cases.
We propose a two-step method to estimate the transition matrix from candidate labels.
arXiv Detail & Related papers (2023-02-25T04:48:48Z)
- Pseudo-Labeling Based Practical Semi-Supervised Meta-Training for Few-Shot Learning [93.63638405586354]
We propose a simple and effective meta-training framework, called pseudo-labeling based meta-learning (PLML).
Firstly, we train a classifier via common semi-supervised learning (SSL) and use it to obtain the pseudo-labels of unlabeled data.
We build few-shot tasks from labeled and pseudo-labeled data and design a novel finetuning method with feature smoothing and noise suppression.
arXiv Detail & Related papers (2022-07-14T10:53:53Z)
- Multi-Label Gold Asymmetric Loss Correction with Single-Label Regulators [6.129273021888717]
We propose a novel Gold Asymmetric Loss Correction with Single-Label Regulators (GALC-SLR) that operates robust against noisy labels.
GALC-SLR estimates the noise confusion matrix using single-label samples, then constructs an asymmetric loss correction via the estimated confusion matrix to avoid overfitting to the noisy labels.
Empirical results show that our method outperforms the state-of-the-art original asymmetric loss multi-label classifier under all corruption levels.
arXiv Detail & Related papers (2021-08-04T12:57:29Z)
- In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning [53.1047775185362]
Pseudo-labeling (PL) is a general SSL approach that imposes few constraints on the model or data, but it performs relatively poorly in its original formulation.
We argue that PL underperforms due to the erroneous high confidence predictions from poorly calibrated models.
We propose an uncertainty-aware pseudo-label selection (UPS) framework which improves pseudo labeling accuracy by drastically reducing the amount of noise encountered in the training process.
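The selection idea behind UPS can be illustrated with a minimal sketch. This is not the paper's implementation: the MC-dropout-style uncertainty estimate and both threshold values are illustrative assumptions; only examples that are both confident and low-uncertainty receive pseudo-labels.

```python
import numpy as np

def select_pseudo_labels(mc_probs, conf_thresh=0.9, unc_thresh=0.05):
    # mc_probs: (n_passes, n_samples, n_classes) stochastic forward passes
    # (e.g. MC dropout). Keep a sample only if its mean confidence is high
    # AND the predicted class's probability is stable across passes.
    mean_p = mc_probs.mean(axis=0)
    conf = mean_p.max(axis=1)
    labels = mean_p.argmax(axis=1)
    unc = mc_probs.std(axis=0)[np.arange(len(labels)), labels]
    keep = (conf >= conf_thresh) & (unc <= unc_thresh)
    return np.nonzero(keep)[0], labels[keep]
```

Filtering on both confidence and uncertainty is what reduces the amount of noisy pseudo-labels entering training, compared with thresholding confidence alone.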
arXiv Detail & Related papers (2021-01-15T23:29:57Z)
- Extended T: Learning with Mixed Closed-set and Open-set Noisy Labels [86.5943044285146]
The label noise transition matrix $T$ reflects the probabilities that true labels flip into noisy ones.
In this paper, we focus on learning under the mixed closed-set and open-set label noise.
Our method can better model the mixed label noise, as evidenced by its more robust performance compared with prior state-of-the-art label-noise learning methods.
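As a minimal illustration of how a transition matrix is used in practice, the following is generic forward loss correction, not the paper's Extended T method, and it assumes a known closed-set $T$:

```python
import numpy as np

def forward_corrected_nll(clean_probs, noisy_label, T):
    # T[i, j] = P(observed label = j | true label = i). Multiplying the
    # model's clean-label posterior by T yields the predicted distribution
    # over *noisy* labels, so the loss matches what was actually observed.
    noisy_probs = clean_probs @ T
    return -np.log(noisy_probs[noisy_label])
```

Minimizing this corrected loss over noisy labels recovers the clean posterior at the optimum, which is why accurately estimating $T$ (the hard part, and the focus of such papers) matters.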
arXiv Detail & Related papers (2020-12-02T02:42:45Z)
- Analysis of label noise in graph-based semi-supervised learning [2.4366811507669124]
In machine learning, one must acquire labels to help supervise a model that will be able to generalize to unseen data.
It is often the case that most of our data is unlabeled.
Semi-supervised learning (SSL) alleviates that by making strong assumptions about the relation between the labels and the input data distribution.
arXiv Detail & Related papers (2020-09-27T22:13:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.