Analysis of label noise in graph-based semi-supervised learning
- URL: http://arxiv.org/abs/2009.12966v1
- Date: Sun, 27 Sep 2020 22:13:20 GMT
- Title: Analysis of label noise in graph-based semi-supervised learning
- Authors: Bruno Klaus de Aquino Afonso, Lilian Berton
- Abstract summary: In machine learning, one must acquire labels to help supervise a model that will be able to generalize to unseen data.
It is often the case that most of our data is unlabeled.
Semi-supervised learning (SSL) alleviates that by making strong assumptions about the relation between the labels and the input data distribution.
- Score: 2.4366811507669124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In machine learning, one must acquire labels to help supervise a model that
will be able to generalize to unseen data. However, the labeling process can be
tedious, long, costly, and error-prone. It is often the case that most of our
data is unlabeled. Semi-supervised learning (SSL) alleviates that by making
strong assumptions about the relation between the labels and the input data
distribution. This paradigm has been successful in practice, but most SSL
algorithms end up fully trusting the few available labels. In real life, both
humans and automated systems are prone to mistakes; it is essential that our
algorithms are able to work with labels that are both few and unreliable.
Our work aims to perform an extensive empirical evaluation of existing
graph-based semi-supervised algorithms, such as Gaussian Fields and Harmonic
Functions, Local and Global Consistency, Laplacian Eigenmaps, and Graph
Transduction Through Alternating Minimization. To do that, we compare the
accuracy of classifiers while varying the amount of labeled data and label
noise for many different samples. Our results show that, if the dataset is
consistent with SSL assumptions, we are able to detect the noisiest instances,
although this gets harder when the number of available labels decreases. Also,
the Laplacian Eigenmaps algorithm performed better than label propagation when
the data came from high-dimensional clusters.
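As a rough illustration of the kind of graph-based propagation evaluated in the paper, the sketch below runs Local and Global Consistency (LGC) style label spreading on a k-NN graph and then ranks labeled points by how strongly the propagated class distribution disagrees with their given labels, which is the intuition behind flagging the noisiest instances. The toy dataset, the k-NN graph construction, and the values of k and alpha are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch, not the paper's code: LGC-style label spreading on a k-NN
# graph plus a simple disagreement score for spotting possibly noisy labels.
# Dataset, k, and alpha below are assumptions, not the paper's setup.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import kneighbors_graph

def lgc_propagate(X, y, labeled_mask, k=10, alpha=0.99):
    """Closed-form LGC: F = (1 - alpha) * (I - alpha * S)^-1 * Y."""
    n = X.shape[0]
    W = kneighbors_graph(X, k, mode="connectivity", include_self=False).toarray()
    W = np.maximum(W, W.T)                       # symmetrize the k-NN graph
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    S = (W * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]   # D^-1/2 W D^-1/2
    classes = np.unique(y[labeled_mask])
    Y = np.zeros((n, classes.size))
    for j, c in enumerate(classes):
        Y[labeled_mask & (y == c), j] = 1.0      # one-hot seeds; unlabeled rows stay zero
    F = (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)
    return F, classes

# Toy data: 30 labeled points out of 300, with 6 labels deliberately flipped.
X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)
rng = np.random.default_rng(0)
labeled_mask = np.zeros(len(y_true), dtype=bool)
labeled_mask[rng.choice(len(y_true), size=30, replace=False)] = True
y_obs = y_true.copy()
noisy = rng.choice(np.where(labeled_mask)[0], size=6, replace=False)
y_obs[noisy] = (y_obs[noisy] + 1) % 3            # inject label noise

F, classes = lgc_propagate(X, y_obs, labeled_mask)
pred = classes[F.argmax(axis=1)]
print("accuracy on unlabeled points:",
      (pred[~labeled_mask] == y_true[~labeled_mask]).mean())

# Labeled points whose given class receives little propagated mass are the
# natural noise candidates.
probs = F[labeled_mask] / np.maximum(F[labeled_mask].sum(axis=1, keepdims=True), 1e-12)
given_cols = np.searchsorted(classes, y_obs[labeled_mask])
noise_score = 1.0 - probs[np.arange(given_cols.size), given_cols]
suspects = np.where(labeled_mask)[0][np.argsort(-noise_score)][:6]
print("most suspicious labeled indices:", suspects, "injected noise at:", np.sort(noisy))
```

With fewer labeled points the propagated distributions become less informative, which matches the paper's observation that detecting the noisiest instances gets harder as the number of available labels decreases.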
Related papers
- Continuous Contrastive Learning for Long-Tailed Semi-Supervised Recognition [50.61991746981703]
Current state-of-the-art LTSSL approaches rely on high-quality pseudo-labels for large-scale unlabeled data.
This paper introduces a novel probabilistic framework that unifies various recent proposals in long-tail learning.
We introduce a continuous contrastive learning method, CCL, extending our framework to unlabeled data using reliable and smoothed pseudo-labels.
arXiv Detail & Related papers (2024-10-08T15:06:10Z)
- Learning with Confidence: Training Better Classifiers from Soft Labels [0.0]
In supervised machine learning, models are typically trained using data with hard labels, i.e., definite assignments of class membership.
We investigate whether incorporating label uncertainty, represented as discrete probability distributions over the class labels, improves the predictive performance of classification models.
arXiv Detail & Related papers (2024-09-24T13:12:29Z)
- FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness for Semi-Supervised Learning [73.13448439554497]
Semi-Supervised Learning (SSL) has been an effective way to leverage abundant unlabeled data with extremely scarce labeled data.
Most SSL methods are commonly based on instance-wise consistency between different data transformations.
We propose FlatMatch which minimizes a cross-sharpness measure to ensure consistent learning performance between the two datasets.
arXiv Detail & Related papers (2023-10-25T06:57:59Z)
- All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation [67.30502812804271]
Pseudo-labels are widely employed in weakly supervised 3D segmentation tasks where only sparse ground-truth labels are available for learning.
We propose a novel learning strategy to regularize the generated pseudo-labels and effectively narrow the gaps between pseudo-labels and model predictions.
arXiv Detail & Related papers (2023-05-25T08:19:31Z)
- Learned Label Aggregation for Weak Supervision [8.819582879892762]
We propose a data programming approach that aggregates weak supervision signals to generate labeled data easily.
The quality of the generated labels depends on a label aggregation model that aggregates all noisy labels from all labeling functions (LFs) to infer the ground-truth labels.
We show the model can be trained using synthetically generated data and design an effective architecture for the model.
arXiv Detail & Related papers (2022-07-27T14:36:35Z)
- How many labelers do you have? A closer look at gold-standard labels [10.637125300701795]
We show how access to non-aggregated label information can make training well-calibrated models more feasible than it is with gold-standard labels.
We make several predictions for real-world datasets, including when non-aggregate labels should improve learning performance.
arXiv Detail & Related papers (2022-06-24T02:33:50Z)
- Trustable Co-label Learning from Multiple Noisy Annotators [68.59187658490804]
Supervised deep learning depends on massive amounts of accurately annotated examples.
A typical alternative is learning from multiple noisy annotators.
This paper proposes a data-efficient approach, called Trustable Co-label Learning (TCL).
arXiv Detail & Related papers (2022-03-08T16:57:00Z)
- Instance-dependent Label-noise Learning under a Structural Causal Model [92.76400590283448]
Label noise degrades the performance of deep learning algorithms.
By leveraging a structural causal model, we propose a novel generative approach for instance-dependent label-noise learning.
arXiv Detail & Related papers (2021-09-07T10:42:54Z)
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
- Identifying noisy labels with a transductive semi-supervised leave-one-out filter [2.4366811507669124]
We introduce the LGC_LVOF, a leave-one-out filtering approach based on the Local and Global Consistency (LGC) algorithm.
Our approach is best suited to datasets with a large amount of unlabeled data but not many labels (an illustrative sketch of this leave-one-out idea appears after this list).
arXiv Detail & Related papers (2020-09-24T16:50:06Z)
- Label Noise Types and Their Effects on Deep Learning [0.0]
In this work, we provide a detailed analysis of the effects of different kinds of label noise on learning.
We propose a generic framework to generate feature-dependent label noise, which we show to be the most challenging case for learning.
To make it easier for other researchers to test their algorithms with noisy labels, we share corrupted labels for the most commonly used benchmark datasets.
arXiv Detail & Related papers (2020-03-23T18:03:39Z)
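The LGC_LVOF entry above describes a leave-one-out filter built on the Local and Global Consistency algorithm. The sketch below is only a hedged reading of that general idea, not the paper's actual algorithm: each labeled point is held out in turn, labels are spread from the remaining seeds using scikit-learn's LabelSpreading as a stand-in for LGC, and the point is flagged when the held-out prediction disagrees with its given label. The toy data, n_neighbors, and alpha values are assumptions.

```python
# Illustrative sketch only: a leave-one-out style filter for noisy labels,
# in the spirit of (but not identical to) the LGC_LVOF entry above.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.semi_supervised import LabelSpreading

def leave_one_out_filter(X, y_obs, labeled_idx, k=10, alpha=0.9):
    """Flag labeled points whose held-out propagated label disagrees with the given one."""
    suspects = []
    for i in labeled_idx:
        y_sl = np.full(len(y_obs), -1)            # -1 marks unlabeled points
        y_sl[labeled_idx] = y_obs[labeled_idx]
        y_sl[i] = -1                              # hold out the point under test
        model = LabelSpreading(kernel="knn", n_neighbors=k, alpha=alpha)
        model.fit(X, y_sl)
        if model.transduction_[i] != y_obs[i]:    # disagreement -> noise candidate
            suspects.append(int(i))
    return suspects

# Toy data: 30 labels out of 300 points, 6 of them deliberately flipped.
X, y_true = make_blobs(n_samples=300, centers=3, random_state=0)
rng = np.random.default_rng(0)
labeled_idx = rng.choice(len(y_true), size=30, replace=False)
y_obs = y_true.copy()
flipped = rng.choice(labeled_idx, size=6, replace=False)
y_obs[flipped] = (y_obs[flipped] + 1) % 3

print("flagged as noisy:", sorted(leave_one_out_filter(X, y_obs, labeled_idx)))
print("actually flipped:", sorted(int(i) for i in flipped))
```

Refitting once per held-out label is affordable precisely in the regime the entry describes: many unlabeled points but few labels.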