Analysis of label noise in graph-based semi-supervised learning
- URL: http://arxiv.org/abs/2009.12966v1
- Date: Sun, 27 Sep 2020 22:13:20 GMT
- Title: Analysis of label noise in graph-based semi-supervised learning
- Authors: Bruno Klaus de Aquino Afonso, Lilian Berton
- Abstract summary: In machine learning, one must acquire labels to help supervise a model that will be able to generalize to unseen data.
It is often the case that most of our data is unlabeled.
Semi-supervised learning (SSL) alleviates that by making strong assumptions about the relation between the labels and the input data distribution.
- Score: 2.4366811507669124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In machine learning, one must acquire labels to help supervise a model that
will be able to generalize to unseen data. However, the labeling process can be
tedious, long, costly, and error-prone. It is often the case that most of our
data is unlabeled. Semi-supervised learning (SSL) alleviates that by making
strong assumptions about the relation between the labels and the input data
distribution. This paradigm has been successful in practice, but most SSL
algorithms end up fully trusting the few available labels. In real life, both
humans and automated systems are prone to mistakes; it is essential that our
algorithms are able to work with labels that are both few and unreliable.
Our work aims to perform an extensive empirical evaluation of existing
graph-based semi-supervised algorithms, such as Gaussian Fields and Harmonic
Functions, Local and Global Consistency, Laplacian Eigenmaps, and Graph
Transduction Through Alternating Minimization. To do that, we compare the
accuracy of classifiers while varying the amount of labeled data and label
noise for many different samples. Our results show that, if the dataset is
consistent with SSL assumptions, we are able to detect the noisiest instances,
although this gets harder when the number of available labels decreases. Also,
the Laplacian Eigenmaps algorithm performed better than label propagation when
the data came from high-dimensional clusters.
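As a rough illustration of the kind of graph-based propagation evaluated in the paper, the sketch below runs Local and Global Consistency (LGC) style label spreading on a k-NN graph and then ranks labeled points by how strongly the propagated class distribution disagrees with their given labels, which is the intuition behind flagging the noisiest instances. The toy dataset, the k-NN graph construction, and the values of k and alpha are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch, not the paper's code: LGC-style label spreading on a k-NN
# graph plus a simple disagreement score for spotting possibly noisy labels.
# Dataset, k, and alpha below are assumptions, not the paper's setup.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import kneighbors_graph

def lgc_propagate(X, y, labeled_mask, k=10, alpha=0.99):
    """Closed-form LGC: F = (1 - alpha) * (I - alpha * S)^-1 * Y."""
    n = X.shape[0]
    W = kneighbors_graph(X, k, mode="connectivity", include_self=False).toarray()
    W = np.maximum(W, W.T)                       # symmetrize the k-NN graph
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    S = (W * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]   # D^-1/2 W D^-1/2
    classes = np.unique(y[labeled_mask])
    Y = np.zeros((n, classes.size))
    for j, c in enumerate(classes):
        Y[labeled_mask & (y == c), j] = 1.0      # one-hot seeds; unlabeled rows stay zero
    F = (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)
    return F, classes

# Toy data: 30 labeled points out of 300, with 6 labels deliberately flipped.
X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)
rng = np.random.default_rng(0)
labeled_mask = np.zeros(len(y_true), dtype=bool)
labeled_mask[rng.choice(len(y_true), size=30, replace=False)] = True
y_obs = y_true.copy()
noisy = rng.choice(np.where(labeled_mask)[0], size=6, replace=False)
y_obs[noisy] = (y_obs[noisy] + 1) % 3            # inject label noise

F, classes = lgc_propagate(X, y_obs, labeled_mask)
pred = classes[F.argmax(axis=1)]
print("accuracy on unlabeled points:",
      (pred[~labeled_mask] == y_true[~labeled_mask]).mean())

# Labeled points whose given class receives little propagated mass are the
# natural noise candidates.
probs = F[labeled_mask] / np.maximum(F[labeled_mask].sum(axis=1, keepdims=True), 1e-12)
given_cols = np.searchsorted(classes, y_obs[labeled_mask])
noise_score = 1.0 - probs[np.arange(given_cols.size), given_cols]
suspects = np.where(labeled_mask)[0][np.argsort(-noise_score)][:6]
print("most suspicious labeled indices:", suspects, "injected noise at:", np.sort(noisy))
```

With fewer labeled points the propagated distributions become less informative, which matches the paper's observation that detecting the noisiest instances gets harder as the number of available labels decreases.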
Related papers
- Continuous Contrastive Learning for Long-Tailed Semi-Supervised Recognition [50.61991746981703]
Current state-of-the-art LTSSL approaches rely on high-quality pseudo-labels for large-scale unlabeled data.
This paper introduces a novel probabilistic framework that unifies various recent proposals in long-tail learning.
We introduce a continuous contrastive learning method, CCL, extending our framework to unlabeled data using reliable and smoothed pseudo-labels.
arXiv Detail & Related papers (2024-10-08T15:06:10Z)
- Learning with Confidence: Training Better Classifiers from Soft Labels [0.0]
In supervised machine learning, models are typically trained using data with hard labels, i.e., definite assignments of class membership.
We investigate whether incorporating label uncertainty, represented as discrete probability distributions over the class labels, improves the predictive performance of classification models.
arXiv Detail & Related papers (2024-09-24T13:12:29Z)
- FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness for Semi-Supervised Learning [73.13448439554497]
Semi-Supervised Learning (SSL) has been an effective way to leverage abundant unlabeled data with extremely scarce labeled data.
Most SSL methods are commonly based on instance-wise consistency between different data transformations.
We propose FlatMatch which minimizes a cross-sharpness measure to ensure consistent learning performance between the two datasets.
arXiv Detail & Related papers (2023-10-25T06:57:59Z)
- All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation [67.30502812804271]
Pseudo-labels are widely employed in weakly supervised 3D segmentation tasks where only sparse ground-truth labels are available for learning.
We propose a novel learning strategy to regularize the generated pseudo-labels and effectively narrow the gaps between pseudo-labels and model predictions.
arXiv Detail & Related papers (2023-05-25T08:19:31Z)
- Learned Label Aggregation for Weak Supervision [8.819582879892762]
We propose a data programming approach that aggregates weak supervision signals to generate labeled data easily.
The quality of the generated labels depends on a label aggregation model that aggregates all noisy labels from all labeling functions (LFs) to infer the ground-truth labels.
We show the model can be trained using synthetically generated data and design an effective architecture for the model.
arXiv Detail & Related papers (2022-07-27T14:36:35Z)
- How many labelers do you have? A closer look at gold-standard labels [10.637125300701795]
We show how access to non-aggregated label information can make training well-calibrated models more feasible than it is with gold-standard labels.
We make several predictions for real-world datasets, including when non-aggregate labels should improve learning performance.
arXiv Detail & Related papers (2022-06-24T02:33:50Z)
- Trustable Co-label Learning from Multiple Noisy Annotators [68.59187658490804]
Supervised deep learning depends on massive amounts of accurately annotated examples.
A typical alternative is learning from multiple noisy annotators.
This paper proposes a data-efficient approach, called Trustable Co-label Learning (TCL).
arXiv Detail & Related papers (2022-03-08T16:57:00Z)
- Instance-dependent Label-noise Learning under a Structural Causal Model [92.76400590283448]
Label noise degrades the performance of deep learning algorithms.
By leveraging a structural causal model, we propose a novel generative approach for instance-dependent label-noise learning.
arXiv Detail & Related papers (2021-09-07T10:42:54Z)
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
- Identifying noisy labels with a transductive semi-supervised leave-one-out filter [2.4366811507669124]
We introduce the LGC_LVOF, a leave-one-out filtering approach based on the Local and Global Consistency (LGC) algorithm.
Our approach is best suited to datasets with a large amount of unlabeled data but not many labels (an illustrative sketch of this leave-one-out idea appears after this list).
arXiv Detail & Related papers (2020-09-24T16:50:06Z)
- Label Noise Types and Their Effects on Deep Learning [0.0]
In this work, we provide a detailed analysis of the effects of different kinds of label noise on learning.
We propose a generic framework to generate feature-dependent label noise, which we show to be the most challenging case for learning.
To make it easier for other researchers to test their algorithms with noisy labels, we share corrupted labels for the most commonly used benchmark datasets.
arXiv Detail & Related papers (2020-03-23T18:03:39Z)
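The LGC_LVOF entry above describes a leave-one-out filter built on the Local and Global Consistency algorithm. The sketch below is only a hedged reading of that general idea, not the paper's actual algorithm: each labeled point is held out in turn, labels are spread from the remaining seeds using scikit-learn's LabelSpreading as a stand-in for LGC, and the point is flagged when the held-out prediction disagrees with its given label. The toy data, n_neighbors, and alpha values are assumptions.

```python
# Illustrative sketch only: a leave-one-out style filter for noisy labels,
# in the spirit of (but not identical to) the LGC_LVOF entry above.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.semi_supervised import LabelSpreading

def leave_one_out_filter(X, y_obs, labeled_idx, k=10, alpha=0.9):
    """Flag labeled points whose held-out propagated label disagrees with the given one."""
    suspects = []
    for i in labeled_idx:
        y_sl = np.full(len(y_obs), -1)            # -1 marks unlabeled points
        y_sl[labeled_idx] = y_obs[labeled_idx]
        y_sl[i] = -1                              # hold out the point under test
        model = LabelSpreading(kernel="knn", n_neighbors=k, alpha=alpha)
        model.fit(X, y_sl)
        if model.transduction_[i] != y_obs[i]:    # disagreement -> noise candidate
            suspects.append(int(i))
    return suspects

# Toy data: 30 labels out of 300 points, 6 of them deliberately flipped.
X, y_true = make_blobs(n_samples=300, centers=3, random_state=0)
rng = np.random.default_rng(0)
labeled_idx = rng.choice(len(y_true), size=30, replace=False)
y_obs = y_true.copy()
flipped = rng.choice(labeled_idx, size=6, replace=False)
y_obs[flipped] = (y_obs[flipped] + 1) % 3

print("flagged as noisy:", sorted(leave_one_out_filter(X, y_obs, labeled_idx)))
print("actually flipped:", sorted(int(i) for i in flipped))
```

Refitting once per held-out label is affordable precisely in the regime the entry describes: many unlabeled points but few labels.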