Almost exact recovery in noisy semi-supervised learning
- URL: http://arxiv.org/abs/2007.14717v4
- Date: Wed, 5 Jun 2024 12:03:47 GMT
- Title: Almost exact recovery in noisy semi-supervised learning
- Authors: Konstantin Avrachenkov, Maximilien Dreveton
- Abstract summary: Graph-based semi-supervised learning methods combine the graph structure and labeled data to classify unlabeled data.
We propose an algorithm derived from a continuous relaxation of the MAP, and we establish its consistency.
Numerical experiments show that our approach achieves promising performance on synthetic and real data sets, even in the case of very noisy labeled data.
- Score: 0.09208007322096533
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graph-based semi-supervised learning methods combine the graph structure and labeled data to classify unlabeled data. In this work, we study the effect of a noisy oracle on classification. In particular, we derive the Maximum A Posteriori (MAP) estimator for clustering a Degree Corrected Stochastic Block Model (DC-SBM) when a noisy oracle reveals a fraction of the labels. We then propose an algorithm derived from a continuous relaxation of the MAP, and we establish its consistency. Numerical experiments show that our approach achieves promising performance on synthetic and real data sets, even in the case of very noisy labeled data.
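To make the setting concrete, the sketch below generates a two-block DC-SBM, reveals a fraction of the labels through an oracle that flips each revealed label with probability eta, and classifies via a closed-form continuous relaxation. Every parameter here, and the label-spreading relaxation itself, is an illustrative assumption, not the paper's exact estimator.
```python
import numpy as np

rng = np.random.default_rng(0)

# --- Two-block Degree-Corrected SBM (illustrative parameters) ---
n = 200
z = rng.integers(0, 2, size=n)                   # ground-truth communities
theta = rng.uniform(0.5, 1.5, size=n)            # degree-correction weights
B = np.array([[0.30, 0.05],
              [0.05, 0.30]])                     # block connection rates
probs = np.clip(np.outer(theta, theta) * B[z][:, z], 0.0, 1.0)
upper = np.triu(rng.random((n, n)) < probs, 1).astype(float)
A = upper + upper.T                              # symmetric, no self-loops

# --- Noisy oracle: reveal 10% of labels, flip each with probability 0.3 ---
frac, eta = 0.10, 0.30
revealed = rng.random(n) < frac
noisy = np.where(rng.random(n) < eta, 1 - z, z)
y = np.zeros(n)
y[revealed] = 2.0 * noisy[revealed] - 1.0        # +/-1 seeds, 0 elsewhere

# --- Continuous relaxation: label spreading on D^{-1/2} A D^{-1/2} ---
d = A.sum(axis=1)
d[d == 0] = 1.0
S = A / np.sqrt(np.outer(d, d))
f = np.linalg.solve(np.eye(n) - 0.9 * S, y)      # (I - alpha*S) f = y
pred = (f > 0).astype(int)
print("accuracy:", (pred == z).mean())
```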
Related papers
- Graph-based Semi-Supervised Learning via Maximum Discrimination [0.8594140167290097]
Semi-supervised learning (SSL) addresses the challenge of training accurate models when labeled data is scarce but unlabeled data is abundant. We develop AUC-spec, a graph-based approach that computes a low-dimensional representation maximizing class separation. It demonstrates competitive results on synthetic and real-world datasets while maintaining computational efficiency comparable to classic and state-of-the-art methods in the field.
arXiv Detail & Related papers (2026-02-08T16:18:49Z)
- Graph-Based Semi-Supervised Segregated Lipschitz Learning [0.21847754147782888]
This paper presents an approach to semi-supervised classification using Lipschitz learning on graphs.
We develop a graph-based semi-supervised learning framework that leverages the properties of the infinity Laplacian to propagate labels in a dataset where only a few samples are labeled.
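For intuition, here is a minimal sketch of generic graph-based Lipschitz learning (the paper's segregated variant is not reproduced): unlabeled values are iterated toward the infinity-harmonic extension, in which each node sits at the midpoint of its extreme neighbor values.
```python
import numpy as np

def lipschitz_propagate(A, labels, n_iter=500):
    """Infinity-Laplacian label propagation (generic sketch).

    A      : (n, n) adjacency matrix, nonzero entries mark edges
    labels : length-n array with +1/-1 on labeled nodes, 0 elsewhere
    """
    u = labels.astype(float).copy()
    fixed = labels != 0
    nbrs = [np.flatnonzero(A[i]) for i in range(len(u))]
    for _ in range(n_iter):
        for i in range(len(u)):
            if fixed[i] or len(nbrs[i]) == 0:
                continue
            vals = u[nbrs[i]]
            # Infinity-harmonic update: midpoint of extreme neighbor values.
            u[i] = 0.5 * (vals.max() + vals.min())
    return np.sign(u)
```
The fixed labeled nodes act as boundary conditions; on a connected graph the iteration converges to the absolutely minimal Lipschitz extension of the seed values.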
arXiv Detail & Related papers (2024-11-05T17:16:56Z)
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo-labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
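As a hedged illustration of the prototype idea, the sketch below pseudo-labels by nearest class prototype and keeps an equal number of the most confident samples per class; the paper's distribution-matching step with a manually specified probability measure is simplified away here.
```python
import numpy as np

def prototype_balanced_subset(feats, noisy_labels, n_classes, per_class=50):
    """Pseudo-label by cosine similarity to class prototypes, then keep the
    `per_class` most confident samples of each class (simplified sketch).
    Assumes every class has at least one (possibly noisy) labeled sample."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    protos = np.stack([feats[noisy_labels == c].mean(0) for c in range(n_classes)])
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)
    sims = feats @ protos.T                  # cosine similarity to each prototype
    pseudo, conf = sims.argmax(1), sims.max(1)
    keep = []
    for c in range(n_classes):
        idx = np.flatnonzero(pseudo == c)
        keep.extend(idx[np.argsort(-conf[idx])[:per_class]])
    return np.array(keep), pseudo
```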
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
- Group Benefits Instances Selection for Data Purification [21.977432359384835]
Existing methods for combating label noise are typically designed and tested on synthetic datasets.
We propose a method named GRIP to alleviate the noisy label problem for both synthetic and real-world datasets.
arXiv Detail & Related papers (2024-03-23T03:06:19Z)
- Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution [62.71425232332837]
We show that training amortized models with noisy labels is inexpensive and surprisingly effective.
This approach significantly accelerates several feature attribution and data valuation methods, often yielding an order of magnitude speedup over existing approaches.
arXiv Detail & Related papers (2024-01-29T03:42:37Z)
- Is margin all you need? An extensive empirical study of active learning on tabular data [66.18464006872345]
We analyze the performance of a variety of active learning algorithms on 69 real-world datasets from the OpenML-CC18 benchmark.
Surprisingly, we find that the classical margin sampling technique matches or outperforms all others, including the current state-of-the-art.
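The margin criterion itself is only a few lines; the sketch below assumes a fitted probabilistic classifier and queries the pool points where the gap between the top two predicted class probabilities is smallest:
```python
import numpy as np

def margin_sampling(proba, batch_size):
    """proba: (n_pool, n_classes) predicted probabilities on the unlabeled pool.
    Returns the indices of the `batch_size` most ambiguous points."""
    top_two = np.sort(proba, axis=1)[:, -2:]
    margins = top_two[:, 1] - top_two[:, 0]    # top-1 minus top-2 probability
    return np.argsort(margins)[:batch_size]    # smallest margin = most ambiguous
```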
arXiv Detail & Related papers (2022-10-07T21:18:24Z)
- Towards Harnessing Feature Embedding for Robust Learning with Noisy Labels [44.133307197696446]
The memorization effect of deep neural networks (DNNs) plays a pivotal role in recent label noise learning methods.
We propose a novel feature embedding-based method for deep learning with label noise, termed LabEl NoiseDilution (LEND).
arXiv Detail & Related papers (2022-06-27T02:45:09Z)
- Learning with Neighbor Consistency for Noisy Labels [69.83857578836769]
We present a method for learning from noisy labels that leverages similarities between training examples in feature space.
We evaluate our method on datasets exhibiting both synthetic (CIFAR-10, CIFAR-100) and realistic (mini-WebVision, Clothing1M, mini-ImageNet-Red) noise.
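As a rough illustration (not the paper's exact objective), a neighbor-consistency regularizer can penalize the KL divergence between each example's prediction and the average prediction of its nearest neighbors in feature space:
```python
import numpy as np

def neighbor_consistency_loss(logits, feats, k=5, weight=0.5):
    """KL(p_i || mean of p_j over i's k nearest neighbors), averaged over i.
    A sketch of the neighbor-consistency idea, added to the supervised loss."""
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)                 # softmax predictions
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sims = f @ f.T
    np.fill_diagonal(sims, -np.inf)                   # exclude self-similarity
    nbrs = np.argsort(-sims, axis=1)[:, :k]           # k nearest neighbors
    q = p[nbrs].mean(axis=1)                          # neighbor-averaged prediction
    kl = (p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=1)
    return weight * kl.mean()
```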
arXiv Detail & Related papers (2022-02-04T15:46:27Z)
- Instance-dependent Label-noise Learning under a Structural Causal Model [92.76400590283448]
Label noise degrades the performance of deep learning algorithms.
By leveraging a structural causal model, we propose a novel generative approach for instance-dependent label-noise learning.
arXiv Detail & Related papers (2021-09-07T10:42:54Z)
- Analysis of label noise in graph-based semi-supervised learning [2.4366811507669124]
In machine learning, one must acquire labels to supervise a model that can generalize to unseen data.
It is often the case that most of our data is unlabeled.
Semi-supervised learning (SSL) alleviates that by making strong assumptions about the relation between the labels and the input data distribution.
arXiv Detail & Related papers (2020-09-27T22:13:20Z)
- Semi-Supervised Learning with Meta-Gradient [123.26748223837802]
We propose a simple yet effective meta-learning algorithm in semi-supervised learning.
We find that the proposed algorithm performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2020-07-08T08:48:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.