Related papers: Rethinking Noisy Label Learning in Real-world Annotation Scenarios from the Noise-type Perspective

Rethinking Noisy Label Learning in Real-world Annotation Scenarios from the Noise-type Perspective

URL: http://arxiv.org/abs/2307.16889v2
Date: Tue, 22 Aug 2023 06:15:15 GMT
Title: Rethinking Noisy Label Learning in Real-world Annotation Scenarios from the Noise-type Perspective
Authors: Renyu Zhu, Haoyu Liu, Runze Wu, Minmin Lin, Tangjie Lv, Changjie Fan, Haobo Wang
Abstract summary: We propose a novel sample selection-based approach for noisy label learning, called Proto-semi. Proto-semi divides all samples into the confident and unconfident datasets via warm-up. By leveraging the confident dataset, prototype vectors are constructed to capture class characteristics. Empirical evaluations on a real-world annotated dataset substantiate the robustness of Proto-semi in handling the problem of learning from noisy labels.
Score: 38.24239397999152
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we investigate the problem of learning with noisy labels in real-world annotation scenarios, where noise can be categorized into two types: factual noise and ambiguity noise. To better distinguish these noise types and utilize their semantics, we propose a novel sample selection-based approach for noisy label learning, called Proto-semi. Proto-semi initially divides all samples into the confident and unconfident datasets via warm-up. By leveraging the confident dataset, prototype vectors are constructed to capture class characteristics. Subsequently, the distances between the unconfident samples and the prototype vectors are calculated to facilitate noise classification. Based on these distances, the labels are either corrected or retained, resulting in the refinement of the confident and unconfident datasets. Finally, we introduce a semi-supervised learning method to enhance training. Empirical evaluations on a real-world annotated dataset substantiate the robustness of Proto-semi in handling the problem of learning from noisy labels. Meanwhile, the prototype-based repartitioning strategy is shown to be effective in mitigating the adverse impact of label noise. Our code and data are available at https://github.com/fuxiAIlab/ProtoSemi.

Related papers

Correcting Noisy Multilabel Predictions: Modeling Label Noise through Latent Space Shifts [4.795811957412855]
Noise in data appears to be inevitable in most real-world machine learning applications. We investigate the less explored area of noisy label learning for multilabel classifications. Our model posits that label noise arises from a shift in the latent variable, providing a more robust and beneficial means for noisy learning.
arXiv Detail & Related papers (2025-02-20T05:41:52Z)
Enhancing Sample Utilization in Noise-Robust Deep Metric Learning With Subgroup-Based Positive-Pair Selection [84.78475642696137]
The existence of noisy labels in real-world data negatively impacts the performance of deep learning models. We propose a noise-robust DML framework with SubGroup-based Positive-pair Selection (SGPS) SGPS constructs reliable positive pairs for noisy samples to enhance the sample utilization.
arXiv Detail & Related papers (2025-01-19T14:41:55Z)
NoisyAG-News: A Benchmark for Addressing Instance-Dependent Noise in Text Classification [7.464154519547575]
Existing research on learning with noisy labels predominantly focuses on synthetic noise patterns. We constructed a benchmark dataset to better understand label noise in real-world text classification settings. Our findings reveal that while pre-trained models are resilient to synthetic noise, they struggle against instance-dependent noise.
arXiv Detail & Related papers (2024-07-09T06:18:40Z)
Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching. By setting a manually-specific probability measure, we can reduce the side-effects of noisy and long-tailed data simultaneously. Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
Neighborhood Collective Estimation for Noisy Label Identification and Correction [92.20697827784426]
Learning with noisy labels (LNL) aims at designing strategies to improve model performance and generalization by mitigating the effects of model overfitting to noisy labels. Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, easily giving rise to confirmation bias. We propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors.
arXiv Detail & Related papers (2022-08-05T14:47:22Z)
Towards Harnessing Feature Embedding for Robust Learning with Noisy Labels [44.133307197696446]
The memorization effect of deep neural networks (DNNs) plays a pivotal role in recent label noise learning methods. We propose a novel feature embedding-based method for deep learning with label noise, termed LabEl NoiseDilution (LEND)
arXiv Detail & Related papers (2022-06-27T02:45:09Z)
Context-based Virtual Adversarial Training for Text Classification with Noisy Labels [1.9508698179748525]
We propose context-based virtual adversarial training (ConVAT) to prevent a text classifier from overfitting to noisy labels. Unlike the previous works, the proposed method performs the adversarial training at the context level rather than the inputs. We conduct extensive experiments on four text classification datasets with two types of label noises.
arXiv Detail & Related papers (2022-05-29T14:19:49Z)
S3: Supervised Self-supervised Learning under Label Noise [53.02249460567745]
In this paper we address the problem of classification in the presence of label noise. In the heart of our method is a sample selection mechanism that relies on the consistency between the annotated label of a sample and the distribution of the labels in its neighborhood in the feature space. Our method significantly surpasses previous methods on both CIFARCIFAR100 with artificial noise and real-world noisy datasets such as WebVision and ANIMAL-10N.
arXiv Detail & Related papers (2021-11-22T15:49:20Z)
Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations [54.400167806154535]
Existing research on learning with noisy labels mainly focuses on synthetic label noise. This work presents two new benchmark datasets (CIFAR-10N, CIFAR-100N) We show that real-world noisy labels follow an instance-dependent pattern rather than the classically adopted class-dependent ones.
arXiv Detail & Related papers (2021-10-22T22:42:11Z)
Learning to Aggregate and Refine Noisy Labels for Visual Sentiment Analysis [69.48582264712854]
We propose a robust learning method to perform robust visual sentiment analysis. Our method relies on an external memory to aggregate and filter noisy labels during training. We establish a benchmark for visual sentiment analysis with label noise using publicly available datasets.
arXiv Detail & Related papers (2021-09-15T18:18:28Z)
Towards Robustness to Label Noise in Text Classification via Noise Modeling [7.863638253070439]
Large datasets in NLP suffer from noisy labels, due to erroneous automatic and human annotation procedures. We study the problem of text classification with label noise, and aim to capture this noise through an auxiliary noise model over the classifier.
arXiv Detail & Related papers (2021-01-27T05:41:57Z)
A Second-Order Approach to Learning with Instance-Dependent Label Noise [58.555527517928596]
The presence of label noise often misleads the training of deep neural networks. We show that the errors in human-annotated labels are more likely to be dependent on the difficulty levels of tasks.
arXiv Detail & Related papers (2020-12-22T06:36:58Z)
EvidentialMix: Learning with Combined Open-set and Closed-set Noisy Labels [30.268962418683955]
We study a new variant of the noisy label problem that combines the open-set and closed-set noisy labels. Our results show that our method produces superior classification results and better feature representations than previous state-of-the-art methods.
arXiv Detail & Related papers (2020-11-11T11:15:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.