Learning from Crowds by Modeling Common Confusions
- URL: http://arxiv.org/abs/2012.13052v1
- Date: Thu, 24 Dec 2020 01:13:23 GMT
- Title: Learning from Crowds by Modeling Common Confusions
- Authors: Zhendong Chu, Jing Ma, Hongning Wang
- Abstract summary: Crowdsourcing provides a practical way to obtain large amounts of labeled data at a low cost.
However, the annotation quality of annotators varies considerably.
We provide a new perspective to decompose annotation noise into common noise and individual noise.
- Score: 33.92690297826468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Crowdsourcing provides a practical way to obtain large amounts of labeled
data at a low cost. However, the annotation quality of annotators varies
considerably, which imposes new challenges in learning a high-quality model
from the crowdsourced annotations. In this work, we provide a new perspective
to decompose annotation noise into common noise and individual noise and
differentiate the source of confusion based on instance difficulty and
annotator expertise on a per-instance-annotator basis. We realize this new
crowdsourcing model by an end-to-end learning solution with two types of noise
adaptation layers: one is shared across annotators to capture their commonly
shared confusions, and the other one is pertaining to each annotator to realize
individual confusion. To recognize the source of noise in each annotation, we
use an auxiliary network to choose the two noise adaptation layers with respect
to both instances and annotators. Extensive experiments on both synthesized and
real-world benchmarks demonstrate the effectiveness of our proposed common
noise adaptation solution.
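The two-layer design described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes, for illustration only, that each noise adaptation layer is a row-stochastic confusion matrix and that the auxiliary network outputs a single gate logit per instance-annotator pair, which mixes the two layers convexly. All names (`noisy_label_distribution`, `common_confusion`, `individual_confusion`, `gate_logit`) are hypothetical.

```python
import math


def vec_mat(p, m):
    """Multiply a row vector p (length K) by a K x K matrix m."""
    return [sum(p[i] * m[i][j] for i in range(len(p))) for j in range(len(m[0]))]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def noisy_label_distribution(clean_probs, common_confusion,
                             individual_confusion, gate_logit):
    """Predict one annotator's label distribution for one instance.

    Assumption (not from the paper): the gate weight w in (0, 1) decides how
    much of the noise comes from the shared layer versus the annotator's own.
    """
    w = sigmoid(gate_logit)  # stand-in for the auxiliary network's output
    common = vec_mat(clean_probs, common_confusion)
    individual = vec_mat(clean_probs, individual_confusion)
    return [w * c + (1.0 - w) * s for c, s in zip(common, individual)]


# Toy two-class example with made-up numbers.
clean_probs = [0.9, 0.1]                      # classifier output for the instance
common_confusion = [[0.8, 0.2], [0.3, 0.7]]   # confusion shared by all annotators
individual_confusion = [[1.0, 0.0], [0.0, 1.0]]  # this annotator adds no extra noise
mixed = noisy_label_distribution(clean_probs, common_confusion,
                                 individual_confusion, gate_logit=0.0)
print(mixed)  # approximately [0.825, 0.175]
```

In an end-to-end setting, the classifier, both confusion matrices, and the gate would be trained jointly against the observed crowd labels. Here they are fixed numbers so the mixing step can be checked by hand.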
Related papers
- Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances [55.37242480995541]
We propose to denoise noisy NER data with guidance from a small set of clean instances.
Along with the main NER model we train a discriminator model and use its outputs to recalibrate the sample weights.
Results on public crowdsourcing and distant supervision datasets show that the proposed method can consistently improve performance with a small guidance set.
arXiv Detail & Related papers (2023-10-25T17:23:37Z)
- Co-Learning Meets Stitch-Up for Noisy Multi-label Visual Recognition [70.00984078351927]
This paper focuses on reducing noise based on some inherent properties of multi-label classification and long-tailed learning under noisy cases.
We propose a Stitch-Up augmentation to synthesize a cleaner sample, which directly reduces multi-label noise.
A Heterogeneous Co-Learning framework is further designed to leverage the inconsistency between long-tailed and balanced distributions.
arXiv Detail & Related papers (2023-07-03T09:20:28Z)
- Transferring Annotator- and Instance-dependent Transition Matrix for Learning from Crowds [88.06545572893455]
In real-world crowd-sourcing scenarios, noise transition matrices are both annotator- and instance-dependent.
We first model the mixture of noise patterns by all annotators, and then transfer this modeling to individual annotators.
Experiments confirm the superiority of the proposed approach on synthetic and real-world crowd-sourcing data.
arXiv Detail & Related papers (2023-06-05T13:43:29Z)
- Neighborhood Collective Estimation for Noisy Label Identification and Correction [92.20697827784426]
Learning with noisy labels (LNL) aims at designing strategies to improve model performance and generalization by mitigating the effects of model overfitting to noisy labels.
Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, easily giving rise to confirmation bias.
We propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors.
arXiv Detail & Related papers (2022-08-05T14:47:22Z)
- Centrality and Consistency: Two-Stage Clean Samples Identification for Learning with Instance-Dependent Noisy Labels [87.48541631675889]
We propose a two-stage clean samples identification method.
First, we employ a class-level feature clustering procedure for the early identification of clean samples.
Second, for the remaining clean samples that are close to the ground truth class boundary, we propose a novel consistency-based classification method.
arXiv Detail & Related papers (2022-07-29T04:54:57Z)
- Noise-Tolerant Learning for Audio-Visual Action Recognition [31.641972732424463]
Video datasets are usually coarse-annotated or collected from the Internet.
We propose a noise-tolerant learning framework to find anti-interference model parameters against both noisy labels and noisy correspondence.
Our method significantly improves the robustness of the action recognition model and surpasses the baselines by a clear margin.
arXiv Detail & Related papers (2022-05-16T12:14:03Z)
- Disjoint Contrastive Regression Learning for Multi-Sourced Annotations [10.159313152511919]
Large-scale datasets are important for the development of deep learning models.
Multiple annotators may be employed to label different subsets of the data.
The inconsistency and bias among different annotators are harmful to the model training.
arXiv Detail & Related papers (2021-12-31T12:39:04Z)
- CrowdTeacher: Robust Co-teaching with Noisy Answers & Sample-specific Perturbations for Tabular Data [8.276156981100364]
Co-teaching methods have shown promising improvements for computer vision problems with noisy labels.
Our model, CrowdTeacher, uses the idea that perturbation in the input space model can improve the robustness of the classifier for noisy labels.
We showcase the boost in predictive power attained using CrowdTeacher for both synthetic and real datasets.
arXiv Detail & Related papers (2021-03-31T15:09:38Z)
- Jo-SRC: A Contrastive Approach for Combating Noisy Labels [58.867237220886885]
We propose a noise-robust approach named Jo-SRC (Joint Sample Selection and Model Regularization based on Consistency).
Specifically, we train the network in a contrastive learning manner. Predictions from two different views of each sample are used to estimate its "likelihood" of being clean or out-of-distribution.
arXiv Detail & Related papers (2021-03-24T07:26:07Z)
- Towards Robustness to Label Noise in Text Classification via Noise Modeling [7.863638253070439]
Large datasets in NLP suffer from noisy labels, due to erroneous automatic and human annotation procedures.
We study the problem of text classification with label noise, and aim to capture this noise through an auxiliary noise model over the classifier.
arXiv Detail & Related papers (2021-01-27T05:41:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.