Coupled Confusion Correction: Learning from Crowds with Sparse
Annotations
- URL: http://arxiv.org/abs/2312.07331v3
- Date: Tue, 20 Feb 2024 07:30:25 GMT
- Title: Coupled Confusion Correction: Learning from Crowds with Sparse
Annotations
- Authors: Hansong Zhang, Shikun Li, Dan Zeng, Chenggang Yan, Shiming Ge
- Abstract summary: The confusion matrices learned by each of two models can be
corrected with data distilled by the other.
We cluster ``annotator groups'' who share similar expertise so that their
confusion matrices can be corrected together.
- Score: 43.94012824749425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As datasets grow larger, accurately annotating them becomes
increasingly impractical due to the cost in both time and money. Crowd-sourcing
has therefore been widely adopted to reduce the cost of collecting labels, but
it inevitably introduces label noise and ultimately degrades model performance.
A common paradigm for learning from crowd-sourced annotations is to model the
expertise of each annotator, which is challenging because such annotations are
usually highly sparse. To alleviate this problem, we propose Coupled Confusion
Correction (CCC), in which two models are trained simultaneously to correct the
confusion matrices learned by each other: via bi-level optimization, the
confusion matrices learned by one model are corrected using data distilled by
the other. Moreover, we cluster ``annotator groups'' who share similar
expertise so that their confusion matrices can be corrected together. In this
way, the expertise of the annotators, especially of those who provide only a
few labels, can be better captured. Notably, we point out that annotation
sparsity means not only that the average number of labels per annotator is low,
but also that there are always some annotators who provide very few labels,
which previous works neglect when constructing synthetic crowd-sourcing
annotations. Based on this observation, we propose to use a Beta distribution
to control the generation of crowd-sourced labels so that the synthetic
annotations are more consistent with real-world ones. Extensive experiments on
two types of synthetic datasets and three real-world datasets demonstrate that
CCC significantly outperforms state-of-the-art approaches. Source code is
available at: https://github.com/Hansong-Zhang/CCC.
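
A minimal sketch may make the coupled scheme above concrete. The PyTorch-style
code below is an illustration, not the authors' implementation: the crowd-layer
architecture, the small-loss selection used as a stand-in for CCC's bi-level
distillation, the omission of annotator-group clustering, and all sizes and
hyperparameters are assumptions; see the linked repository for the real method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES, NUM_ANNOTATORS, FEAT_DIM = 10, 50, 32  # illustrative sizes

class CrowdModel(nn.Module):
    """A backbone classifier plus one learnable confusion matrix per
    annotator, mapping the clean-label posterior to that annotator's
    noisy-label distribution (a generic crowd-layer design; the exact
    CCC architecture may differ)."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(FEAT_DIM, 64), nn.ReLU(), nn.Linear(64, NUM_CLASSES))
        # Near-identity initialization: annotators start "mostly reliable".
        init = 2.0 * torch.eye(NUM_CLASSES) + 0.1
        self.conf_logits = nn.Parameter(init.repeat(NUM_ANNOTATORS, 1, 1))

    def forward(self, x, annot_ids):
        post = F.softmax(self.backbone(x), dim=-1)          # p(y | x)
        T = F.softmax(self.conf_logits, dim=-1)[annot_ids]  # (B, C, C)
        noisy = torch.bmm(post.unsqueeze(1), T).squeeze(1)  # p(noisy y | x, r)
        return post, noisy

def peer_selected(noisy_pred, labels, keep=0.7):
    """Low-loss subset chosen by the peer model: a simple stand-in for
    the distilled data used in CCC's bi-level correction step."""
    loss = F.nll_loss(noisy_pred.clamp_min(1e-8).log(), labels,
                      reduction="none")
    return torch.topk(-loss, max(1, int(keep * len(loss)))).indices

model_a, model_b = CrowdModel(), CrowdModel()
opt_a = torch.optim.Adam(model_a.parameters(), lr=1e-3)
opt_b = torch.optim.Adam(model_b.parameters(), lr=1e-3)

# Toy batch: features, one noisy label each, and the annotator who gave it.
x = torch.randn(128, FEAT_DIM)
y_noisy = torch.randint(0, NUM_CLASSES, (128,))
annot = torch.randint(0, NUM_ANNOTATORS, (128,))

for step in range(100):
    for model, peer, opt in ((model_a, model_b, opt_a),
                             (model_b, model_a, opt_b)):
        with torch.no_grad():              # the peer only selects data
            _, peer_noisy = peer(x, annot)
        idx = peer_selected(peer_noisy, y_noisy)
        _, noisy = model(x[idx], annot[idx])
        loss = F.nll_loss(noisy.clamp_min(1e-8).log(), y_noisy[idx])
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Each model trains only on the subset its peer considers clean, so errors in one
set of confusion matrices are less likely to reinforce themselves, which is the
intuition behind the coupling.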
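
The Beta-controlled generation of sparse annotations can be sketched as well.
The shape parameters below are illustrative assumptions, not the values used in
the paper; the point is that a left-skewed Beta distribution yields a long tail
of annotators who provide very few labels, rather than merely a low average.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_annotators = 10_000, 100

# Per-annotator labeling propensity drawn from a left-skewed Beta
# distribution: with these (illustrative) shape parameters most
# annotators receive a tiny propensity, reproducing the long tail of
# annotators who label very few instances.
propensity = rng.beta(0.5, 5.0, size=n_annotators)

# Each annotator labels each instance independently with probability
# equal to their propensity, yielding a sparse (N, R) annotation mask.
mask = rng.random((n_samples, n_annotators)) < propensity

labels_per_annotator = mask.sum(axis=0)
print("mean labels per annotator:", labels_per_annotator.mean())
print("annotators with fewer than 10 labels:",
      int((labels_per_annotator < 10).sum()))
```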
Related papers
- Virtual Category Learning: A Semi-Supervised Learning Method for Dense
Prediction with Extremely Limited Labels [63.16824565919966]
This paper proposes to use confusing samples proactively without label correction.
A Virtual Category (VC) is assigned to each confusing sample in such a way that it can safely contribute to the model optimisation.
Our intriguing findings highlight the usage of VC learning in dense vision tasks.
arXiv Detail & Related papers (2023-12-02T16:23:52Z)
- XAL: EXplainable Active Learning Makes Classifiers Better Low-resource
Learners [71.8257151788923]
We propose a novel Explainable Active Learning framework (XAL) for low-resource text classification.
XAL encourages classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations.
Experiments on six datasets show that XAL achieves consistent improvement over 9 strong baselines.
arXiv Detail & Related papers (2023-10-09T08:07:04Z)
- Improving Classifier Robustness through Active Generation of Pairwise
Counterfactuals [22.916599410472102]
We present a novel framework that utilizes counterfactual generative models to generate a large number of diverse counterfactuals.
We show that with a small amount of human-annotated counterfactual data (10%), we can generate a counterfactual augmentation dataset with learned labels.
arXiv Detail & Related papers (2023-05-22T23:19:01Z)
- Learning with Noisy Labels by Targeted Relabeling [52.0329205268734]
Crowdsourcing platforms are often used to collect datasets for training deep neural networks.
We propose an approach which reserves a fraction of annotations to explicitly relabel highly probable labeling errors.
arXiv Detail & Related papers (2021-10-15T20:37:29Z)
- Improve Learning from Crowds via Generative Augmentation [36.38523364192051]
Crowdsourcing provides an efficient label collection schema for supervised machine learning.
To control annotation cost, each instance in the crowdsourced data is typically annotated by a small number of annotators.
This creates a sparsity issue and limits the quality of machine learning models trained on such data.
arXiv Detail & Related papers (2021-07-22T04:14:30Z)
- Disentangling Sampling and Labeling Bias for Learning in Large-Output
Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z)
- CrowdTeacher: Robust Co-teaching with Noisy Answers & Sample-specific
Perturbations for Tabular Data [8.276156981100364]
Co-teaching methods have shown promising improvements for computer vision problems with noisy labels.
Our model, CrowdTeacher, uses the idea that robustness in the input space of the model can improve the perturbation of the classifier for noisy labels.
We showcase the boost in predictive power attained using CrowdTeacher for both synthetic and real datasets.
arXiv Detail & Related papers (2021-03-31T15:09:38Z)
- OpinionRank: Extracting Ground Truth Labels from Unreliable Expert
Opinions with Graph-Based Spectral Ranking [2.1930130356902207]
Crowdsourcing has emerged as a popular, inexpensive, and efficient data mining solution for performing distributed label collection.
We propose OpinionRank, a model-free, interpretable, graph-based spectral algorithm for integrating crowdsourced annotations into reliable labels.
Our experiments show that OpinionRank performs favorably when compared against more highly parameterized algorithms.
arXiv Detail & Related papers (2021-02-11T08:12:44Z)
- Bayesian Semi-supervised Crowdsourcing [71.20185379303479]
Crowdsourcing has emerged as a powerful paradigm for efficiently labeling large datasets and performing various learning tasks.
This work deals with semi-supervised crowdsourced classification, under two regimes of semi-supervision.
arXiv Detail & Related papers (2020-12-20T23:18:51Z)
- End-to-End Learning from Noisy Crowd to Supervised Machine Learning
Models [6.278267504352446]
We advocate using hybrid intelligence, i.e., combining deep models and human experts, to design an end-to-end learning framework from noisy crowd-sourced data.
We show how label aggregation can benefit from estimating the annotators' confusion matrices to improve the learning process (see the sketch after this list).
We demonstrate the effectiveness of our strategies on several image datasets, using SVM and deep neural networks.
arXiv Detail & Related papers (2020-11-13T09:48:30Z)
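
The last entry notes that label aggregation benefits from estimating annotator
confusion matrices. For reference, here is a minimal Dawid-Skene-style EM
sketch of that idea in NumPy; it is a textbook construction under simplifying
assumptions (additive smoothing, a shared class prior), not the method of that
paper or of CCC.

```python
import numpy as np

def dawid_skene(ann, n_classes, n_iter=50):
    """EM aggregation of crowd labels via per-annotator confusion matrices.

    ann: (N, R) int array of noisy labels, with -1 where annotator r
    skipped item i. Returns the posterior over true labels, shape (N, C).
    """
    n, r = ann.shape
    observed = ann >= 0
    # Initialize the posterior with a smoothed per-item vote histogram.
    post = np.full((n, n_classes), 1.0 / n_classes)
    for i in range(n):
        votes = ann[i][observed[i]]
        if votes.size:
            post[i] = np.bincount(votes, minlength=n_classes) + 0.01
            post[i] /= post[i].sum()
    for _ in range(n_iter):
        # M-step: re-estimate each annotator's confusion matrix
        # T[j, true, observed] from the current posterior.
        conf = np.full((r, n_classes, n_classes), 0.01)
        for j in range(r):
            rows = np.where(observed[:, j])[0]
            for c in range(n_classes):
                np.add.at(conf[j, c], ann[rows, j], post[rows, c])
        conf /= conf.sum(axis=2, keepdims=True)
        prior = post.mean(axis=0)
        # E-step: recompute the posterior over true labels.
        log_post = np.tile(np.log(prior), (n, 1))
        for j in range(r):
            rows = np.where(observed[:, j])[0]
            log_post[rows] += np.log(conf[j][:, ann[rows, j]].T)
        post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
    return post

# Toy usage: 3 annotators, 2 classes, -1 marks a missing annotation.
ann = np.array([[0, 0, -1], [1, -1, 1], [0, 1, 0], [-1, 1, 1]])
print(dawid_skene(ann, n_classes=2).argmax(axis=1))
```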
This list is automatically generated from the titles and abstracts of the
papers on this site.
This site does not guarantee the quality of its content (including all
information) and is not responsible for any consequences of its use.