Utilizing supervised models to infer consensus labels and their quality from data with multiple annotators
- URL: http://arxiv.org/abs/2210.06812v1
- Date: Thu, 13 Oct 2022 07:54:07 GMT
- Title: Utilizing supervised models to infer consensus labels and their quality from data with multiple annotators
- Authors: Hui Wen Goh, Ulyana Tkachenko, Jonas Mueller
- Abstract summary: Real-world data for classification is often labeled by multiple annotators.
We introduce CROWDLAB, a straightforward approach to estimating consensus labels, their quality, and annotator quality from such data.
Our proposed method provides more accurate estimates for these quantities than many alternative algorithms.
- Score: 16.79939549201032
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-world data for classification is often labeled by multiple annotators.
For analyzing such data, we introduce CROWDLAB, a straightforward approach to
estimate: (1) A consensus label for each example that aggregates the individual
annotations (more accurately than aggregation via majority-vote or other
algorithms used in crowdsourcing); (2) A confidence score for how likely each
consensus label is correct (via well-calibrated estimates that account for the
number of annotations for each example and their agreement,
prediction-confidence from a trained classifier, and trustworthiness of each
annotator vs. the classifier); (3) A rating for each annotator quantifying the
overall correctness of their labels. While many algorithms have been proposed
to estimate related quantities in crowdsourcing, these often rely on
sophisticated generative models with iterative inference schemes, whereas
CROWDLAB is based on simple weighted ensembling. Many algorithms also rely
solely on annotator statistics, ignoring the features of the examples from
which the annotations derive. CROWDLAB in contrast utilizes any classifier
model trained on these features, which can generalize between examples with
similar features. In evaluations on real-world multi-annotator image data, our
proposed method provides more accurate estimates for (1)-(3) than many alternative algorithms.
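To make the "simple weighted ensembling" concrete, here is a minimal sketch in the spirit of the method, assuming an agreement-based annotator weighting; it illustrates the general idea rather than reproducing the authors' exact estimator, and the helper name `crowdlab_sketch` is hypothetical.

```python
import numpy as np

def crowdlab_sketch(annotations, pred_probs, num_classes):
    """Toy weighted ensemble in the spirit of CROWDLAB (simplified).

    annotations: (n_examples, n_annotators) int array, -1 = missing label
    pred_probs:  (n_examples, num_classes) classifier predicted probabilities
    Returns consensus labels (1), their confidence (2), annotator quality (3).
    """
    m = annotations.shape[1]
    # (3) Annotator quality: agreement rate with the classifier's predicted
    #     labels (assumes each annotator labeled at least one example).
    pred_labels = pred_probs.argmax(axis=1)
    quality = np.array([
        (annotations[:, j] == pred_labels)[annotations[:, j] >= 0].mean()
        for j in range(m)
    ])
    # (1) Consensus: combine classifier probabilities (weight 1) with each
    #     annotator's one-hot votes, weighted by that annotator's quality.
    ensemble = pred_probs.astype(float).copy()
    for j in range(m):
        labeled = annotations[:, j] >= 0
        onehot = np.eye(num_classes)[annotations[labeled, j]]
        ensemble[labeled] += quality[j] * onehot
    ensemble /= ensemble.sum(axis=1, keepdims=True)
    consensus = ensemble.argmax(axis=1)
    # (2) Confidence that each consensus label is correct.
    confidence = ensemble.max(axis=1)
    return consensus, confidence, quality

# Example: 3 examples, 3 annotators (-1 marks a missing annotation).
A = np.array([[0, 0, 1], [1, -1, 1], [2, 2, -1]])
P = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]])
labels, conf, qual = crowdlab_sketch(A, P, num_classes=3)
```

Unlike the generative-model baselines mentioned in the abstract, nothing here iterates: quality weights are estimated once from classifier agreement and plugged into a single weighted average.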
Related papers
- Dynamic Correlation Learning and Regularization for Multi-Label Confidence Calibration [60.95748658638956]
This paper introduces the Multi-Label Confidence task, aiming to provide well-calibrated confidence scores in multi-label scenarios.
Existing single-label calibration methods fail to account for category correlations, which are crucial for addressing semantic confusion.
We propose the Dynamic Correlation Learning and Regularization algorithm, which leverages multi-grained semantic correlations to better model semantic confusion.
arXiv Detail & Related papers (2024-07-09T13:26:21Z)
- Memory Consistency Guided Divide-and-Conquer Learning for Generalized Category Discovery [56.172872410834664]
Generalized category discovery (GCD) aims at addressing a more realistic and challenging setting of semi-supervised learning.
We propose a Memory Consistency guided Divide-and-conquer Learning framework (MCDL).
Our method outperforms state-of-the-art models by a large margin on both seen and unseen classes in generic image recognition.
arXiv Detail & Related papers (2024-01-24T09:39:45Z)
- Crowd-Certain: Label Aggregation in Crowdsourced and Ensemble Learning Classification [0.0]
We introduce Crowd-Certain, a novel approach for label aggregation in crowdsourced and ensemble learning classification tasks.
The proposed method uses the consistency of the annotators versus a trained classifier to determine a reliability score for each annotator.
We extensively evaluate our approach against ten existing techniques across ten different datasets, each labeled by varying numbers of annotators.
arXiv Detail & Related papers (2023-10-25T01:58:37Z)
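Crowd-Certain's central quantity, per the summary above, is a per-annotator reliability score derived from agreement with a trained classifier. Below is a minimal sketch of that idea; it is a simplification (the paper's actual weighting scheme is more involved), and `annotator_reliability` is a hypothetical helper name.

```python
import numpy as np

def annotator_reliability(annotations, classifier_labels):
    """Reliability of annotator j = how often their labels agree with a
    trained classifier, over the examples they actually labeled.
    annotations: (n_examples, n_annotators) int array, -1 = missing."""
    scores = []
    for j in range(annotations.shape[1]):
        labeled = annotations[:, j] >= 0
        agree = annotations[labeled, j] == classifier_labels[labeled]
        scores.append(agree.mean())
    return np.array(scores)  # one reliability score per annotator
```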
- ACTOR: Active Learning with Annotator-specific Classification Heads to Embrace Human Label Variation [35.10805667891489]
Active learning, as an annotation cost-saving strategy, has not been fully explored in the context of learning from disagreement.
We show that in the active learning setting, a multi-head model performs significantly better than a single-head model in terms of uncertainty estimation.
arXiv Detail & Related papers (2023-10-23T14:26:43Z)
- CEREAL: Few-Sample Clustering Evaluation [4.569028973407756]
We focus on the underexplored problem of estimating clustering quality with limited labels.
We introduce CEREAL, a comprehensive framework for few-sample clustering evaluation.
Our results show that CEREAL reduces the area under the absolute error curve by up to 57% compared to the best sampling baseline.
arXiv Detail & Related papers (2022-09-30T19:52:41Z)
- Evolving Multi-Label Fuzzy Classifier [5.53329677986653]
Multi-label classification has attracted much attention in the machine learning community to address the problem of assigning single samples to more than one class at the same time.
We propose an evolving multi-label fuzzy classifier (EFC-ML) which is able to self-adapt and self-evolve its structure with new incoming multi-label samples in an incremental, single-pass manner.
arXiv Detail & Related papers (2022-03-29T08:01:03Z)
- Trustable Co-label Learning from Multiple Noisy Annotators [68.59187658490804]
Supervised deep learning depends on massive amounts of accurately annotated examples.
A typical alternative is learning from multiple noisy annotators.
This paper proposes a data-efficient approach called Trustable Co-label Learning (TCL).
arXiv Detail & Related papers (2022-03-08T16:57:00Z)
- Learning with Neighbor Consistency for Noisy Labels [69.83857578836769]
We present a method for learning from noisy labels that leverages similarities between training examples in feature space.
We evaluate our method on datasets with both synthetic (CIFAR-10, CIFAR-100) and realistic (mini-WebVision, Clothing1M, mini-ImageNet-Red) noise.
arXiv Detail & Related papers (2022-02-04T15:46:27Z)
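As a rough rendering of the neighbor-consistency idea above, one can penalize predictions that diverge from the average prediction of nearby training examples in feature space. The sketch below is my own minimal version, not the paper's loss, and measures the divergence with KL:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def neighbor_consistency_penalty(features, pred_probs, k=5):
    """Mean KL divergence between each example's predicted distribution and
    the average distribution of its k nearest feature-space neighbors."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    _, idx = nn.kneighbors(features)        # idx[:, 0] is the point itself
    neighbor_avg = pred_probs[idx[:, 1:]].mean(axis=1)
    eps = 1e-12
    kl = (pred_probs * (np.log(pred_probs + eps)
                        - np.log(neighbor_avg + eps))).sum(axis=1)
    return kl.mean()
```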
- Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing [72.4445825335561]
We propose a simple method to derive a 2D representation from detection scores produced by an arbitrary set of binary classifiers.
Based upon rank correlations, our method facilitates a visual comparison of classifiers with arbitrary scores.
While the approach is fully versatile and can be applied to any detection task, we demonstrate the method using scores produced by automatic speaker verification and voice anti-spoofing systems.
arXiv Detail & Related papers (2021-06-11T13:03:33Z)
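The pipeline described above, pairwise rank correlations between classifiers' scores followed by a 2D embedding, is easy to prototype. This sketch assumes Spearman correlation and classical MDS as plausible (not confirmed) choices:

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.manifold import MDS

def classifier_adjacency_2d(score_matrix):
    """score_matrix: (n_classifiers, n_trials) detection scores.
    Returns an (n_classifiers, 2) embedding in which nearby points are
    classifiers whose scores are strongly rank-correlated."""
    rho, _ = spearmanr(score_matrix, axis=1)   # pairwise rank correlations
    dist = 1.0 - rho                           # correlation -> dissimilarity
    np.fill_diagonal(dist, 0.0)
    return MDS(n_components=2, dissimilarity="precomputed",
               random_state=0).fit_transform(dist)
```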
- Single versus Multiple Annotation for Named Entity Recognition of Mutations [4.213427823201119]
We address the impact of using a single annotator versus two annotators, in order to measure whether multiple annotators are required.
Once we evaluate the performance loss from using a single annotator, we apply different methods to sample the training data for a second round of annotation.
We use held-out double-annotated data to build two scenarios with different types of rankings: similarity-based and confidence-based.
We evaluate both approaches on (i) their ability to identify erroneous training instances, and (ii) state-of-the-art Mutation NER performance.
arXiv Detail & Related papers (2021-01-19T03:54:17Z)
- Bayesian Semi-supervised Crowdsourcing [71.20185379303479]
Crowdsourcing has emerged as a powerful paradigm for efficiently labeling large datasets and performing various learning tasks.
This work deals with semi-supervised crowdsourced classification, under two regimes of semi-supervision.
arXiv Detail & Related papers (2020-12-20T23:18:51Z)
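For contrast with CROWDLAB's non-iterative ensembling, here is a minimal Dawid-Skene-style EM loop, the kind of iterative generative inference that classic crowdsourcing methods (including Bayesian variants such as the paper above) build on. It is a simplified sketch: priors and semi-supervision are omitted, and `dawid_skene_em` is a hypothetical helper name.

```python
import numpy as np

def dawid_skene_em(annotations, num_classes, n_iter=50):
    """Minimal Dawid-Skene-style EM (simplified; no priors, no
    semi-supervision). annotations: (n, m) int array, -1 = missing."""
    n, m = annotations.shape
    # Initialize posteriors with a smoothed majority vote.
    post = np.ones((n, num_classes))
    for i in range(n):
        labs = annotations[i][annotations[i] >= 0]
        post[i] += np.bincount(labs, minlength=num_classes)
    post /= post.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # M-step: class priors and per-annotator confusion matrices,
        # conf[j, true_class, observed_label].
        prior = post.mean(axis=0)
        conf = np.full((m, num_classes, num_classes), 1e-6)
        for j in range(m):
            for i in np.where(annotations[:, j] >= 0)[0]:
                conf[j, :, annotations[i, j]] += post[i]
            conf[j] /= conf[j].sum(axis=1, keepdims=True)
        # E-step: posterior over true labels given all annotations.
        log_post = np.tile(np.log(prior + 1e-12), (n, 1))
        for j in range(m):
            labeled = annotations[:, j] >= 0
            log_post[labeled] += np.log(conf[j, :, annotations[labeled, j]])
        post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
    return post  # post[i, k] = P(true label of example i is k)
```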