The 'Problem' of Human Label Variation: On Ground Truth in Data,
Modeling and Evaluation
- URL: http://arxiv.org/abs/2211.02570v1
- Date: Fri, 4 Nov 2022 16:38:09 GMT
- Title: The 'Problem' of Human Label Variation: On Ground Truth in Data,
Modeling and Evaluation
- Authors: Barbara Plank
- Abstract summary: We argue that this big open problem of human label variation persists and critically needs more attention to move our field forward.
We reconcile different previously proposed notions of human label variation, provide a repository of publicly-available datasets with un-aggregated labels, depict approaches proposed so far, identify gaps and suggest ways forward.
- Score: 21.513743126525622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human variation in labeling is often considered noise. Annotation projects
for machine learning (ML) aim at minimizing human label variation, on the
assumption that this maximizes data quality and in turn optimizes machine
learning metrics. However, this conventional practice assumes that there exists
a ground truth, and neglects that there exists genuine human variation in
labeling due to disagreement, subjectivity in annotation or multiple plausible
answers. In this position paper, we argue that this big open problem of human
label variation persists and critically needs more attention to move our field
forward. This is because human label variation impacts all stages of the ML
pipeline: data, modeling and evaluation. However, few works consider all of
these dimensions jointly; and existing research is fragmented. We reconcile
different previously proposed notions of human label variation, provide a
repository of publicly-available datasets with un-aggregated labels, depict
approaches proposed so far, identify gaps and suggest ways forward. As datasets
are becoming increasingly available, we hope that this synthesized view on the
'problem' will lead to an open discussion on possible strategies to devise
fundamentally new directions.
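The contrast the abstract draws, between collapsing annotations to a single "ground truth" and keeping un-aggregated labels, can be sketched in a few lines. This is a minimal illustration, not code from the paper; the function names, class set, and toy annotations are all illustrative assumptions.

```python
# Majority voting discards disagreement; a soft label keeps the full
# distribution of annotator judgments for the same item.
from collections import Counter

def majority_vote(annotations):
    """Collapse annotator labels to a single 'ground truth' label."""
    return Counter(annotations).most_common(1)[0][0]

def soft_label(annotations, classes):
    """Keep the per-class label distribution, preserving genuine variation."""
    counts = Counter(annotations)
    total = len(annotations)
    return [counts.get(c, 0) / total for c in classes]

classes = ["positive", "negative", "neutral"]
annotations = ["positive", "neutral", "positive", "negative"]

print(majority_vote(annotations))        # "positive" -- disagreement discarded
print(soft_label(annotations, classes))  # [0.5, 0.25, 0.25] -- disagreement kept
```

A model trained against the soft distribution (e.g. with cross-entropy on the full vector) can represent items where annotators genuinely disagree, which the single aggregated label cannot.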
Related papers
- Fine-grained Fallacy Detection with Human Label Variation [6.390923249771241]
We introduce Faina, the first dataset for fallacy detection that embraces multiple plausible answers and natural disagreement.
Faina includes over 11K span-level annotations with overlaps across 20 fallacy types on social media posts in Italian.
arXiv Detail & Related papers (2025-02-19T16:18:44Z) - "All that Glitters": Approaches to Evaluations with Unreliable Model and Human Annotations [0.0]
"Gold" and "ground truth" human-mediated labels contain error.
This study demonstrates methods for answering such questions even in the context of very low reliabilities from expert humans.
arXiv Detail & Related papers (2024-11-23T19:18:08Z) - Beyond correlation: The Impact of Human Uncertainty in Measuring the Effectiveness of Automatic Evaluation and LLM-as-a-Judge [51.93909886542317]
We show how *relying on a single aggregate correlation score* can obscure fundamental differences between human labels and those from automatic evaluation.
We propose stratifying data by human label uncertainty to provide a more robust analysis of automatic evaluation performance.
arXiv Detail & Related papers (2024-10-03T03:08:29Z) - Probabilistic Test-Time Generalization by Variational Neighbor-Labeling [62.158807685159736]
This paper strives for domain generalization, where models are trained exclusively on source domains before being deployed on unseen target domains.
We propose probabilistic pseudo-labeling of target samples to generalize the source-trained model to the target domain at test time.
Variational neighbor labels incorporate information from neighboring target samples to generate more robust pseudo labels.
arXiv Detail & Related papers (2023-07-08T18:58:08Z) - Improving Classifier Robustness through Active Generation of Pairwise
Counterfactuals [22.916599410472102]
We present a novel framework that utilizes counterfactual generative models to generate a large number of diverse counterfactuals.
We show that with a small amount of human-annotated counterfactual data (10%), we can generate a counterfactual augmentation dataset with learned labels.
arXiv Detail & Related papers (2023-05-22T23:19:01Z) - Fairness and Bias in Truth Discovery Algorithms: An Experimental
Analysis [7.575734557466221]
Crowd workers may sometimes provide unreliable labels.
Truth discovery (TD) algorithms are applied to determine the consensus labels from conflicting worker responses.
We conduct a systematic study of the bias and fairness of TD algorithms.
arXiv Detail & Related papers (2023-04-25T04:56:35Z) - Self-similarity Driven Scale-invariant Learning for Weakly Supervised
Person Search [66.95134080902717]
We propose a novel one-step framework, named Self-similarity driven Scale-invariant Learning (SSL)
We introduce a Multi-scale Exemplar Branch to guide the network in concentrating on the foreground and learning scale-invariant features.
Experiments on PRW and CUHK-SYSU databases demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2023-02-25T04:48:11Z) - Identifiable Latent Causal Content for Domain Adaptation under Latent Covariate Shift [82.14087963690561]
Multi-source domain adaptation (MSDA) addresses the challenge of learning a label prediction function for an unlabeled target domain.
We present an intricate causal generative model by introducing latent noises across domains, along with a latent content variable and a latent style variable.
The proposed approach showcases exceptional performance and efficacy on both simulated and real-world datasets.
arXiv Detail & Related papers (2022-08-30T11:25:15Z) - One Positive Label is Sufficient: Single-Positive Multi-Label Learning
with Label Enhancement [71.9401831465908]
We investigate single-positive multi-label learning (SPMLL) where each example is annotated with only one relevant label.
A novel method named SMILE (Single-positive MultI-label learning with Label Enhancement) is proposed.
Experiments on benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-06-01T14:26:30Z) - HOT-VAE: Learning High-Order Label Correlation for Multi-Label
Classification via Attention-Based Variational Autoencoders [8.376771467488458]
High-order Tie-in Variational Autoencoder (HOT-VAE) performs adaptive high-order label correlation learning.
We experimentally verify that our model outperforms the existing state-of-the-art approaches on a bird distribution dataset.
arXiv Detail & Related papers (2021-03-09T04:30:28Z) - DomainMix: Learning Generalizable Person Re-Identification Without Human
Annotations [89.78473564527688]
This paper shows how to use labeled synthetic dataset and unlabeled real-world dataset to train a universal model.
In this way, human annotations are no longer required, and it is scalable to large and diverse real-world datasets.
Experimental results show that the proposed annotation-free method is comparable to the counterpart trained with full human annotations.
arXiv Detail & Related papers (2020-11-24T08:15:53Z)
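Several of the papers above (notably the truth-discovery study) turn on how consensus labels are recovered from conflicting worker responses. A common scheme, sketched below under illustrative assumptions (all worker names, items, and labels are hypothetical, and this is not the specific algorithm of any paper listed), alternates between a weighted vote and reweighting workers by agreement with the current consensus:

```python
# Iterative truth discovery: (1) pick each item's consensus label by a
# weighted vote, then (2) reweight each worker by how often they agree
# with the consensus, and repeat until the estimates stabilize.
from collections import defaultdict

def truth_discovery(labels, n_iters=10):
    """labels: {item: {worker: label}}. Returns (consensus, worker_weights)."""
    workers = {w for votes in labels.values() for w in votes}
    weights = {w: 1.0 for w in workers}
    consensus = {}
    for _ in range(n_iters):
        # Step 1: weighted vote per item.
        for item, votes in labels.items():
            scores = defaultdict(float)
            for worker, label in votes.items():
                scores[label] += weights[worker]
            consensus[item] = max(scores, key=scores.get)
        # Step 2: reweight each worker by agreement with the consensus.
        for worker in workers:
            answered = [(i, v[worker]) for i, v in labels.items() if worker in v]
            agree = sum(consensus[i] == lab for i, lab in answered)
            weights[worker] = agree / len(answered) if answered else 0.0
    return consensus, weights

labels = {
    "q1": {"w1": "A", "w2": "A", "w3": "B"},
    "q2": {"w1": "B", "w2": "B", "w3": "A"},
    "q3": {"w1": "A", "w2": "A", "w3": "A"},
}
consensus, weights = truth_discovery(labels)
print(consensus)  # {'q1': 'A', 'q2': 'B', 'q3': 'A'}; w3's weight drops
```

Note the tension with the position paper's thesis: such algorithms treat minority labels as unreliable noise to be down-weighted, whereas for subjective tasks the "disagreeing" worker may be reporting a genuinely plausible alternative answer.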
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.