On Releasing Annotator-Level Labels and Information in Datasets
- URL: http://arxiv.org/abs/2110.05699v1
- Date: Tue, 12 Oct 2021 02:35:45 GMT
- Title: On Releasing Annotator-Level Labels and Information in Datasets
- Authors: Vinodkumar Prabhakaran, Aida Mostafazadeh Davani, Mark Díaz
- Abstract summary: We show that label aggregation may introduce representational biases of individual and group perspectives.
We propose recommendations for increased utility and transparency of datasets for downstream use cases.
- Score: 6.546195629698355
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A common practice in building NLP datasets, especially using crowd-sourced
annotations, involves obtaining multiple annotator judgements on the same data
instances, which are then flattened to produce a single "ground truth" label or
score, through majority voting, averaging, or adjudication. While these
approaches may be appropriate in certain annotation tasks, such aggregations
overlook the socially constructed nature of human perceptions that annotations
for relatively more subjective tasks are meant to capture. In particular,
systematic disagreements between annotators owing to their socio-cultural
backgrounds and/or lived experiences are often obfuscated through such
aggregations. In this paper, we empirically demonstrate that label aggregation
may introduce representational biases of individual and group perspectives.
Based on this finding, we propose a set of recommendations for increased
utility and transparency of datasets for downstream use cases.
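As a minimal sketch of the aggregation practice described above (with synthetic labels and invented annotator IDs), the snippet below contrasts the majority-vote "ground truth" with the annotator-level view it flattens, and counts how often each annotator is overridden:

```python
from collections import Counter

# Synthetic annotator-level labels for three instances (hypothetical data):
# each inner dict maps an annotator ID to that annotator's binary judgement.
annotations = {
    "ex1": {"a1": 1, "a2": 1, "a3": 0},
    "ex2": {"a1": 0, "a2": 1, "a3": 1},
    "ex3": {"a1": 1, "a2": 0, "a3": 0},
}

def majority_vote(labels):
    """Flatten multiple judgements into a single 'ground truth' label."""
    return Counter(labels).most_common(1)[0][0]

overridden = Counter()
for ex_id, per_annotator in annotations.items():
    agg = majority_vote(list(per_annotator.values()))
    # Releasing only `agg` hides which annotators disagreed and how often;
    # keeping `per_annotator` preserves that disagreement structure.
    print(ex_id, "aggregated:", agg, "annotator-level:", per_annotator)
    for annotator, label in per_annotator.items():
        if label != agg:
            overridden[annotator] += 1

# A consistently overridden annotator is exactly the kind of individual
# perspective that aggregation can erase.
print("times each annotator is overridden:", dict(overridden))
```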
Related papers
- Annotator in the Loop: A Case Study of In-Depth Rater Engagement to Create a Bridging Benchmark Dataset [1.825224193230824]
We describe a novel, collaborative, and iterative annotator-in-the-loop methodology for annotation.
Our findings indicate that collaborative engagement with annotators can enhance annotation methods.
arXiv Detail & Related papers (2024-08-01T19:11:08Z)
- Capturing Perspectives of Crowdsourced Annotators in Subjective Learning Tasks [9.110872603799839]
Supervised classification heavily depends on datasets annotated by humans.
In subjective tasks such as toxicity classification, these annotations often exhibit low agreement among raters.
In this work, we propose Annotator Aware Representations for Texts (AART) for subjective classification tasks.
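The entry is terse, so here is a speculative sketch of annotator-aware classification in general, not AART's actual architecture; all names and dimensions below are invented:

```python
import torch
import torch.nn as nn

class AnnotatorAwareClassifier(nn.Module):
    """Predicts a specific annotator's label for a text (illustrative only)."""
    def __init__(self, text_dim: int, num_annotators: int, num_classes: int):
        super().__init__()
        # One learned embedding per annotator captures individual tendencies.
        self.annotator_emb = nn.Embedding(num_annotators, text_dim)
        self.classifier = nn.Linear(2 * text_dim, num_classes)

    def forward(self, text_repr: torch.Tensor, annotator_ids: torch.Tensor):
        # text_repr: (batch, text_dim), e.g. a pooled encoder output.
        ann = self.annotator_emb(annotator_ids)        # (batch, text_dim)
        joint = torch.cat([text_repr, ann], dim=-1)    # (batch, 2 * text_dim)
        return self.classifier(joint)                  # per-annotator logits

# Usage with random stand-ins for encoder output:
model = AnnotatorAwareClassifier(text_dim=128, num_annotators=50, num_classes=2)
logits = model(torch.randn(4, 128), torch.tensor([0, 3, 3, 7]))
print(logits.shape)  # torch.Size([4, 2])
```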
arXiv Detail & Related papers (2023-11-16T10:18:32Z)
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
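As a rough, much-simplified illustration of the "entities as semantic points" idea (the names below are invented, and the actual method predicts points from document images rather than taking them as given), an entity can be reduced to a center point carrying semantic attributes, with linking done over those points:

```python
from dataclasses import dataclass

@dataclass
class SemanticPoint:
    """An entity reduced to its center point plus semantic attributes."""
    x: float    # center x-coordinate in the document image
    y: float    # center y-coordinate
    label: str  # entity attribute, e.g. "question" or "answer"
    text: str

def link(points: list[SemanticPoint]) -> list[tuple[str, str]]:
    """Toy linking rule: pair each question with its nearest answer point."""
    pairs = []
    for q in (p for p in points if p.label == "question"):
        answers = [p for p in points if p.label == "answer"]
        if answers:
            a = min(answers, key=lambda p: (p.x - q.x) ** 2 + (p.y - q.y) ** 2)
            pairs.append((q.text, a.text))
    return pairs

pts = [
    SemanticPoint(10, 5, "question", "Name:"),
    SemanticPoint(40, 5, "answer", "Jane Doe"),
    SemanticPoint(10, 20, "question", "Date:"),
    SemanticPoint(40, 21, "answer", "2023-03-23"),
]
print(link(pts))  # [('Name:', 'Jane Doe'), ('Date:', '2023-03-23')]
```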
arXiv Detail & Related papers (2023-03-23T08:21:16Z) - Re-Examining Human Annotations for Interpretable NLP [80.81532239566992]
We conduct controlled experiments using crowd-sourced websites on two widely used datasets in Interpretable NLP.
We compare the annotation results obtained from recruiting workers satisfying different levels of qualification.
Our results reveal that the annotation quality is highly subject to the workers' qualification, and workers can be guided to provide certain annotations by the instructions.
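One concrete way to run such a comparison (not necessarily the paper's exact analysis; the labels below are synthetic) is to measure inter-annotator agreement separately per qualification tier:

```python
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

# Synthetic annotations: per qualification tier, each row is one worker's
# binary labels over the same ten instances (hypothetical data).
tiers = {
    "unqualified": [[1, 0, 1, 1, 0, 0, 1, 0, 1, 0],
                    [0, 0, 1, 0, 1, 0, 1, 1, 1, 1],
                    [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]],
    "master":      [[1, 0, 1, 1, 0, 0, 1, 0, 1, 0],
                    [1, 0, 1, 1, 0, 0, 1, 1, 1, 0],
                    [1, 0, 1, 1, 0, 1, 1, 0, 1, 0]],
}

for tier, workers in tiers.items():
    # Mean pairwise Cohen's kappa as a simple agreement summary per tier.
    kappas = [cohen_kappa_score(a, b) for a, b in combinations(workers, 2)]
    print(tier, "mean pairwise kappa:", round(sum(kappas) / len(kappas), 3))
```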
arXiv Detail & Related papers (2022-04-10T02:27:30Z) - Towards Group Robustness in the presence of Partial Group Labels [61.33713547766866]
Spurious correlations between input samples and the target labels can wrongly direct neural network predictions.
We propose an algorithm that optimizes for the worst-off group over a constraint set of possible group assignments.
We show improvements in the minority group's performance while preserving overall aggregate accuracy across groups.
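The summary is high-level; a common building block for group-robust training in this spirit (group-DRO-style, not necessarily the paper's exact algorithm) is to minimize the worst-off group's loss, sketched here with placeholder data:

```python
import torch
import torch.nn.functional as F

def worst_group_loss(logits, labels, group_ids, num_groups):
    """Return the loss of the worst-off group (group-DRO-style objective)."""
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    group_losses = []
    for g in range(num_groups):
        mask = group_ids == g
        if mask.any():
            group_losses.append(per_sample[mask].mean())
    # Minimizing the max over groups prioritizes the worst-off group.
    return torch.stack(group_losses).max()

logits = torch.randn(8, 2, requires_grad=True)
labels = torch.randint(0, 2, (8,))
groups = torch.tensor([0, 0, 0, 1, 1, 1, 2, 2])  # placeholder group IDs
loss = worst_group_loss(logits, labels, groups, num_groups=3)
loss.backward()
print(loss.item())
```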
arXiv Detail & Related papers (2022-01-10T22:04:48Z) - Perceptual Score: What Data Modalities Does Your Model Perceive? [73.75255606437808]
We introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features.
We find that recent, more accurate multi-modal models for visual question-answering tend to perceive the visual data less than their predecessors.
Using the perceptual score also helps to analyze model biases by decomposing the score into data subset contributions.
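A hedged sketch of the permutation idea behind such a metric (the paper's exact definition may differ; the stand-in model below ignores its visual input by construction) is the accuracy drop when one modality is shuffled across the batch:

```python
import torch

def perceptual_score(model, visual, text, labels):
    """Accuracy drop when the visual modality is permuted across the batch."""
    with torch.no_grad():
        acc = (model(visual, text).argmax(-1) == labels).float().mean()
        perm = visual[torch.randperm(visual.size(0))]  # break image-text pairing
        acc_perm = (model(perm, text).argmax(-1) == labels).float().mean()
    # A near-zero score means the model barely perceives the visual input.
    return (acc - acc_perm).item()

# Stand-in "model" that ignores the visual input entirely:
model = lambda v, t: t  # treats the text features as logits directly
text = torch.randn(64, 10)
labels = text.argmax(-1)
print(perceptual_score(model, torch.randn(64, 3), text, labels))  # ~0.0
```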
arXiv Detail & Related papers (2021-10-27T12:19:56Z) - Dealing with Disagreements: Looking Beyond the Majority Vote in
Subjective Annotations [6.546195629698355]
We investigate the efficacy of multi-annotator models for subjective tasks.
We show that this approach yields the same or better performance than aggregating labels in the data prior to training.
Our approach also provides a way to estimate uncertainty in predictions, which we demonstrate correlates better with annotation disagreements than estimates from traditional methods.
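A minimal sketch of the multi-annotator idea (one prediction head per annotator, with disagreement among heads as an uncertainty signal; the architecture details are illustrative rather than the paper's exact model):

```python
import torch
import torch.nn as nn

class MultiAnnotatorModel(nn.Module):
    """Shared encoder output feeding one classification head per annotator."""
    def __init__(self, repr_dim: int, num_annotators: int, num_classes: int):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(repr_dim, num_classes) for _ in range(num_annotators)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (num_annotators, batch, num_classes): one prediction per annotator.
        return torch.stack([head(x) for head in self.heads])

model = MultiAnnotatorModel(repr_dim=64, num_annotators=5, num_classes=2)
logits = model(torch.randn(8, 64))
probs = logits.softmax(-1)
# Variance across annotator heads as a per-example disagreement signal:
disagreement = probs.var(dim=0).mean(-1)
print(logits.shape, disagreement.shape)  # (5, 8, 2) and (8,)
```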
arXiv Detail & Related papers (2021-10-12T03:12:34Z) - Joint Representation Learning and Novel Category Discovery on Single-
and Multi-modal Data [16.138075558585516]
We present a generic, end-to-end framework to jointly learn a reliable representation and assign clusters to unlabelled data.
We employ Winner-Take-All (WTA) hashing algorithm on the shared representation space to generate pairwise pseudo labels for unlabelled data.
We thoroughly evaluate our framework on large-scale multi-modal video benchmarks Kinetics-400 and VGG-Sound, and image benchmarks CIFAR10, CIFAR100 and ImageNet.
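Winner-Take-All hashing itself is standard enough to sketch (each hash coordinate is the argmax position among the first K entries of a random permutation of the feature vector); the pairwise pseudo-labeling threshold below is an invented illustration:

```python
import numpy as np

def wta_hash(x, perms, k):
    """WTA hash: index of the max among the first k permuted coordinates."""
    return np.array([int(np.argmax(x[p[:k]])) for p in perms])

rng = np.random.default_rng(0)
dim, n_hashes, k = 32, 16, 4
perms = [rng.permutation(dim) for _ in range(n_hashes)]

a = rng.normal(size=dim)
b = a + 0.05 * rng.normal(size=dim)   # near-duplicate of a
c = rng.normal(size=dim)              # unrelated vector

def pseudo_label(u, v, threshold=0.75):
    """Positive pair if enough hash coordinates match (invented threshold)."""
    match = np.mean(wta_hash(u, perms, k) == wta_hash(v, perms, k))
    return int(match >= threshold)

print(pseudo_label(a, b), pseudo_label(a, c))  # expected: 1 0
```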
arXiv Detail & Related papers (2021-04-26T15:56:16Z) - Bayesian Semi-supervised Crowdsourcing [71.20185379303479]
Crowdsourcing has emerged as a powerful paradigm for efficiently labeling large datasets and performing various learning tasks.
This work deals with semi-supervised crowdsourced classification, under two regimes of semi-supervision.
arXiv Detail & Related papers (2020-12-20T23:18:51Z) - Contrastive Examples for Addressing the Tyranny of the Majority [83.93825214500131]
We propose to create a balanced training dataset, consisting of the original dataset plus new data points in which the group memberships are intervened.
We show that current generative adversarial networks are a powerful tool for learning these data points, called contrastive examples.
arXiv Detail & Related papers (2020-04-14T14:06:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.