Capturing Perspectives of Crowdsourced Annotators in Subjective Learning Tasks
- URL: http://arxiv.org/abs/2311.09743v2
- Date: Thu, 16 May 2024 06:01:26 GMT
- Title: Capturing Perspectives of Crowdsourced Annotators in Subjective Learning Tasks
- Authors: Negar Mokhberian, Myrl G. Marmarelis, Frederic R. Hopp, Valerio Basile, Fred Morstatter, Kristina Lerman
- Abstract summary: Supervised classification heavily depends on datasets annotated by humans.
In subjective tasks such as toxicity classification, these annotations often exhibit low agreement among raters.
In this work, we propose Annotator Aware Representations for Texts (AART) for subjective classification tasks.
- Score: 9.110872603799839
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Supervised classification heavily depends on datasets annotated by humans. However, in subjective tasks such as toxicity classification, these annotations often exhibit low agreement among raters. Annotations have commonly been aggregated by employing methods like majority voting to determine a single ground truth label. In subjective tasks, aggregating labels will result in biased labeling and, consequently, biased models that can overlook minority opinions. Previous studies have shed light on the pitfalls of label aggregation and have introduced a handful of practical approaches to tackle this issue. Recently proposed multi-annotator models, which predict labels individually per annotator, are vulnerable to under-determination for annotators with few samples. This problem is exacerbated in crowdsourced datasets. In this work, we propose Annotator Aware Representations for Texts (AART) for subjective classification tasks. Our approach involves learning representations of annotators, allowing for exploration of annotation behaviors. We show the improvement of our method on metrics that assess the performance on capturing individual annotators' perspectives. Additionally, we report fairness metrics that evaluate how equitably our model performs for marginalized annotators compared with others.
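The abstract contrasts label aggregation with modeling individual annotators. The sketch below illustrates both sides under stated assumptions: `majority_vote` collapses one item's labels to a single answer (and `overruled_fraction` shows how many annotators that answer overrides), while `AnnotatorAwareClassifier` is a hypothetical toy in the spirit of AART that combines a text embedding with a per-annotator embedding before a shared linear classifier. Summing the two embeddings, the dimensions, and the random, untrained parameters are all assumptions made for illustration; this is not the authors' implementation.

```python
from collections import Counter

import numpy as np


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    # Counter.most_common orders equal counts by first occurrence
    return Counter(labels).most_common(1)[0][0]


def overruled_fraction(labels):
    """Fraction of annotators whose label differs from the majority label."""
    winner = majority_vote(labels)
    return sum(label != winner for label in labels) / len(labels)


class AnnotatorAwareClassifier:
    """Hypothetical toy model: logits = W @ (text_emb + annotator_emb) + b.

    Assumption for illustration only: parameters are random, not trained.
    """

    def __init__(self, num_annotators, dim, num_classes, seed=0):
        rng = np.random.default_rng(seed)
        # One vector per annotator (random placeholders for learned embeddings)
        self.annotator_emb = rng.normal(size=(num_annotators, dim))
        # Classifier weights shared by all annotators
        self.W = rng.normal(size=(num_classes, dim))
        self.b = np.zeros(num_classes)

    def predict_proba(self, text_emb, annotator_ids):
        # Add each row's text embedding to the embedding of its annotator
        combined = text_emb + self.annotator_emb[annotator_ids]
        logits = combined @ self.W.T + self.b
        # Numerically stable softmax over classes
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        return probs / probs.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    # Five annotators label one comment: 1 = toxic, 0 = not toxic
    labels = [0, 0, 0, 1, 1]
    print(majority_vote(labels))       # 0
    print(overruled_fraction(labels))  # 0.4

    model = AnnotatorAwareClassifier(num_annotators=5, dim=8, num_classes=2)
    text = np.random.default_rng(1).normal(size=(1, 8))
    # Pair the same text with each of the five annotators
    probs = model.predict_proba(np.repeat(text, 5, axis=0), np.arange(5))
    print(probs.shape)                 # (5, 2)
```

Because the annotator embedding is part of the classifier's input, the same text can receive different predictions for different annotators instead of one aggregated label. Sharing `W` across annotators is one way to avoid fitting a separate head per annotator, which is where the under-determination problem for annotators with few samples arises.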
Related papers
- Leveraging Annotator Disagreement for Text Classification [3.6625157427847963]
It is common practice in text classification to only use one majority label for model training even if a dataset has been annotated by multiple annotators.
This paper proposes three strategies to leverage annotator disagreement for text classification: a probability-based multi-label method, an ensemble system, and instruction tuning.
Date: 2024-09-26T06:46:53Z
- ACTOR: Active Learning with Annotator-specific Classification Heads to Embrace Human Label Variation [35.10805667891489]
Active learning, as an annotation cost-saving strategy, has not been fully explored in the context of learning from disagreement.
We show that in the active learning setting, a multi-head model performs significantly better than a single-head model in terms of uncertainty estimation.
Date: 2023-10-23T14:26:43Z
- IDEAL: Influence-Driven Selective Annotations Empower In-Context Learners in Large Language Models [66.32043210237768]
This paper introduces an influence-driven selective annotation method.
It aims to minimize annotation costs while improving the quality of in-context examples.
Experiments confirm the superiority of the proposed method on various benchmarks.
Date: 2023-10-16T22:53:54Z
- Subjective Crowd Disagreements for Subjective Data: Uncovering Meaningful CrowdOpinion with Population-level Learning [8.530934084017966]
We introduce CrowdOpinion, an unsupervised learning approach that uses language features and label distributions to pool similar items into larger samples of label distributions.
We use five publicly available benchmark datasets (with varying levels of annotator disagreements) from social media.
We also experiment in the wild using a dataset from Facebook, where annotations come from the platform itself by users reacting to posts.
Date: 2023-07-07T22:09:46Z
- Using Natural Language Explanations to Rescale Human Judgments [81.66697572357477]
We propose a method to rescale ordinal annotations and explanations using large language models (LLMs).
We feed annotators' Likert ratings and corresponding explanations into an LLM and prompt it to produce a numeric score anchored in a scoring rubric.
Our method rescales the raw judgments without impacting agreement and brings the scores closer to human judgments grounded in the same scoring rubric.
Date: 2023-05-24T06:19:14Z
- SeedBERT: Recovering Annotator Rating Distributions from an Aggregated Label [43.23903984174963]
We propose SeedBERT, a method for recovering annotator rating distributions from a single label.
Our human evaluations indicate that SeedBERT's attention mechanism is consistent with human sources of annotator disagreement.
Date: 2022-11-23T18:35:15Z
- Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future [63.99570204416711]
We reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets.
We define a uniform evaluation setup including a new formalization of the annotation error detection task.
We release our datasets and implementations in an easy-to-use and open source software package.
Date: 2022-06-05T22:31:45Z
- Disjoint Contrastive Regression Learning for Multi-Sourced Annotations [10.159313152511919]
Large-scale datasets are important for the development of deep learning models.
Multiple annotators may be employed to label different subsets of the data.
The inconsistency and bias among different annotators are harmful to the model training.
Date: 2021-12-31T12:39:04Z
- Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
Date: 2021-05-12T15:40:13Z
- Dynamic Semantic Matching and Aggregation Network for Few-shot Intent Detection [69.2370349274216]
Few-shot Intent Detection is challenging due to the scarcity of available annotated utterances.
Semantic components are distilled from utterances via multi-head self-attention.
Our method provides a comprehensive matching measure to enhance representations of both labeled and unlabeled instances.
Date: 2020-10-06T05:16:38Z
- Debiased Contrastive Learning [64.98602526764599]
We develop a debiased contrastive objective that corrects for the sampling of same-label datapoints.
Empirically, the proposed objective consistently outperforms the state-of-the-art for representation learning in vision, language, and reinforcement learning benchmarks.
Date: 2020-07-01T04:25:24Z
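The debiased contrastive objective summarized in the last entry corrects the negative term for samples that actually share the anchor's label. Below is a minimal NumPy sketch for a single anchor, modeled on that idea as we understand it and assuming one positive per anchor, L2-normalized embeddings, and a known class prior `tau_plus` (the probability that a sampled "negative" is really a same-label point). The function names, defaults, and the single-anchor framing are hypothetical choices for illustration.

```python
import numpy as np


def standard_contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """Uncorrected contrastive loss for one anchor (baseline for comparison)."""
    pos = np.exp(anchor @ positive / temperature)
    neg = np.exp(negatives @ anchor / temperature)
    return -np.log(pos / (pos + neg.sum()))


def debiased_contrastive_loss(anchor, positive, negatives, tau_plus=0.1, temperature=0.5):
    """Sketch of a debiased contrastive loss for one anchor.

    anchor, positive: 1-D unit-norm embeddings; negatives: (N, d) unit-norm rows.
    tau_plus: assumed probability that a sampled negative shares the anchor's label.
    """
    n = negatives.shape[0]
    pos = np.exp(anchor @ positive / temperature)
    neg = np.exp(negatives @ anchor / temperature)
    # Subtract the expected contribution of false negatives, then rescale
    ng = (neg.sum() - tau_plus * n * pos) / (1.0 - tau_plus)
    # Clamp at N * e^{-1/t}, the smallest value the term can take for unit vectors
    ng = max(ng, n * np.exp(-1.0 / temperature))
    return -np.log(pos / (pos + ng))


def unit(v):
    """Scale a vector to unit L2 norm."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


if __name__ == "__main__":
    anchor = unit([1.0, 0.0])
    positive = unit([0.8, 0.6])
    negatives = np.array([[0.0, 1.0], [-1.0, 0.0], [0.6, -0.8]])
    std = standard_contrastive_loss(anchor, positive, negatives)
    dcl = debiased_contrastive_loss(anchor, positive, negatives, tau_plus=0.1)
    print(dcl < std)  # True
```

With `tau_plus=0` the correction vanishes and the function reduces to the standard loss. For a positive pair that is more similar than the negatives on average, the correction shrinks the negative term, so the debiased loss comes out lower.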