The Ecological Fallacy in Annotation: Modelling Human Label Variation goes beyond Sociodemographics
- URL: http://arxiv.org/abs/2306.11559v1
- Date: Tue, 20 Jun 2023 14:23:32 GMT
- Title: The Ecological Fallacy in Annotation: Modelling Human Label Variation goes beyond Sociodemographics
- Authors: Matthias Orlikowski (1), Paul Röttger (2), Philipp Cimiano (1), Dirk Hovy (3) ((1) Bielefeld University, (2) University of Oxford, (3) Computing Sciences Department, Bocconi University, Milan, Italy)
- Abstract summary: Recent research aims to model individual annotator behaviour rather than predicting aggregated labels.
We introduce group-specific layers to multi-annotator models to account for sociodemographics.
This result shows that individual annotation behaviour depends on much more than just sociodemographics.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Many NLP tasks exhibit human label variation, where different annotators give
different labels to the same texts. This variation is known to depend, at least
in part, on the sociodemographics of annotators. Recent research aims to model
individual annotator behaviour rather than predicting aggregated labels, and we
would expect that sociodemographic information is useful for these models. On
the other hand, the ecological fallacy states that aggregate group behaviour,
such as the behaviour of the average female annotator, does not necessarily
explain individual behaviour. To account for sociodemographics in models of
individual annotator behaviour, we introduce group-specific layers to
multi-annotator models. In a series of experiments for toxic content detection,
we find that explicitly accounting for sociodemographic attributes in this way
does not significantly improve model performance. This result shows that
individual annotation behaviour depends on much more than just
sociodemographics.
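To make the modelling idea concrete, below is a minimal sketch of group-specific layers in a multi-annotator model, assuming a shared text encoder, one classification head per annotator, and one linear layer per sociodemographic group; all class names, dimensions, and the toy encoder are illustrative assumptions, not the authors' exact implementation.

    # Minimal sketch of group-specific layers in a multi-annotator model.
    # Assumptions (not from the paper): a shared text encoder, one classification
    # head per annotator, and one linear transform per sociodemographic group.
    import torch
    import torch.nn as nn

    class MultiAnnotatorModel(nn.Module):
        def __init__(self, encoder, hidden_dim, n_annotators, n_groups, n_labels=2):
            super().__init__()
            self.encoder = encoder  # shared text encoder, returns [batch, hidden_dim]
            # Group-specific layers: one transform per sociodemographic group.
            self.group_layers = nn.ModuleList(
                [nn.Linear(hidden_dim, hidden_dim) for _ in range(n_groups)]
            )
            # Annotator-specific classification heads.
            self.annotator_heads = nn.ModuleList(
                [nn.Linear(hidden_dim, n_labels) for _ in range(n_annotators)]
            )

        def forward(self, inputs, annotator_id, group_id):
            h = self.encoder(inputs)                        # shared representation
            h = torch.relu(self.group_layers[group_id](h))  # group-specific transform
            return self.annotator_heads[annotator_id](h)    # annotator-specific logits

    # Toy usage with a stand-in encoder (mean-pooled token embeddings).
    embed = nn.Embedding(1000, 64)
    dummy_encoder = lambda ids: embed(ids).mean(dim=1)
    model = MultiAnnotatorModel(dummy_encoder, hidden_dim=64, n_annotators=50, n_groups=4)
    logits = model(torch.randint(0, 1000, (2, 16)), annotator_id=3, group_id=1)  # [2, 2]

In a sketch like this, dropping the group-specific transform recovers a plain multi-annotator model, which is the kind of comparison the experiments describe.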
Related papers
- Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals' Subjective Text Perceptions [33.76973308687867]
We show that models do improve in sociodemographic prompting when trained.
This performance gain is largely due to models learning annotator-specific behaviour rather than sociodemographic patterns.
Across all tasks, our results suggest that models learn little meaningful connection between sociodemographics and annotation.
arXiv Detail & Related papers (2025-02-28T09:53:42Z)
- CAGE: Circumplex Affect Guided Expression Inference [9.108319009019912]
We present a comparative in-depth analysis of two common datasets (AffectNet and EMOTIC) equipped with the components of the circumplex model of affect.
We propose a model for the prediction of facial expressions tailored for lightweight applications.
arXiv Detail & Related papers (2024-04-23T12:30:17Z)
- Capturing Perspectives of Crowdsourced Annotators in Subjective Learning Tasks [9.110872603799839]
Supervised classification heavily depends on datasets annotated by humans.
In subjective tasks such as toxicity classification, these annotations often exhibit low agreement among raters.
In this work, we propose Annotator Aware Representations for Texts (AART) for subjective classification tasks.
arXiv Detail & Related papers (2023-11-16T10:18:32Z)
- Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting [64.80538055623842]
Sociodemographic prompting is a technique that steers the output of prompt-based models towards answers that humans with specific sociodemographic profiles would give.
We show that sociodemographic information affects model predictions and can be beneficial for improving zero-shot learning in subjective NLP tasks.
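As a rough, hypothetical sketch of this technique (the prompt wording and profile fields below are assumptions, not taken from the paper), sociodemographic prompting can be approximated by prefixing the task instruction with a persona description built from profile attributes:

    # Hypothetical sociodemographic prompt builder for a toxicity judgement task.
    # Profile fields and phrasing are illustrative, not the paper's exact setup.
    def build_prompt(text: str, profile: dict) -> str:
        persona = (f"You are a {profile['age']}-year-old {profile['gender']} "
                   f"living in {profile['country']}.")
        return (f"{persona}\nIs the following text toxic? Answer 'yes' or 'no'.\n"
                f"Text: {text}\nAnswer:")

    print(build_prompt("You people ruin everything.",
                       {"age": 35, "gender": "woman", "country": "Germany"}))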
arXiv Detail & Related papers (2023-09-13T15:42:06Z)
- Gender Biases in Automatic Evaluation Metrics for Image Captioning [87.15170977240643]
We conduct a systematic study of gender biases in model-based evaluation metrics for image captioning tasks.
We demonstrate the negative consequences of using these biased metrics, including the inability to differentiate between biased and unbiased generations.
We present a simple and effective way to mitigate the metric bias without hurting the correlations with human judgments.
arXiv Detail & Related papers (2023-05-24T04:27:40Z)
- Self-similarity Driven Scale-invariant Learning for Weakly Supervised Person Search [66.95134080902717]
We propose a novel one-step framework, named Self-similarity driven Scale-invariant Learning (SSL).
We introduce a Multi-scale Exemplar Branch to guide the network in concentrating on the foreground and learning scale-invariant features.
Experiments on PRW and CUHK-SYSU databases demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2023-02-25T04:48:11Z)
- Learning signatures of decision making from many individuals playing the same game [54.33783158658077]
We design a predictive framework that learns representations to encode an individual's 'behavioral style'.
We apply our method to a large-scale behavioral dataset from 1,000 humans playing a 3-armed bandit task.
arXiv Detail & Related papers (2023-02-21T21:41:53Z)
- Auditing Gender Presentation Differences in Text-to-Image Models [54.16959473093973]
We study how gender is presented differently in text-to-image models.
By probing gender indicators in the input text, we quantify the frequency differences of presentation-centric attributes.
We propose an automatic method to estimate such differences.
arXiv Detail & Related papers (2023-02-07T18:52:22Z)
- SeedBERT: Recovering Annotator Rating Distributions from an Aggregated Label [43.23903984174963]
We propose SeedBERT, a method for recovering annotator rating distributions from a single label.
Our human evaluations indicate that SeedBERT's attention mechanism is consistent with human sources of annotator disagreement.
arXiv Detail & Related papers (2022-11-23T18:35:15Z)
- Incorporating Heterogeneous User Behaviors and Social Influences for Predictive Analysis [32.31161268928372]
We aim to incorporate heterogeneous user behaviors and social influences for behavior predictions.
This paper proposes a variant of Long Short-Term Memory (LSTM) that can consider context while modelling a behavior sequence.
A residual learning-based decoder is designed to automatically construct multiple high-order cross features based on social behavior representation.
arXiv Detail & Related papers (2022-07-24T17:05:37Z)
- The Curious Case of Control [37.28245521206576]
Children make systematic errors on subject control sentences even after they have reached near-adult competence.
We find that models can be categorized by behavior into three separate groups, with broad differences between the groups.
We examine to what degree the models are sensitive to prompting with agent-patient information, finding that raising the salience of agent and patient relations results in significant changes in the outputs of most models.
arXiv Detail & Related papers (2022-05-24T14:45:16Z)
- Estimating Structural Disparities for Face Models [54.062512989859265]
In machine learning, disparity metrics are often defined by measuring the difference in the performance or outcome of a model, across different sub-populations.
We explore performing such analysis on computer vision models trained on human faces, and on tasks such as face attribute prediction and affect estimation.
arXiv Detail & Related papers (2022-04-13T05:30:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.