Beyond Consensus: Perspectivist Modeling and Evaluation of Annotator Disagreement in NLP
- URL: http://arxiv.org/abs/2601.09065v2
- Date: Sat, 17 Jan 2026 17:58:09 GMT
- Title: Beyond Consensus: Perspectivist Modeling and Evaluation of Annotator Disagreement in NLP
- Authors: Yinuo Xu, David Jurgens
- Abstract summary: Annotator disagreement is widespread in NLP, particularly for subjective and ambiguous tasks such as toxicity detection and stance analysis. We first present a domain-agnostic taxonomy of the sources of disagreement spanning data, task, and annotator factors. We then synthesize modeling approaches using a common framework defined by prediction targets and pooling structure.
- Score: 25.097081181685613
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Annotator disagreement is widespread in NLP, particularly for subjective and ambiguous tasks such as toxicity detection and stance analysis. While early approaches treated disagreement as noise to be removed, recent work increasingly models it as a meaningful signal reflecting variation in interpretation and perspective. This survey provides a unified view of disagreement-aware NLP methods. We first present a domain-agnostic taxonomy of the sources of disagreement spanning data, task, and annotator factors. We then synthesize modeling approaches using a common framework defined by prediction targets and pooling structure, highlighting a shift from consensus learning toward explicitly modeling disagreement, and toward capturing structured relationships among annotators. We review evaluation metrics for both predictive performance and annotator behavior, noting that most fairness evaluations remain descriptive rather than normative. We conclude by identifying open challenges and future directions, including integrating multiple sources of variation, developing disagreement-aware interpretability frameworks, and grappling with the practical tradeoffs of perspectivist modeling.
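The survey's framing by prediction target and pooling structure can be made concrete. Below is a minimal sketch, not from the paper, contrasting a pooled soft-label head (consensus-style) with per-annotator heads (perspectivist-style); the encoder dimension, class count, and annotator count are illustrative assumptions.

```python
# Sketch: two prediction targets for the same encoder representation.
import torch
import torch.nn as nn

class DisagreementAwareClassifier(nn.Module):
    def __init__(self, encoder_dim=768, n_classes=2, n_annotators=10):
        super().__init__()
        # consensus-style target: one pooled label distribution per item
        self.soft_label_head = nn.Linear(encoder_dim, n_classes)
        # perspectivist-style target: one head per annotator
        self.annotator_heads = nn.ModuleList(
            [nn.Linear(encoder_dim, n_classes) for _ in range(n_annotators)]
        )

    def forward(self, h):
        # h: (batch, encoder_dim) embeddings from any frozen text encoder
        pooled = torch.softmax(self.soft_label_head(h), dim=-1)
        per_annotator = torch.stack(
            [torch.softmax(head(h), dim=-1) for head in self.annotator_heads],
            dim=1,
        )  # (batch, n_annotators, n_classes)
        return pooled, per_annotator
```

Training the pooled head against the empirical label distribution recovers soft-label consensus learning, while training the per-annotator heads against individual labels recovers perspectivist modeling; sharing parameters across heads corresponds to structured pooling of annotators.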
Related papers
- Adversarial Yet Cooperative: Multi-Perspective Reasoning in Retrieved-Augmented Language Models [72.4149653187766]
We propose a Reasoner-Verifier framework named Adversarial Reasoning RAG (ARR). The Reasoner and Verifier engage in reasoning on retrieved evidence and critiquing each other's logic while being guided by a process-aware advantage. Experiments on multiple benchmarks demonstrate the effectiveness of our method.
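A hypothetical sketch of the Reasoner-Verifier interaction in the spirit of ARR; `llm` is a placeholder for any chat-completion function, and the prompts, acceptance check, and stopping rule are assumptions rather than the paper's implementation (the process-aware advantage used for training is omitted).

```python
# Sketch: a reasoner proposes, a verifier critiques, the reasoner revises.
def arr_answer(llm, question, evidence, max_rounds=3):
    reasoning = llm(f"Reason over the evidence to answer.\n"
                    f"Question: {question}\nEvidence: {evidence}")
    for _ in range(max_rounds):
        critique = llm(f"Critique this reasoning for unsupported steps.\n"
                       f"Evidence: {evidence}\nReasoning: {reasoning}")
        if "NO ISSUES" in critique:  # verifier accepts the chain
            break
        reasoning = llm(f"Revise the reasoning to address the critique.\n"
                        f"Critique: {critique}\nReasoning: {reasoning}")
    return reasoning
```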
arXiv Detail & Related papers (2026-01-08T06:57:03Z)
- TRACE: A Framework for Analyzing and Enhancing Stepwise Reasoning in Vision-Language Models [9.607579442309639]
We introduce TRACE, a framework for Transparent Reasoning And Consistency Evaluation. At its core, TRACE leverages Auxiliary Reasoning Sets (ARS) to decompose complex problems. Our experiments show that consistency across ARS correlates with final-answer correctness. TRACE defines confidence regions that distinguish reliable from unreliable reasoning paths.
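A hedged sketch of the consistency signal this kind of evaluation relies on: agreement among answers derived from auxiliary reasoning paths serves as a confidence score. The aggregation rule and the cutoff for a "reliable" region are assumptions, not TRACE's exact definitions.

```python
# Sketch: agreement across auxiliary reasoning paths as a confidence score.
from collections import Counter

def ars_consistency(ars_answers):
    """ars_answers: final answers produced by each auxiliary reasoning path."""
    top_answer, count = Counter(ars_answers).most_common(1)[0]
    return top_answer, count / len(ars_answers)  # majority answer + agreement rate

answer, confidence = ars_consistency(["A", "A", "B", "A"])
reliable = confidence >= 0.75  # hypothetical confidence-region threshold
```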
arXiv Detail & Related papers (2025-12-05T18:40:18Z)
- A Unified Evaluation Framework for Multi-Annotator Tendency Learning [6.801084054135531]
We propose the first unified evaluation framework with two novel metrics. Difference of Inter-annotator Consistency (DIC) quantifies how well models capture annotator tendencies. Behavior Alignment Explainability (BAE) evaluates how well model explanations reflect annotator behavior and decision relevance.
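The abstract does not reproduce the DIC formula, so the following is one plausible instantiation under stated assumptions: compare the inter-annotator agreement matrix computed on gold labels with the same matrix computed on the model's per-annotator predictions, and report the mean absolute difference.

```python
# Sketch: a DIC-style metric comparing real vs. predicted annotator consistency.
import numpy as np

def pairwise_agreement(labels):
    # labels: (n_annotators, n_items) array of categorical labels
    same = labels[:, None, :] == labels[None, :, :]  # (A, A, items)
    return same.mean(axis=2)                         # (A, A) agreement matrix

def dic(gold, predicted):
    """Smaller = the model mirrors real inter-annotator consistency patterns."""
    return np.abs(pairwise_agreement(gold) - pairwise_agreement(predicted)).mean()
```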
arXiv Detail & Related papers (2025-08-14T06:50:20Z)
- Fair Deepfake Detectors Can Generalize [51.21167546843708]
We show that controlling for confounders (data distribution and model capacity) enables improved generalization via fairness interventions. Motivated by this insight, we propose Demographic Attribute-insensitive Intervention Detection (DAID), a plug-and-play framework composed of: i) demographic-aware data rebalancing, which employs inverse-propensity weighting and subgroup-wise feature normalization to neutralize distributional biases; and ii) demographic-agnostic feature aggregation, which uses a novel alignment loss to suppress sensitive-attribute signals. DAID consistently achieves superior performance in both fairness and generalization compared to several state-of-the-art methods.
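A minimal sketch of the rebalancing component's two standard ingredients, assuming a tabular view of the data; the column names and the exact weighting scheme are illustrative, not DAID's implementation.

```python
# Sketch: inverse-propensity weights + per-subgroup feature normalization.
import pandas as pd

def rebalance_weights(df, group_col="group", label_col="label"):
    # weight ∝ 1 / P(group, label): rare (group, label) cells count more
    cell_freq = (df.groupby([group_col, label_col])[label_col]
                   .transform("size") / len(df))
    return 1.0 / cell_freq

def subgroup_normalize(features, groups):
    # z-score each feature column within each demographic subgroup
    out = features.copy()
    for g in groups.unique():
        m = groups == g
        out.loc[m] = (features[m] - features[m].mean()) / (features[m].std() + 1e-8)
    return out
```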
arXiv Detail & Related papers (2025-07-03T14:10:02Z)
- A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models [58.32070787537946]
Chain-of-thought (CoT) reasoning enhances the performance of large language models. We present the first comprehensive study of CoT faithfulness in large vision-language models.
arXiv Detail & Related papers (2025-05-29T18:55:05Z)
- A Collaborative Content Moderation Framework for Toxicity Detection based on Conformalized Estimates of Annotation Disagreement [7.031062446301277]
We introduce a novel content moderation framework that emphasizes the importance of capturing annotation disagreement. We leverage uncertainty estimation techniques, specifically Conformal Prediction, to account for both the ambiguity in comment annotations and the model's inherent uncertainty in predicting toxicity and disagreement.
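Split conformal prediction is a standard technique, so a generic sketch can illustrate the idea (this is not the paper's code): calibrate a disagreement regressor on held-out residuals so its intervals cover the true annotator disagreement rate with probability at least 1 - alpha. The `model.predict` interface is an sklearn-style assumption.

```python
# Sketch: split conformal interval around a predicted disagreement rate.
import numpy as np

def conformal_interval(model, X_cal, y_cal, x_new, alpha=0.1):
    # nonconformity score = absolute residual on the calibration set
    scores = np.abs(y_cal - model.predict(X_cal))
    n = len(scores)
    level = np.ceil((n + 1) * (1 - alpha)) / n  # finite-sample correction
    q = np.quantile(scores, min(level, 1.0), method="higher")
    pred = model.predict(x_new)
    return pred - q, pred + q  # interval with distribution-free coverage
```

In a moderation pipeline, comments whose interval is wide or straddles the decision threshold are the natural candidates to route to human reviewers.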
arXiv Detail & Related papers (2024-11-06T18:08:57Z)
- Estimating the Causal Effects of Natural Logic Features in Transformer-Based NLI Models [16.328341121232484]
We apply causal effect estimation strategies to measure the effect of context interventions.
We investigate Transformers' robustness to irrelevant changes and sensitivity to impactful changes.
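A hedged sketch of what an intervention-based measurement can look like: apply a controlled edit to inputs and read off the average shift in the model's prediction. `predict_probs`, `intervene`, and the entailment index are placeholder assumptions, not the paper's estimators.

```python
# Sketch: average effect of a text intervention on an NLI model's output.
import numpy as np

def average_causal_effect(predict_probs, examples, intervene):
    """examples: list of (premise, hypothesis); effect = mean shift in P(entailment)."""
    base = np.array([predict_probs(p, h)[0] for p, h in examples])
    edited = np.array([predict_probs(p, intervene(h)) [0] for p, h in examples])
    return (edited - base).mean()  # ~0 for irrelevant edits, large for impactful ones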
arXiv Detail & Related papers (2024-04-03T10:22:35Z)
- Counterfactuals of Counterfactuals: a back-translation-inspired approach to analyse counterfactual editors [3.4253416336476246]
We focus on the analysis of counterfactual, contrastive explanations.
We propose a new back-translation-inspired evaluation methodology.
We show that by iteratively feeding the counterfactual to the explainer we can obtain valuable insights into the behaviour of both the predictor and the explainer models.
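A hypothetical sketch of the back-translation-style loop, assuming binary labels and placeholder `editor` and `predictor` functions: alternate the target label each round, feed each counterfactual back to the editor, and inspect whether the round trip drifts away from the original text and prediction.

```python
# Sketch: iterated counterfactual editing, analogous to round-trip translation.
def iterate_counterfactuals(editor, predictor, text, rounds=4):
    history = [(text, predictor(text))]
    for _ in range(rounds):
        current, label = history[-1]
        target = 1 - label                # assume binary labels {0, 1}
        edited = editor(current, target)  # counterfactual toward the flipped label
        history.append((edited, predictor(edited)))
    return history  # drift across rounds diagnoses both editor and predictor
```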
arXiv Detail & Related papers (2023-05-26T16:04:28Z)
- Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates whether the model's prediction on the counterfactual is consistent with the logic expressed in the explanation.
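A hedged sketch of the consistency check, with `flip_predicate` as a placeholder that negates a logical predicate mentioned in the explanation and returns the label that logic would entail; the interfaces are illustrative assumptions.

```python
# Sketch: faithfulness-through-counterfactuals as a consistency test.
def faithfulness_check(model, premise, hypothesis, explanation, flip_predicate):
    # build a counterfactual hypothesis from the explanation's own logic
    cf_hypothesis, expected_label = flip_predicate(hypothesis, explanation)
    predicted = model(premise, cf_hypothesis)
    # a faithful explanation implies the prediction follows its stated logic
    return predicted == expected_label
```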
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
- On the Faithfulness Measurements for Model Interpretations [100.2730234575114]
Post-hoc interpretations aim to uncover how natural language processing (NLP) models make predictions.
To measure their faithfulness, we start with three criteria: the removal-based criterion, the sensitivity of interpretations, and the stability of interpretations.
Motivated by the desideratum of these faithfulness notions, we introduce a new class of interpretation methods that adopt techniques from the adversarial domain.
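The removal-based criterion has a standard, easily sketched form (a comprehensiveness-style measurement, not necessarily the paper's exact protocol): deleting the tokens an interpretation marks as important should reduce the model's confidence in its prediction. `predict_prob` is a placeholder returning P(label | tokens).

```python
# Sketch: removal-based faithfulness as a probability drop.
def comprehensiveness(predict_prob, tokens, important_idx, label):
    full = predict_prob(tokens, label)
    reduced = [t for i, t in enumerate(tokens) if i not in set(important_idx)]
    return full - predict_prob(reduced, label)  # larger drop = more faithful
```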
arXiv Detail & Related papers (2021-04-18T09:19:44Z)
- Evaluations and Methods for Explanation through Robustness Analysis [117.7235152610957]
We establish a novel set of evaluation criteria for such feature-based explanations via robustness analysis.
We obtain new explanations that are loosely necessary and sufficient for a prediction.
We extend the explanation to extract the set of features that would move the current prediction to a target class.
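A hypothetical greedy sketch of what a "loosely necessary and sufficient" feature set can mean operationally: keeping only the chosen tokens preserves the prediction (sufficient) while masking them changes it (necessary). The masking strategy, scoring by single-token deletion impact, and the two-function interface are all assumptions, not the paper's algorithm.

```python
# Sketch: grow a token set until it is both sufficient and necessary.
def greedy_explanation(predict_label, prob_of, tokens, mask="[MASK]"):
    """predict_label(tokens) -> label; prob_of(tokens, label) -> probability."""
    label = predict_label(tokens)
    masked = lambda idx: [mask if i in idx else t for i, t in enumerate(tokens)]
    # rank tokens: masking the most important one hurts P(label) the most
    order = sorted(range(len(tokens)), key=lambda i: prob_of(masked({i}), label))
    chosen = set()
    for i in order:
        chosen.add(i)
        keep_only = [t if j in chosen else mask for j, t in enumerate(tokens)]
        sufficient = predict_label(keep_only) == label
        necessary = predict_label(masked(chosen)) != label
        if sufficient and necessary:
            return sorted(chosen)
    return sorted(chosen)
```

Targeting a specific class, as the last sentence above describes, amounts to replacing the necessity test with `predict_label(masked(chosen)) == target_class`.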
arXiv Detail & Related papers (2020-05-31T05:52:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.