Perspectives in Play: A Multi-Perspective Approach for More Inclusive NLP Systems
- URL: http://arxiv.org/abs/2506.20209v1
- Date: Wed, 25 Jun 2025 07:53:36 GMT
- Title: Perspectives in Play: A Multi-Perspective Approach for More Inclusive NLP Systems
- Authors: Benedetta Muscato, Lucia Passaro, Gizem Gezici, Fosca Giannotti
- Abstract summary: This study proposes a new multi-perspective approach using soft labels to encourage the development of perspective-aware models.
We conduct an analysis across diverse subjective text classification tasks, including hate speech, irony, abusive language, and stance detection.
Results show that the multi-perspective approach better approximates human label distributions, as measured by Jensen-Shannon Divergence (JSD).
Our approach exhibits lower confidence in tasks like irony and stance detection, likely due to the inherent subjectivity present in the texts.
- Score: 3.011820285006942
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the realm of Natural Language Processing (NLP), common approaches for handling human disagreement consist of aggregating annotators' viewpoints to establish a single ground truth. However, prior studies show that disregarding individual opinions can have the side effect of underrepresenting minority perspectives, especially in subjective tasks, where annotators may systematically disagree because of their preferences. Recognizing that labels reflect the diverse backgrounds, life experiences, and values of individuals, this study proposes a new multi-perspective approach using soft labels to encourage the development of the next generation of perspective-aware models that are more inclusive and pluralistic. We conduct an extensive analysis across diverse subjective text classification tasks, including hate speech, irony, abusive language, and stance detection, to highlight the importance of capturing human disagreements, which are often overlooked by traditional aggregation methods. Results show that the multi-perspective approach not only better approximates human label distributions, as measured by Jensen-Shannon Divergence (JSD), but also achieves superior classification performance (higher F1 scores), outperforming traditional approaches. However, our approach exhibits lower confidence in tasks like irony and stance detection, likely due to the inherent subjectivity of the texts. Lastly, leveraging Explainable AI (XAI), we explore model uncertainty and uncover meaningful insights into model predictions.
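The listing ships no code, but the two ingredients named in the abstract are easy to sketch. Below is a minimal, illustrative Python sketch, not the authors' implementation: a soft label built as the empirical distribution of annotator votes, and Jensen-Shannon Divergence used to score how closely a predicted distribution matches the human label distribution. All function names and the toy numbers are assumptions for illustration.

```python
import numpy as np

def soft_label(annotator_votes, num_classes):
    """One item's soft label: the empirical distribution of annotator votes,
    preserving disagreement instead of collapsing it to a majority vote."""
    counts = np.bincount(annotator_votes, minlength=num_classes)
    return counts / counts.sum()

def jensen_shannon_divergence(p, q, eps=1e-12):
    """JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = (p + q) / 2.
    Computed in base 2, so the value lies in [0, 1]; lower means closer."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log2(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy example: three annotators label one stance item (0 = against, 1 = favor).
human = soft_label(np.array([0, 0, 1]), num_classes=2)  # -> [0.667, 0.333]
hard = np.array([1.0, 0.0])   # majority-vote (aggregated) target
model = np.array([0.7, 0.3])  # a perspective-aware model's prediction

print(jensen_shannon_divergence(human, model))  # ~0.001: close to humans
print(jensen_shannon_divergence(human, hard))   # ~0.19: disagreement lost
```

Training against such soft labels then amounts to a distributional objective (e.g., cross-entropy with probabilistic targets) rather than a one-hot loss, which is what allows a model to track the human label distribution in the first place.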
Related papers
- A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models [53.18562650350898]
Chain-of-thought (CoT) reasoning enhances the performance of large language models.
We present the first comprehensive study of CoT faithfulness in large vision-language models.
arXiv Detail & Related papers (2025-05-29T18:55:05Z)
- VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models [121.03333569013148]
We introduce VisuLogic: a benchmark of 1,000 human-verified problems across six categories.
These questions can be used to assess the visual reasoning capabilities of MLLMs from multiple perspectives.
Most models score below 30% accuracy, only slightly above the 25% random baseline and far below the 51.4% achieved by humans.
arXiv Detail & Related papers (2025-04-21T17:59:53Z)
- Embracing Diversity: A Multi-Perspective Approach with Soft Labels [3.529000007777341]
We propose a new framework for designing perspective-aware models on the stance detection task, in which multiple annotators assign stances on a controversial topic.
Results show that the multi-perspective approach yields better classification performance (higher F1-scores).
arXiv Detail & Related papers (2025-03-01T13:33:38Z)
- Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation [29.44609627447293]
We propose an approach to summary faithfulness evaluation in which multiple agents are assigned initial stances.
We introduce a new dimension, ambiguity, and a detailed taxonomy to identify such special cases.
Experiments demonstrate that our approach can help identify ambiguities and achieves even stronger performance on non-ambiguous summaries.
arXiv Detail & Related papers (2025-02-12T15:46:50Z)
- Reasoner Outperforms: Generative Stance Detection with Rationalization for Social Media [12.479554210753664]
This study adopts a generative approach, where stance predictions include explicit, interpretable rationales.
We find that incorporating reasoning into stance detection enables the smaller model (FlanT5) to outperform GPT-3.5's zero-shot performance.
arXiv Detail & Related papers (2024-12-13T16:34:39Z)
- On the Fairness, Diversity and Reliability of Text-to-Image Generative Models [68.62012304574012]
Multimodal generative models have sparked critical discussions on their reliability, fairness, and potential for misuse.
We propose an evaluation framework to assess model reliability by analyzing responses to global and local perturbations in the embedding space.
Our method lays the groundwork for detecting unreliable, bias-injected models and tracing the provenance of embedded biases.
arXiv Detail & Related papers (2024-11-21T09:46:55Z)
- Multi-Perspective Stance Detection [2.8073184910275293]
The multi-perspective approach yields better classification performance than the baseline, which uses a single label.
This implies that designing more inclusive, perspective-aware AI models is not only an essential first step toward responsible and ethical AI, but can also achieve superior results compared to traditional approaches.
arXiv Detail & Related papers (2024-11-13T16:30:41Z)
- Uncovering Biases with Reflective Large Language Models [2.5200794639628032]
Biases and errors in human-labeled data present significant challenges for machine learning.
We present the Reflective LLM Dialogue Framework (RLDF), which leverages structured adversarial dialogues to uncover diverse perspectives.
Experiments show RLDF successfully identifies potential biases in public content while exposing limitations in human-labeled data.
arXiv Detail & Related papers (2024-08-24T04:48:32Z)
- Understanding Before Recommendation: Semantic Aspect-Aware Review Exploitation via Large Language Models [53.337728969143086]
Recommendation systems harness user-item interactions, such as clicks and reviews, to learn user and item representations.
Previous studies improve recommendation accuracy and interpretability by modeling user preferences across various aspects and intents.
We introduce a chain-based prompting approach to uncover semantic aspect-aware interactions.
arXiv Detail & Related papers (2023-12-26T15:44:09Z)
- Social Bias Probing: Fairness Benchmarking for Language Models [38.180696489079985]
This paper proposes a novel framework for probing language models for social biases by assessing disparate treatment.
We curate SoFa, a large-scale benchmark designed to address the limitations of existing fairness collections.
We show that biases within language models are more nuanced than acknowledged, indicating a broader scope of encoded biases than previously recognized.
arXiv Detail & Related papers (2023-11-15T16:35:59Z)
- Few-shot Forgery Detection via Guided Adversarial Interpolation [56.59499187594308]
Existing forgery detection methods suffer from significant performance drops when applied to unseen novel forgery approaches.
We propose Guided Adversarial Interpolation (GAI) to overcome the few-shot forgery detection problem.
Our method is validated to be robust to choices of majority and minority forgery approaches.
arXiv Detail & Related papers (2022-04-12T16:05:10Z)
- Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z)