Related papers: PERSPECTRA: A Scalable and Configurable Pluralist Benchmark of Perspectives from Arguments

URL: http://arxiv.org/abs/2602.08716v1
Date: Mon, 09 Feb 2026 14:25:07 GMT
Title: PERSPECTRA: A Scalable and Configurable Pluralist Benchmark of Perspectives from Arguments
Authors: Shangrui Nie, Kian Omoomi, Lucie Flek, Zhixue Zhao, Charles Welch
Abstract summary: PERSPECTRA is a pluralist benchmark for evaluating how well models represent, distinguish, and reason over multiple perspectives. We construct 3,810 enriched arguments spanning 762 pro/con stances on 100 controversial topics. Each opinion is expanded to multiple naturalistic variants, enabling robust evaluation of pluralism.
Score: 16.8677147128948
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Pluralism, the capacity to engage with diverse perspectives without collapsing them into a single viewpoint, is critical for developing large language models that faithfully reflect human heterogeneity. Yet this characteristic has not been carefully examined in the LLM research community and remains absent from most alignment studies. Debate-oriented sources provide a natural entry point for pluralism research. Previous work builds on online debate sources but remains constrained by costly human validation. Other debate-rich platforms such as Reddit and Kialo also offer promising material: Reddit provides linguistic diversity and scale but lacks clear argumentative structure, while Kialo supplies explicit pro/con graphs but remains overly concise and detached from natural discourse. We introduce PERSPECTRA, a pluralist benchmark that integrates the structural clarity of Kialo debate graphs with the linguistic diversity of real Reddit discussions. Using a controlled retrieval-and-expansion pipeline, we construct 3,810 enriched arguments spanning 762 pro/con stances on 100 controversial topics. Each opinion is expanded to multiple naturalistic variants, enabling robust evaluation of pluralism. We initialise three tasks with PERSPECTRA: opinion counting (identifying distinct viewpoints), opinion matching (aligning supporting stances and discourse to source opinions), and polarity check (inferring aggregate stance in mixed discourse). Experiments with state-of-the-art open-source and proprietary LLMs highlight systematic failures, such as overestimating the number of viewpoints and misclassifying concessive structures, underscoring the difficulty of pluralism-aware understanding and reasoning. By combining diversity with structure, PERSPECTRA establishes the first scalable, configurable benchmark for evaluating how well models represent, distinguish, and reason over multiple perspectives.
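
As a rough illustration of the opinion counting task and the overestimation failure the abstract mentions, the sketch below scores predicted viewpoint counts against gold counts. It is a minimal sketch under stated assumptions, not the paper's evaluation code: the function name `counting_report`, the three metric names, and the toy counts are all hypothetical. The only figure taken from the abstract is the ratio 3,810 / 762, which implies an average of five enriched arguments per stance.

```python
from statistics import mean

# From the abstract: 3,810 enriched arguments over 762 pro/con stances,
# i.e. an average of five arguments per stance.
VARIANTS_PER_STANCE = 3810 / 762


def counting_report(gold, predicted):
    """Summarise opinion-counting predictions (hypothetical metrics).

    gold / predicted: equal-length lists of integer viewpoint counts,
    one entry per discussion.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal in length")
    # Signed error per item: positive means the model reported too many viewpoints.
    errors = [p - g for g, p in zip(gold, predicted)]
    return {
        "exact_match": mean(1.0 if e == 0 else 0.0 for e in errors),
        "mean_signed_error": mean(errors),
        "overestimate_rate": mean(1.0 if e > 0 else 0.0 for e in errors),
    }


# Toy data, invented for illustration only.
report = counting_report(gold=[3, 2, 4, 5], predicted=[4, 2, 6, 5])
print(VARIANTS_PER_STANCE, report)
```

A positive `mean_signed_error` together with a high `overestimate_rate` would correspond to the systematic overcounting described in the abstract; whether the benchmark itself reports such quantities is not stated in the text.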

Related papers

Explain Before You Answer: A Survey on Compositional Visual Reasoning [74.27548620675748]
Compositional visual reasoning has emerged as a key research frontier in multimodal AI. This survey systematically reviews 260+ papers from top venues (CVPR, ICCV, NeurIPS, ICML, ACL, etc.). We then catalog 60+ benchmarks and corresponding metrics that probe compositional visual reasoning along dimensions such as grounding accuracy, chain-of-thought faithfulness, and high-resolution perception.
arXiv Detail & Related papers (2025-08-24T11:01:51Z)
Perspectives in Play: A Multi-Perspective Approach for More Inclusive NLP Systems [3.011820285006942]
This study proposes a new multi-perspective approach using soft labels to encourage the development of perspective-aware models. We conduct an analysis across diverse subjective text classification tasks, including hate speech, irony, abusive language, and stance detection. Results show that the multi-perspective approach better approximates human label distributions, as measured by Jensen-Shannon Divergence (JSD). Our approach exhibits lower confidence in tasks like irony and stance detection, likely due to the inherent subjectivity present in the texts.
arXiv Detail & Related papers (2025-06-25T07:53:36Z)
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought [83.89629325805505]
We introduce Argus to address limitations with a new visual attention grounding mechanism. Our approach employs object-centric grounding as visual chain-of-thought signals, enabling more effective goal-conditioned visual attention.
arXiv Detail & Related papers (2025-05-29T17:59:56Z)
Debating for Better Reasoning: An Unsupervised Multimodal Approach [56.74157117060815]
We extend the debate paradigm to a multimodal setting, exploring its potential for weaker models to supervise and enhance the performance of stronger models. We focus on visual question answering (VQA), where two "sighted" expert vision-language models debate an answer, while a "blind" (text-only) judge adjudicates based solely on the quality of the arguments. In our framework, the experts defend only answers aligned with their beliefs, thereby obviating the need for explicit role-playing and concentrating the debate on instances of expert disagreement.
arXiv Detail & Related papers (2025-05-20T17:18:17Z)
Persona Knowledge-Aligned Prompt Tuning Method for Online Debate [42.28019112668135]
We propose a persona knowledge-aligned framework for argument quality assessment tasks from the audience side. This is the first work that leverages the emergence of ChatGPT and injects audience personae knowledge into smaller language models via prompt tuning.
arXiv Detail & Related papers (2024-10-05T17:33:11Z)
Overview of PerpectiveArg2024: The First Shared Task on Perspective Argument Retrieval [56.66761232081188]
We present a novel dataset covering demographic and socio-cultural (socio) variables, such as age, gender, and political attitude, representing minority and majority groups in society. We find substantial challenges in incorporating perspectivism, especially when aiming for personalization based solely on the text of arguments without explicitly providing socio profiles. While we bootstrap perspective argument retrieval, further research is essential to optimize retrieval systems to facilitate personalization and reduce polarization.
arXiv Detail & Related papers (2024-07-29T03:14:57Z)
Debate-to-Write: A Persona-Driven Multi-Agent Framework for Diverse Argument Generation [25.43678472601801]
We propose a persona-based multi-agent framework for argument writing. Inspired by human debate, we first assign each agent a persona representing its high-level beliefs from a unique perspective. We then design an agent interaction process so that the agents can collaboratively debate and discuss the idea to form an overall plan for argument writing.
arXiv Detail & Related papers (2024-06-28T04:21:20Z)
"Reasoning" with Rhetoric: On the Style-Evidence Tradeoff in LLM-Generated Counter-Arguments [11.243184875465788]
Large language models (LLMs) play a key role in generating evidence-based and stylistic counter-arguments. Previous research often neglects the balance between evidentiality and style, which are crucial for persuasive arguments. We evaluated the effectiveness of stylized evidence-based counter-argument generation in Counterfire, a new dataset of 38,000 counter-arguments.
arXiv Detail & Related papers (2024-02-13T14:53:12Z)
Argue with Me Tersely: Towards Sentence-Level Counter-Argument Generation [62.069374456021016]
We present the ArgTersely benchmark for sentence-level counter-argument generation. We also propose Arg-LlaMA for generating high-quality counter-arguments.
arXiv Detail & Related papers (2023-12-21T06:51:34Z)
How Far Can We Extract Diverse Perspectives from Large Language Models? [16.16678226707335]
We show that large language models (LLMs) can generate diverse perspectives on subjective topics. We propose a criteria-based prompting technique to ground diverse opinions. Our methods, applied to various tasks, show that LLMs can indeed produce diverse opinions according to the degree of task subjectivity.
arXiv Detail & Related papers (2023-11-16T11:23:38Z)
ERTIM@MC2: Diversified Argumentative Tweets Retrieval [0.0]
The task consists of detecting the most argumentative and diverse Tweets about some festivals in English and French from a massive multilingual collection. An initial step filters the original dataset to fit the language and topic requirements of the task. The final step extracts the most diverse arguments by clustering Tweets according to their textual content and selecting the most argumentative ones from each cluster.
arXiv Detail & Related papers (2023-04-17T08:06:17Z)
Aspect-Controlled Neural Argument Generation [65.91772010586605]
We train a language model for argument generation that can be controlled on a fine-grained level to generate sentence-level arguments for a given topic, stance, and aspect. Our evaluation shows that our generation model is able to generate high-quality, aspect-specific arguments. These arguments can be used to improve the performance of stance detection models via data augmentation and to generate counter-arguments.
arXiv Detail & Related papers (2020-04-30T20:17:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.