Argumentative Experience: Reducing Confirmation Bias on Controversial Issues through LLM-Generated Multi-Persona Debates
- URL: http://arxiv.org/abs/2412.04629v5
- Date: Mon, 15 Sep 2025 19:48:35 GMT
- Title: Argumentative Experience: Reducing Confirmation Bias on Controversial Issues through LLM-Generated Multi-Persona Debates
- Authors: Li Shi, Houjiang Liu, Yian Wong, Utkarsh Mujumdar, Dan Zhang, Jacek Gwizdka, Matthew Lease
- Abstract summary: Multi-persona debate systems powered by large language models (LLMs) show promise in reducing confirmation bias. We compare an LLM-based multi-persona debate system with a two-stance retrieval-based system, exposing participants to multiple viewpoints on controversial topics. Our results show that while the debate system does not significantly increase attention to opposing views, it does provide a buffering effect against bias caused by individual cognitive tendency.
- Score: 8.288230743741947
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multi-persona debate systems powered by large language models (LLMs) show promise in reducing confirmation bias, which can fuel echo chambers and social polarization. However, empirical evidence remains limited on whether they meaningfully shift user attention toward belief-challenging content, promote belief change, or outperform traditional debiasing strategies. To investigate this, we compare an LLM-based multi-persona debate system with a two-stance retrieval-based system, exposing participants to multiple viewpoints on controversial topics. By collecting eye-tracking data, belief change measures, and qualitative feedback, our results show that while the debate system does not significantly increase attention to opposing views, or make participants shift away from prior beliefs, it does provide a buffering effect against bias caused by individual cognitive tendency. These findings shed light on both the promise and limits of multi-persona debate systems in information seeking, and we offer design insights to guide future work toward more balanced and reflective information engagement.
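The multi-persona setup the abstract describes can be sketched as a simple turn-taking loop in which each persona argues while conditioning on the transcript so far. This is an illustrative assumption about the architecture, not the paper's actual implementation; the `generate` stub stands in for a real LLM completion call, and the persona names and prompt wording are invented.

```python
def generate(prompt: str) -> str:
    """Stub standing in for an LLM completion call."""
    return f"[argument responding to: {prompt[:40]}...]"


def multi_persona_debate(
    topic: str, personas: list[str], rounds: int = 2
) -> list[tuple[str, str]]:
    """Each persona argues in turn, seeing the full debate transcript so far."""
    transcript: list[tuple[str, str]] = []
    for _ in range(rounds):
        for persona in personas:
            history = "\n".join(f"{p}: {a}" for p, a in transcript)
            prompt = (
                f"You are {persona}. Debate topic: {topic}.\n"
                f"Debate so far:\n{history}\n"
                f"Give your next argument, engaging with opposing points."
            )
            transcript.append((persona, generate(prompt)))
    return transcript


debate = multi_persona_debate(
    "Should social media platforms moderate political content?",
    ["a civil-liberties advocate", "a trust-and-safety researcher"],
)
print(len(debate))  # 2 personas x 2 rounds = 4 turns
```

Exposing participants to such a transcript, rather than two static stance documents, is the key experimental contrast with the retrieval-based baseline.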
Related papers
- Single LLM Debate, MoLaCE: Mixture of Latent Concept Experts Against Confirmation Bias [24.182306712604966]
Large language models (LLMs) are highly vulnerable to input confirmation bias. MoLaCE is a lightweight inference-time framework that addresses confirmation bias by mixing experts instantiated as different activation strengths. We empirically show that it consistently reduces confirmation bias, improves robustness, and surpasses multi-agent debate.
arXiv Detail & Related papers (2025-12-29T14:52:34Z) - MMPersuade: A Dataset and Evaluation Framework for Multimodal Persuasion [73.99171322670772]
Large Vision-Language Models (LVLMs) are increasingly deployed in domains such as shopping, health, and news. MMPersuade provides a unified framework for systematically studying multimodal persuasion dynamics in LVLMs.
arXiv Detail & Related papers (2025-10-26T17:39:21Z) - The Hunger Game Debate: On the Emergence of Over-Competition in Multi-Agent Systems [90.96738882568224]
This paper investigates over-competition in multi-agent debate, where agents under extreme pressure exhibit unreliable, harmful behaviors. To study this phenomenon, we propose HATE, a novel experimental framework that simulates debates in a zero-sum competition arena.
arXiv Detail & Related papers (2025-09-30T11:44:47Z) - Enhancing Multi-Agent Debate System Performance via Confidence Expression [55.34012400580016]
Multi-Agent Debate (MAD) systems simulate human debate and thereby improve task performance. Some Large Language Models (LLMs) possess superior knowledge or reasoning capabilities for specific tasks, but struggle to clearly communicate this advantage during debates. Inappropriate confidence expression can cause agents in MAD systems to either stubbornly maintain incorrect beliefs or converge prematurely on suboptimal answers. We develop ConfMAD, a MAD framework that integrates confidence expression throughout the debate process.
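The role of confidence expression in aggregating debate outcomes can be illustrated with a minimal confidence-weighted vote. Note this is a hypothetical sketch of the general idea, not ConfMAD's actual aggregation rule, and `aggregate_with_confidence` is an invented helper name.

```python
from collections import defaultdict


def aggregate_with_confidence(turns: list[tuple[str, float]]) -> str:
    """Sum each agent's self-reported confidence per candidate answer and
    return the answer with the highest total (a confidence-weighted vote)."""
    scores: dict[str, float] = defaultdict(float)
    for answer, confidence in turns:
        scores[answer] += confidence
    return max(scores, key=scores.get)


# One highly confident agent can outweigh two hesitant ones:
print(aggregate_with_confidence([("42", 0.9), ("17", 0.4), ("17", 0.4)]))  # 42
```

Under a plain majority vote the hesitant pair would win; weighting by expressed confidence lets a knowledgeable agent's advantage carry through the debate.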
arXiv Detail & Related papers (2025-09-17T14:34:27Z) - MV-Debate: Multi-view Agent Debate with Dynamic Reflection Gating for Multimodal Harmful Content Detection in Social Media [26.07883439550861]
MV-Debate is a multi-view agent debate framework with dynamic reflection gating for unified multimodal harmful content detection. MV-Debate assembles four complementary debate agents: a surface analyst, a deep reasoner, a modality contrast, and a social contextualist, to analyze content from diverse interpretive perspectives.
arXiv Detail & Related papers (2025-08-07T16:38:25Z) - Debating for Better Reasoning: An Unsupervised Multimodal Approach [56.74157117060815]
We extend the debate paradigm to a multimodal setting, exploring its potential for weaker models to supervise and enhance the performance of stronger models. We focus on visual question answering (VQA), where two "sighted" expert vision-language models debate an answer, while a "blind" (text-only) judge adjudicates based solely on the quality of the arguments. In our framework, the experts defend only answers aligned with their beliefs, thereby obviating the need for explicit role-playing and concentrating the debate on instances of expert disagreement.
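The adjudication flow described above, where debate only happens when the sighted experts disagree, can be sketched as follows. This is an illustrative reconstruction under stated assumptions: experts are assumed to return (answer, argument) pairs, and the longest-argument judge is a trivial stand-in for an LLM that scores argument quality.

```python
def adjudicate(expert_outputs: list[tuple[str, str]], judge) -> str:
    """expert_outputs holds (answer, argument) pairs from sighted experts.
    If all experts agree, no debate is needed; otherwise a text-only
    judge decides based solely on the written arguments."""
    answers = {ans for ans, _ in expert_outputs}
    if len(answers) == 1:
        return answers.pop()
    arguments = {ans: arg for ans, arg in expert_outputs}
    return judge(arguments)


def longest_argument_judge(arguments: dict[str, str]) -> str:
    """Toy judge: favor the longer argument (a real judge would be an LLM)."""
    return max(arguments, key=lambda ans: len(arguments[ans]))


# Agreement short-circuits the debate; disagreement invokes the judge.
print(adjudicate([("cat", "I see whiskers"), ("cat", "It purrs")],
                 longest_argument_judge))  # cat
```

Because the judge never sees the image, it must rely on the argumentation itself, which is what lets weaker (blind) models supervise stronger (sighted) ones.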
arXiv Detail & Related papers (2025-05-20T17:18:17Z) - Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning [151.4060202671114]
Multimodal large language models (MLLMs) have shown unprecedented capabilities in advancing vision-language tasks.
This paper introduces a novel bottom-up reasoning framework to address hallucinations in MLLMs.
Our framework systematically addresses potential issues in both visual and textual inputs by verifying and integrating perception-level information with cognition-level commonsense knowledge.
arXiv Detail & Related papers (2024-12-15T09:10:46Z) - Can Users Detect Biases or Factual Errors in Generated Responses in Conversational Information-Seeking? [13.790574266700006]
We investigate the limitations of response generation in conversational information-seeking systems.
The study addresses the problem of query answerability and the challenge of response incompleteness.
Our analysis reveals that it is easier for users to detect response incompleteness than to judge query answerability.
arXiv Detail & Related papers (2024-10-28T20:55:00Z) - Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs [50.40165119718928]
LongPiBench is a benchmark designed to assess positional bias involving multiple pieces of relevant information.
These experiments reveal that while most current models are robust against the "lost in the middle" issue, there exist significant biases related to the spacing of relevant information pieces.
arXiv Detail & Related papers (2024-10-18T17:41:19Z) - Bias in the Mirror: Are LLMs' Opinions Robust to Their Own Adversarial Attacks? [22.0383367888756]
Large language models (LLMs) inherit biases from their training data and alignment processes, influencing their responses in subtle ways.
We introduce a novel approach where two instances of an LLM engage in self-debate, arguing opposing viewpoints to persuade a neutral version of the model.
We evaluate how firmly biases hold and whether models are susceptible to reinforcing misinformation or shifting to harmful viewpoints.
arXiv Detail & Related papers (2024-10-17T13:06:02Z) - Cognitive Biases in Large Language Models for News Recommendation [68.90354828533535]
This paper explores the potential impact of cognitive biases on large language models (LLMs) based news recommender systems.
We discuss strategies to mitigate these biases through data augmentation, prompt engineering, and learning-algorithm design.
arXiv Detail & Related papers (2024-10-03T18:42:07Z) - Towards Detecting and Mitigating Cognitive Bias in Spoken Conversational Search [14.916529791823868]
This paper draws upon insights from information seeking, psychology, cognitive science, and wearable sensors to provoke novel conversations in the community.
We propose a framework including multimodal instruments and methods for experimental designs and settings.
arXiv Detail & Related papers (2024-05-21T03:50:32Z) - Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs [57.16442740983528]
In ad-hoc retrieval, evaluation relies heavily on user actions, including implicit feedback.
The role of user feedback in annotators' assessment of conversational turns has been little studied.
We focus on how the evaluation of task-oriented dialogue systems (TDSs) is affected by considering user feedback, explicit or implicit, as provided through the follow-up utterance of the turn being evaluated.
arXiv Detail & Related papers (2024-04-19T16:45:50Z) - Cognitive Bias in Decision-Making with LLMs [19.87475562475802]
Large language models (LLMs) offer significant potential as tools to support an expanding range of decision-making tasks.
LLMs have been shown to inherit societal biases against protected groups, as well as be subject to bias functionally resembling cognitive bias.
Our work introduces BiasBuster, a framework designed to uncover, evaluate, and mitigate cognitive bias in LLMs.
arXiv Detail & Related papers (2024-02-25T02:35:56Z) - LEMMA: Towards LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation [58.524237916836164]
We propose LEMMA: LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation.
Our method improves the accuracy over the top baseline LVLM by 7% and 13% on Twitter and Fakeddit datasets respectively.
arXiv Detail & Related papers (2024-02-19T08:32:27Z) - Generative Echo Chamber? Effects of LLM-Powered Search Systems on Diverse Information Seeking [49.02867094432589]
Conversational search systems powered by large language models (LLMs) have already been used by hundreds of millions of people.
We investigate whether and how LLMs with opinion biases that either reinforce or challenge the user's view change the effect.
arXiv Detail & Related papers (2024-02-08T18:14:33Z) - Fostering User Engagement in the Critical Reflection of Arguments [3.26297440422721]
We propose a system that engages in a deliberative dialogue with a human.
We enable the system to intervene if the user is too focused on their pre-existing opinion.
We report on a user study with 58 participants to test our model and the effect of the intervention mechanism.
arXiv Detail & Related papers (2023-08-17T15:48:23Z) - Perspectives on Large Language Models for Relevance Judgment [56.935731584323996]
It has been suggested that large language models (LLMs) can assist with relevance judgments.
However, it is not yet clear whether automated judgments can reliably be used in evaluations of retrieval systems.
arXiv Detail & Related papers (2023-04-13T13:08:38Z) - Persua: A Visual Interactive System to Enhance the Persuasiveness of Arguments in Online Discussion [52.49981085431061]
Enhancing people's ability to write persuasive arguments could contribute to the effectiveness and civility in online communication.
We derived four design goals for a tool that helps users improve the persuasiveness of arguments in online discussions.
Persua is an interactive visual system that provides example-based guidance on persuasive strategies to enhance the persuasiveness of arguments.
arXiv Detail & Related papers (2022-04-16T08:07:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.