Related papers: AI Debaters are More Persuasive when Arguing in Alignment with Their Own Beliefs

URL: http://arxiv.org/abs/2510.13912v2
Date: Tue, 21 Oct 2025 12:50:28 GMT
Title: AI Debaters are More Persuasive when Arguing in Alignment with Their Own Beliefs
Authors: María Victoria Carro, Denise Alejandra Mester, Facundo Nieto, Oscar Agustín Stanchi, Guido Ernesto Bergman, Mario Alejandro Leiva, Eitan Sprejer, Luca Nicolás Forziati Gangi, Francisca Gauna Selasco, Juan Gustavo Corvalán, Gerardo I. Simari, María Vanina Martinez
Abstract summary: We apply debate to subjective questions and explicitly measure large language models' prior beliefs before experiments. We implement and compare two debate protocols, sequential and simultaneous, to evaluate potential systematic biases. Our main findings show that models tend to prefer defending stances aligned with the judge persona rather than their prior beliefs.
Score: 0.13525723298325706
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The core premise of AI debate as a scalable oversight technique is that it is harder to lie convincingly than to refute a lie, enabling the judge to identify the correct position. Yet, existing debate experiments have relied on datasets with ground truth, where lying is reduced to defending an incorrect proposition. This overlooks a subjective dimension: lying also requires the belief that the claim defended is false. In this work, we apply debate to subjective questions and explicitly measure large language models' prior beliefs before experiments. Debaters were asked to select their preferred position, then presented with a judge persona deliberately designed to conflict with their identified priors. This setup tested whether models would adopt sycophantic strategies, aligning with the judge's presumed perspective to maximize persuasiveness, or remain faithful to their prior beliefs. We implemented and compared two debate protocols, sequential and simultaneous, to evaluate potential systematic biases. Finally, we assessed whether models were more persuasive and produced higher-quality arguments when defending positions consistent with their prior beliefs versus when arguing against them. Our main findings show that models tend to prefer defending stances aligned with the judge persona rather than their prior beliefs, sequential debate introduces significant bias favoring the second debater, models are more persuasive when defending positions aligned with their prior beliefs, and paradoxically, arguments misaligned with prior beliefs are rated as higher quality in pairwise comparison. These results can inform human judges to provide higher-quality training signals and contribute to more aligned AI systems, while revealing important aspects of human-AI interaction regarding persuasion dynamics in language models.

Related papers

Can You Trick the Grader? Adversarial Persuasion of LLM Judges [15.386741140145205]
This study is the first to reveal that strategically embedded persuasive language can bias LLM judges when scoring mathematical reasoning tasks. We formalize seven persuasion techniques (Majority, Consistency, Flattery, Reciprocity, Pity, Authority, Identity) and embed them into otherwise identical responses. We find that persuasive language leads LLM judges to assign inflated scores to incorrect solutions, by up to 8% on average, with Consistency causing the most severe distortion.
arXiv Detail & Related papers (2025-08-11T09:45:02Z)
AI Debate Aids Assessment of Controversial Claims [86.47978525513236]
We study whether AI debate can guide biased judges toward the truth by having two AI systems debate opposing sides of controversial COVID-19 factuality claims. In our human study, we find that debate, where two AI advisor systems present opposing evidence-based arguments, consistently improves judgment accuracy and confidence calibration. In our AI judge study, we find that AI judges with human-like personas achieve even higher accuracy (78.5%) than human judges (70.1%) and default AI judges without personas (69.8%).
arXiv Detail & Related papers (2025-06-02T19:01:53Z)
Debating for Better Reasoning: An Unsupervised Multimodal Approach [56.74157117060815]
We extend the debate paradigm to a multimodal setting, exploring its potential for weaker models to supervise and enhance the performance of stronger models. We focus on visual question answering (VQA), where two "sighted" expert vision-language models debate an answer, while a "blind" (text-only) judge adjudicates based solely on the quality of the arguments. In our framework, the experts defend only answers aligned with their beliefs, thereby obviating the need for explicit role-playing and concentrating the debate on instances of expert disagreement.
arXiv Detail & Related papers (2025-05-20T17:18:17Z)
Where Fact Ends and Fairness Begins: Redefining AI Bias Evaluation through Cognitive Biases [77.3489598315447]
We argue that identifying the boundary between fact and fair is essential for meaningful fairness evaluation. We introduce Fact-or-Fair, a benchmark with (i) objective queries aligned with descriptive, fact-based judgments, and (ii) subjective queries aligned with normative, fairness-based judgments.
arXiv Detail & Related papers (2025-02-09T10:54:11Z)
Teaching Models to Balance Resisting and Accepting Persuasion [69.68379406317682]
We show that Persuasion-Balanced Training (or PBT) can balance positive and negative persuasion. PBT allows us to use data generated from dialogues between smaller 7-8B models for training much larger 70B models. We find that PBT leads to better and more stable results and less order dependence.
arXiv Detail & Related papers (2024-10-18T16:49:36Z)
On scalable oversight with weak LLMs judging strong LLMs [67.8628575615614]
We study debate, where two AIs compete to convince a judge, and consultancy, where a single AI tries to convince a judge that asks questions. We use large language models (LLMs) as both AI agents and as stand-ins for human judges, taking the judge models to be weaker than agent models.
arXiv Detail & Related papers (2024-07-05T16:29:15Z)
Defeaters and Eliminative Argumentation in Assurance 2.0 [0.0]
This report describes how defeaters, and multiple levels of defeaters, should be represented and assessed in Assurance 2.0. A valid concern about this process is that human judgement is fallible and prone to confirmation bias.
arXiv Detail & Related papers (2024-05-16T22:10:01Z)
Persua: A Visual Interactive System to Enhance the Persuasiveness of Arguments in Online Discussion [52.49981085431061]
Enhancing people's ability to write persuasive arguments could contribute to the effectiveness and civility in online communication. We derived four design goals for a tool that helps users improve the persuasiveness of arguments in online discussions. Persua is an interactive visual system that provides example-based guidance on persuasive strategies to enhance the persuasiveness of arguments.
arXiv Detail & Related papers (2022-04-16T08:07:53Z)
Belief-based Generation of Argumentative Claims [13.590746709967373]
We study the task of belief-based claim generation: Given a controversial topic and a set of beliefs, generate an argumentative claim tailored to the beliefs. To tackle this task, we model the people's prior beliefs through their stances on controversial topics and extend state-of-the-art text generation models to generate claims conditioned on the beliefs. Our results reveal the limitations of modeling users' beliefs based on their stances, but demonstrate the potential of encoding beliefs into argumentative texts.
arXiv Detail & Related papers (2021-01-24T18:07:02Z)
What Changed Your Mind: The Roles of Dynamic Topics and Discourse in Argumentation Process [78.4766663287415]
This paper presents a study that automatically analyzes the key factors in argument persuasiveness. We propose a novel neural model that is able to track the changes of latent topics and discourse in argumentative conversations.
arXiv Detail & Related papers (2020-02-10T04:27:48Z)
