AI Debate Aids Assessment of Controversial Claims
- URL: http://arxiv.org/abs/2506.02175v2
- Date: Wed, 29 Oct 2025 18:37:02 GMT
- Title: AI Debate Aids Assessment of Controversial Claims
- Authors: Salman Rahman, Sheriff Issaka, Ashima Suvarna, Genglin Liu, James Shiffer, Jaeyoung Lee, Md Rizwan Parvez, Hamid Palangi, Shi Feng, Nanyun Peng, Yejin Choi, Julian Michael, Liwei Jiang, Saadia Gabriel
- Abstract summary: We study whether AI debate can guide biased judges toward the truth by having two AI systems debate opposing sides of controversial factuality claims. In Study I, debate consistently improves human judgment accuracy and confidence calibration, outperforming consultancy. In Study II, AI judges with human-like personas achieve even higher accuracy (78.5%) than human judges (70.1%) and default AI judges without personas (69.8%). These findings highlight AI debate as a promising path toward scalable, bias-resilient oversight in contested domains.
- Score: 73.8907110799657
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As AI grows more powerful, it will increasingly shape how we understand the world. But with this influence comes the risk of amplifying misinformation and deepening social divides, especially on consequential topics where factual accuracy directly impacts well-being. Scalable oversight aims to ensure AI systems remain truthful even when their capabilities exceed those of their evaluators. Yet when humans serve as evaluators, their own beliefs and biases can impair judgment. We study whether AI debate can guide biased judges toward the truth by having two AI systems debate opposing sides of controversial factuality claims about COVID-19 and climate change, topics where people hold strong prior beliefs. We conduct two studies. Study I recruits human judges with either mainstream or skeptical beliefs who evaluate claims through two protocols: debate (interaction with two AI advisors arguing opposing sides) or consultancy (interaction with a single AI advisor). Study II uses AI judges with and without human-like personas to evaluate the same protocols. In Study I, debate consistently improves human judgment accuracy and confidence calibration, outperforming consultancy by 4-10% across COVID-19 and climate change claims. The improvement is largest for judges with mainstream beliefs (up to +15.2% accuracy on COVID-19 claims), though debate also helps skeptical judges who initially misjudge claims move toward accurate views (+4.7% accuracy). In Study II, AI judges with human-like personas achieve even higher accuracy (78.5%) than human judges (70.1%) and default AI judges without personas (69.8%), suggesting their potential for supervising frontier AI models. These findings highlight AI debate as a promising path toward scalable, bias-resilient oversight in contested domains.
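To make the two protocols concrete, the following is a minimal sketch of how a debate-versus-consultancy evaluation loop might be wired up. It is illustrative only: `query_model`, the prompt wording, and the round count are assumptions rather than the authors' implementation, and the Brier score is shown as one standard way to quantify the confidence calibration the abstract reports.

```python
# Minimal sketch of the debate and consultancy protocols described above.
# `query_model`, the prompts, and the round count are illustrative
# assumptions, not the paper's actual code.

def query_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to an LLM API."""
    raise NotImplementedError

def debate(claim: str, rounds: int = 3) -> str:
    """Two AI advisors argue opposing sides of a claim; returns the
    transcript that a (human or AI) judge will evaluate."""
    transcript = [f"Claim: {claim}"]
    for r in range(rounds):
        for side in ("supporting", "refuting"):
            argument = query_model(
                f"Argue the {side} side of the claim.\n"
                "Transcript so far:\n" + "\n".join(transcript)
            )
            transcript.append(f"[round {r + 1}, {side}] {argument}")
    return "\n".join(transcript)

def consultancy(claim: str, assigned_side: str, rounds: int = 3) -> str:
    """A single AI advisor argues only its assigned side of the claim."""
    transcript = [f"Claim: {claim}"]
    for r in range(rounds):
        argument = query_model(
            f"Argue the {assigned_side} side of the claim.\n"
            "Transcript so far:\n" + "\n".join(transcript)
        )
        transcript.append(f"[round {r + 1}] {argument}")
    return "\n".join(transcript)

def brier_score(confidences: list[float], labels: list[int]) -> float:
    """Mean squared gap between a judge's stated probability that each
    claim is true and its ground-truth label (1 = true); lower values
    indicate better calibration."""
    return sum((c - y) ** 2 for c, y in zip(confidences, labels)) / len(labels)
```

A judge, whether a recruited human (Study I) or an LLM given a persona (Study II), would then read the returned transcript and report a verdict with a confidence; per-protocol accuracy and a calibration measure such as the Brier score are what the comparisons above summarize.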
Related papers
- AI Debaters are More Persuasive when Arguing in Alignment with Their Own Beliefs [0.13525723298325706]
We apply debate to subjective questions and explicitly measure large language models' prior beliefs before experiments. We implement and compare two debate protocols, sequential and simultaneous, to evaluate potential systematic biases. Our main findings show that models tend to prefer defending stances aligned with the judge persona rather than their prior beliefs.
arXiv Detail & Related papers (2025-10-15T05:02:13Z)
- Biased AI improves human decision-making but reduces trust [0.8621608193534839]
Current AI systems minimize risk by enforcing ideological neutrality, yet this may introduce automation bias by suppressing cognitive engagement in human decision-making. We conducted randomized trials with 2,500 participants to test whether culturally biased AI enhances human decision-making.
arXiv Detail & Related papers (2025-08-12T19:20:43Z)
- The Silicon Reasonable Person: Can AI Predict How Ordinary People Judge Reasonableness? [0.0]
This Article investigates whether large language models (LLMs) can learn to identify patterns driving human reasonableness judgments. We show that certain models capture not just surface-level responses but potentially their underlying decisional architecture. These findings suggest practical applications: judges could calibrate intuitions against broader patterns, lawmakers could test policy interpretations, and resource-constrained litigants could preview argument reception.
arXiv Detail & Related papers (2025-08-04T06:19:45Z)
- Subjective Experience in AI Systems: What Do AI Researchers and the Public Believe? [0.42131793931438133]
We surveyed 582 AI researchers and 838 nationally representative US participants about their views on the potential development of AI systems with subjective experience. When asked to estimate the chances that such systems will exist on specific dates, the median responses were 1% (AI researchers) and 5% (public) by 2024. The median member of the public thought there was a higher chance that AI systems with subjective experience would never exist (25%) than the median AI researcher did (10%).
arXiv Detail & Related papers (2025-06-13T16:53:28Z)
- Must Read: A Systematic Survey of Computational Persuasion [60.83151988635103]
AI-driven persuasion can be leveraged for beneficial applications, but also poses threats through manipulation and unethical influence. Our survey outlines future research directions to enhance the safety, fairness, and effectiveness of AI-powered persuasion.
arXiv Detail & Related papers (2025-05-12T17:26:31Z)
- From Intuition to Understanding: Using AI Peers to Overcome Physics Misconceptions [8.60890432697274]
We designed an AI "Peer" to help students correct fundamental misconceptions about Newtonian mechanics. In a randomized controlled trial with 165 students, those who engaged in targeted dialogue with the AI Peer achieved post-test scores that were, on average, 10.5 percentage points higher than those of the control group.
arXiv Detail & Related papers (2025-04-01T04:09:13Z)
- Trustworthy and Responsible AI for Human-Centric Autonomous Decision-Making Systems [2.444630714797783]
We review and discuss the intricacies of AI biases: their definitions, methods for detecting and mitigating them, and metrics for evaluating bias.
We also discuss open challenges with regard to the trustworthiness and widespread application of AI across diverse domains of human-centric decision making.
arXiv Detail & Related papers (2024-08-28T06:04:25Z)
- Rolling in the deep of cognitive and AI biases [1.556153237434314]
We argue that there is an urgent need to understand AI as a sociotechnical system, inseparable from the conditions in which it is designed, developed and deployed.
We address this critical issue by following a radical new methodology under which human cognitive biases become core entities in our AI fairness overview.
We introduce a new mapping that relates human cognitive biases to AI biases, and we detect relevant fairness intensities and interdependencies.
arXiv Detail & Related papers (2024-07-30T21:34:04Z)
- On scalable oversight with weak LLMs judging strong LLMs [67.8628575615614]
We study debate, where two AIs compete to convince a judge, and consultancy, where a single AI tries to convince a judge that asks questions.
We use large language models (LLMs) as both AI agents and as stand-ins for human judges, taking the judge models to be weaker than agent models.
arXiv Detail & Related papers (2024-07-05T16:29:15Z)
- Debate Helps Supervise Unreliable Experts [33.03555781137954]
We show that debate between two unreliable experts can help a non-expert judge more reliably identify the truth.
Comparing debate to a baseline we call consultancy, where a single expert argues for only one answer which is correct half of the time, we find that debate performs significantly better.
These results show that debate is a promising approach for supervising increasingly capable but potentially unreliable AI systems.
arXiv Detail & Related papers (2023-11-15T05:05:40Z)
- Fairness in AI and Its Long-Term Implications on Society [68.8204255655161]
We take a closer look at AI fairness and analyze how a lack of it can deepen biases over time.
We discuss how biased models can lead to more negative real-world outcomes for certain groups.
If the issues persist, they could be reinforced by interactions with other risks and have severe implications for society in the form of social unrest.
arXiv Detail & Related papers (2023-04-16T11:22:59Z)
- Can Machines Imitate Humans? Integrative Turing-like tests for Language and Vision Demonstrate a Narrowing Gap [56.611702960809644]
We benchmark AI's ability to imitate humans in three language tasks and three vision tasks, then conduct 72,191 Turing-like tests with 1,916 human judges and 10 AI judges. Imitation ability shows minimal correlation with conventional AI performance metrics.
arXiv Detail & Related papers (2022-11-23T16:16:52Z)
- Trustworthy AI: A Computational Perspective [54.80482955088197]
We focus on six of the most crucial dimensions in achieving trustworthy AI: (i) Safety & Robustness, (ii) Non-discrimination & Fairness, (iii) Explainability, (iv) Privacy, (v) Accountability & Auditability, and (vi) Environmental Well-Being.
For each dimension, we review the recent related technologies according to a taxonomy and summarize their applications in real-world systems.
arXiv Detail & Related papers (2021-07-12T14:21:46Z)
- Effect of Confidence and Explanation on Accuracy and Trust Calibration in AI-Assisted Decision Making [53.62514158534574]
We study whether features that reveal case-specific model information can calibrate trust and improve the joint performance of the human and AI.
We show that confidence scores can help calibrate people's trust in an AI model, but trust calibration alone is not sufficient to improve AI-assisted decision making.
arXiv Detail & Related papers (2020-01-07T15:33:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.