Avoiding Obfuscation with Prover-Estimator Debate
- URL: http://arxiv.org/abs/2506.13609v1
- Date: Mon, 16 Jun 2025 15:37:33 GMT
- Title: Avoiding Obfuscation with Prover-Estimator Debate
- Authors: Jonah Brown-Cohen, Geoffrey Irving, Georgios Piliouras
- Abstract summary: We propose a protocol for AI debate that guarantees the correctness of human judgements for complex problems. In existing recursive debate protocols, a dishonest debater can use a computationally efficient strategy that forces an honest opponent to solve a computationally intractable problem in order to win.
- Score: 33.14645106993676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training powerful AI systems to exhibit desired behaviors hinges on the ability to provide accurate human supervision on increasingly complex tasks. A promising approach to this problem is to amplify human judgement by leveraging the power of two competing AIs in a debate about the correct solution to a given problem. Prior theoretical work has provided a complexity-theoretic formalization of AI debate, and posed the problem of designing protocols for AI debate that guarantee the correctness of human judgements for as complex a class of problems as possible. Recursive debates, in which debaters decompose a complex problem into simpler subproblems, hold promise for growing the class of problems that can be accurately judged in a debate. However, existing protocols for recursive debate run into the obfuscated arguments problem: a dishonest debater can use a computationally efficient strategy that forces an honest opponent to solve a computationally intractable problem to win. We mitigate this problem with a new recursive debate protocol that, under certain stability assumptions, ensures that an honest debater can win with a strategy requiring computational efficiency comparable to their opponent.
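The recursive structure described in the abstract, in which a complex claim is decomposed into subclaims and the debate recurses on one of them, can be illustrated with a toy sketch. All function names, the tree representation, and the probability-weighted sampling rule below are illustrative assumptions, not the paper's actual prover-estimator protocol.

```python
import random

# Toy model: a claim is either a leaf (simple enough for the judge to
# verify directly) or a conjunction of subclaims. Names are hypothetical.

def prover_decompose(claim):
    """The prover splits a complex claim into subclaims."""
    return claim["subclaims"]

def estimator_probability(subclaim):
    """The estimator assigns a probability that a subclaim is true.
    Here we simply read off a stored estimate."""
    return subclaim["estimate"]

def judge_leaf(claim):
    """Stand-in for the human judge on a directly verifiable claim."""
    return claim["truth"]

def recursive_debate(claim, rng):
    # Base case: the judge can verify the claim without further recursion.
    if "subclaims" not in claim:
        return judge_leaf(claim)
    subclaims = prover_decompose(claim)
    probs = [estimator_probability(s) for s in subclaims]
    # Recurse on one subclaim, sampled in proportion to the estimator's
    # assigned probabilities (one possible sampling rule; a sketch only).
    total = sum(probs)
    weights = [p / total for p in probs]
    chosen = rng.choices(subclaims, weights=weights, k=1)[0]
    return recursive_debate(chosen, rng)
```

The obfuscated arguments problem arises when the dishonest debater chooses a decomposition whose subclaims are each individually hard to refute; the paper's contribution is a protocol where, under stability assumptions, honest play stays computationally comparable to the opponent's.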
Related papers
- AI Debate Aids Assessment of Controversial Claims [86.47978525513236]
We study whether AI debate can guide biased judges toward the truth by having two AI systems debate opposing sides of controversial COVID-19 factuality claims. In our human study, we find that debate, in which two AI advisor systems present opposing evidence-based arguments, consistently improves judgment accuracy and confidence calibration. In our AI judge study, we find that AI judges with human-like personas achieve even higher accuracy (78.5%) than human judges (70.1%) and default AI judges without personas (69.8%).
arXiv Detail & Related papers (2025-06-02T19:01:53Z) - An alignment safety case sketch based on debate [3.2504831918078168]
One proposed solution is to leverage another superhuman system to point out flaws in the system's outputs via a debate. This paper outlines the value of debate for AI safety, as well as the assumptions and further research required to make debate work.
arXiv Detail & Related papers (2025-05-06T21:53:44Z) - Debate Only When Necessary: Adaptive Multiagent Collaboration for Efficient LLM Reasoning [8.800516398660069]
Multiagent collaboration has emerged as a promising framework for enhancing the reasoning capabilities of large language models (LLMs). We propose Debate Only When Necessary (DOWN), an adaptive multiagent debate framework that selectively activates debate based on the confidence score of the agent's initial response. DOWN improves efficiency by up to six times while preserving or even outperforming the performance of existing methods.
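The confidence-gating idea behind DOWN can be sketched in a few lines. The callable-agent interface, the threshold value, and the majority-vote stand-in for actual debate rounds are all assumptions for illustration, not the paper's implementation.

```python
def debate_only_when_necessary(question, agents, threshold=0.8):
    """Gate a multiagent debate on the first agent's self-reported
    confidence. Hypothetical interface: each agent is a callable that
    returns an (answer, confidence) pair."""
    answer, confidence = agents[0](question)
    if confidence >= threshold:
        return answer  # confident initial response: skip the debate
    # Otherwise engage the other agents; here a simple majority vote
    # stands in for the iterative debate rounds of the real framework.
    answers = [agent(question)[0] for agent in agents]
    return max(set(answers), key=answers.count)
```

The efficiency gain comes from the early return: when the initial response is already confident, no further agent calls are made at all.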
arXiv Detail & Related papers (2025-04-07T13:17:52Z) - PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models [59.920971312822736]
We introduce PromptCoT, a novel approach for automatically generating high-quality Olympiad-level math problems. The proposed method synthesizes complex problems based on mathematical concepts and the rationale behind problem construction. Our method is evaluated on standard benchmarks including GSM8K, MATH-500, and AIME2024, where it consistently outperforms existing problem generation methods.
arXiv Detail & Related papers (2025-03-04T06:32:30Z) - LLMs as Debate Partners: Utilizing Genetic Algorithms and Adversarial Search for Adaptive Arguments [0.0]
DebateBrawl is an AI-powered debate platform that integrates Large Language Models (LLMs), Genetic Algorithms (GA), and Adversarial Search (AS). The system demonstrates remarkable performance in generating coherent, contextually relevant arguments while adapting its strategy in real time. The system's ability to maintain high factual accuracy (92% compared to 78% in human-only debates) addresses critical concerns in AI-assisted discourse.
arXiv Detail & Related papers (2024-12-09T06:03:48Z) - Problem Solving Through Human-AI Preference-Based Cooperation [74.39233146428492]
We propose HAICo2, a novel human-AI co-construction framework. We take first steps towards a formalization of HAICo2 and discuss the difficult open research problems that it faces.
arXiv Detail & Related papers (2024-08-14T11:06:57Z) - On scalable oversight with weak LLMs judging strong LLMs [67.8628575615614]
We study debate, where two AIs compete to convince a judge, and consultancy, where a single AI tries to convince a judge who asks questions.
We use large language models (LLMs) as both AI agents and as stand-ins for human judges, taking the judge models to be weaker than agent models.
arXiv Detail & Related papers (2024-07-05T16:29:15Z) - Scalable AI Safety via Doubly-Efficient Debate [37.25328923531058]
The emergence of pre-trained AI systems with powerful capabilities has raised a critical challenge for AI safety.
The original framework was based on the assumption that the honest strategy is able to simulate AI systems for an exponential number of steps.
We show how to address these challenges by designing a new set of protocols.
arXiv Detail & Related papers (2023-11-23T17:46:30Z) - The Language Labyrinth: Constructive Critique on the Terminology Used in the AI Discourse [0.0]
This paper claims that AI debates are still characterised by a lack of critical distance to metaphors like 'training', 'learning' or 'deciding'. As a consequence, reflections regarding responsibility or potential use cases are greatly distorted.
It is a conceptual work at the intersection of critical computer science and philosophy of language.
arXiv Detail & Related papers (2023-07-18T14:32:21Z) - Solving NLP Problems through Human-System Collaboration: A Discussion-based Approach [98.13835740351932]
This research aims to create a dataset and computational framework for systems that discuss and refine their predictions through dialogue.
We show that the proposed system can hold beneficial discussions with humans, improving accuracy by up to 25 points on the natural language inference task.
arXiv Detail & Related papers (2023-05-19T16:24:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.