Strategic Planning and Rationalizing on Trees Make LLMs Better Debaters
- URL: http://arxiv.org/abs/2505.14886v1
- Date: Tue, 20 May 2025 20:17:51 GMT
- Title: Strategic Planning and Rationalizing on Trees Make LLMs Better Debaters
- Authors: Danqing Wang, Zhuorui Ye, Xinran Zhao, Fei Fang, Lei Li
- Abstract summary: We propose TreeDebater, a novel debate framework that excels in competitive debate. TreeDebater adopts better strategies for allotting limited time to important debate actions, aligning with the strategies of human debate experts.
- Score: 41.63762714104634
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Winning competitive debates requires sophisticated reasoning and argument skills. Competitive debate poses unique challenges: (1) time constraints force debaters to make strategic choices about which points to pursue rather than covering all possible arguments; (2) the persuasiveness of a debate depends on the back-and-forth interaction between arguments, which a single final game status cannot evaluate. To address these challenges, we propose TreeDebater, a novel debate framework that excels in competitive debate. We introduce two tree structures: the Rehearsal Tree and the Debate Flow Tree. The Rehearsal Tree anticipates attacks and defenses to evaluate the strength of a claim, while the Debate Flow Tree tracks the debate status to identify active actions. TreeDebater allocates its time budget among candidate actions and uses a speech time controller and feedback from a simulated audience to revise its statements. Human evaluation at both the stage level and the debate level shows that TreeDebater outperforms the state-of-the-art multi-agent debate system. Further investigation shows that TreeDebater adopts better strategies for allotting limited time to important debate actions, aligning with the strategies of human debate experts.
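The abstract names two concrete structures: a Rehearsal Tree that scores a claim's strength under anticipated attacks, and a time-budget allocator over candidate actions. The paper does not publish its implementation here, so the following is a minimal, hypothetical Python sketch: the node class, the discount factor of 0.5, and the proportional allocation rule are all illustrative assumptions, not the authors' method.

```python
from dataclasses import dataclass, field


@dataclass
class RehearsalNode:
    """A claim together with the attacks anticipated against it.

    Each attack is itself a RehearsalNode, so defenses appear as
    attacks on the attacks (alternating levels of the tree).
    """
    claim: str
    base_strength: float  # prior confidence in the claim, in [0, 1]
    attacks: list["RehearsalNode"] = field(default_factory=list)

    def strength(self) -> float:
        """Discount the claim by its strongest surviving attack.

        The 0.5 discount factor is an arbitrary illustrative choice.
        """
        if not self.attacks:
            return self.base_strength
        strongest_attack = max(child.strength() for child in self.attacks)
        return self.base_strength * (1.0 - 0.5 * strongest_attack)


def allocate_time(priorities: dict[str, float], budget: float) -> dict[str, float]:
    """Split a speech-time budget across actions in proportion to priority."""
    total = sum(priorities.values())
    return {name: budget * w / total for name, w in priorities.items()}


# Example: a claim weakened by one anticipated attack, and a 3-minute
# speech budget split across candidate debate actions.
root = RehearsalNode("Policy X should be banned", 0.9,
                     attacks=[RehearsalNode("X has economic benefits", 0.8)])
plan = allocate_time({"rebut": 2.0, "extend": 1.0}, budget=180.0)
```

Under these assumptions, the claim's strength is 0.9 × (1 − 0.5 × 0.8) = 0.54, and the rebuttal receives two thirds of the 180-second budget. The real system evaluates claims with an LLM and revises statements with audience feedback, which this sketch omits.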
Related papers
- Avoiding Obfuscation with Prover-Estimator Debate [33.14645106993676]
We propose a protocol for AI debate that guarantees the correctness of human judgements for complex problems. We show that a dishonest debater can use a computationally efficient strategy that forces an honest opponent to solve a computationally intractable problem to win.
arXiv Detail & Related papers (2025-06-16T15:37:33Z) - Debating for Better Reasoning: An Unsupervised Multimodal Approach [56.74157117060815]
We extend the debate paradigm to a multimodal setting, exploring its potential for weaker models to supervise and enhance the performance of stronger models. We focus on visual question answering (VQA), where two "sighted" expert vision-language models debate an answer, while a "blind" (text-only) judge adjudicates based solely on the quality of the arguments. In our framework, the experts defend only answers aligned with their beliefs, thereby obviating the need for explicit role-playing and concentrating the debate on instances of expert disagreement.
arXiv Detail & Related papers (2025-05-20T17:18:17Z) - DebateBench: A Challenging Long Context Reasoning Benchmark For Large Language Models [1.8197265299982013]
We introduce DebateBench, a novel dataset consisting of an extensive collection of transcripts and metadata from some of the world's most prestigious competitive debates. The dataset consists of British Parliamentary debates from prestigious debating tournaments on diverse topics, annotated with detailed speech-level scores and house rankings sourced from official adjudication data. We curate 256 speeches across 32 debates; each debate runs over an hour, and each input averages 32,000 tokens.
arXiv Detail & Related papers (2025-02-10T09:23:03Z) - Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate [22.813887723656023]
Agent for Debate (Agent4Debate) is a dynamic multi-agent framework based on Large Language Models (LLMs).
The evaluation employs the Debatrix automatic scoring system and professional human reviewers, based on the established Debatrix-Elo and Human-Elo rankings.
Experimental results indicate that the state-of-the-art Agent4Debate exhibits capabilities comparable to those of humans.
arXiv Detail & Related papers (2024-08-08T14:02:45Z) - A Picture Is Worth a Graph: A Blueprint Debate Paradigm for Multimodal Reasoning [53.35861580821777]
The study addresses two key challenges: the trivialization of opinions resulting from excessive summarization and the diversion of focus caused by distractor concepts introduced from images.
To address these issues, we propose a deductive (top-down) debating approach called Blueprint Debate on Graphs (BDoG).
In BDoG, debates are confined to a blueprint graph to prevent opinion trivialization through world-level summarization. Moreover, by storing evidence in branches within the graph, BDoG mitigates distractions caused by frequent but irrelevant concepts.
arXiv Detail & Related papers (2024-03-22T06:03:07Z) - Debatrix: Multi-dimensional Debate Judge with Iterative Chronological Analysis Based on LLM [51.43102092480804]
Debatrix is an automated debate judge based on Large Language Models (LLMs).
To align with real-world debate scenarios, we introduced the PanelBench benchmark, comparing our system's performance to actual debate outcomes.
The findings indicate a notable enhancement over directly using LLMs for debate evaluation.
arXiv Detail & Related papers (2024-03-12T18:19:47Z) - Argue with Me Tersely: Towards Sentence-Level Counter-Argument Generation [62.069374456021016]
We present the ArgTersely benchmark for sentence-level counter-argument generation.
We also propose Arg-LlaMA for generating high-quality counter-arguments.
arXiv Detail & Related papers (2023-12-21T06:51:34Z) - DEBACER: a method for slicing moderated debates [55.705662163385966]
Partitioning debates into blocks with the same subject is essential for understanding.
We propose a new algorithm, DEBACER, which partitions moderated debates.
arXiv Detail & Related papers (2021-12-10T10:39:07Z) - High Quality Real-Time Structured Debate Generation [0.0]
We define debate trees and paths for generating debates while enforcing a high level structure and grammar.
We leverage a large corpus of tree-structured debates that have metadata associated with each argument.
Our results demonstrate the ability to generate debates in real-time on complex topics at a quality that is close to humans.
arXiv Detail & Related papers (2020-12-01T01:39:38Z) - DebateSum: A large-scale argument mining and summarization dataset [0.0]
DebateSum consists of 187,386 unique pieces of evidence with corresponding argument and extractive summaries.
We train several transformer summarization models to benchmark summarization performance on DebateSum.
We present a search engine for this dataset which is utilized extensively by members of the National Speech and Debate Association.
arXiv Detail & Related papers (2020-11-14T10:06:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.