Related papers: Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

URL: http://arxiv.org/abs/2305.19118v4
Date: Wed, 09 Oct 2024 02:41:21 GMT
Title: Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
Authors: Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Shuming Shi, Zhaopeng Tu,
Abstract summary: We propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in the state of "tit for tat" and a judge manages the debate process to obtain a final solution. Our framework encourages divergent thinking in LLMs which would be helpful for tasks that require deep levels of contemplation.
Score: 85.3444184685235
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Modern large language models (LLMs) like ChatGPT have shown remarkable performance on general language tasks but still struggle on complex reasoning tasks, which drives the research on cognitive behaviors of LLMs to explore human-like problem-solving strategies. Along this direction, one representative strategy is self-reflection, which asks an LLM to refine the solution with the feedback generated by itself iteratively. However, our study shows that such reflection-style methods suffer from the Degeneration-of-Thought (DoT) problem: once the LLM has established confidence in its solutions, it is unable to generate novel thoughts later through reflection even if its initial stance is incorrect. To address the DoT problem, we propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in the state of "tit for tat" and a judge manages the debate process to obtain a final solution. Clearly, our MAD framework encourages divergent thinking in LLMs which would be helpful for tasks that require deep levels of contemplation. Experiment results on two challenging datasets, commonsense machine translation and counter-intuitive arithmetic reasoning, demonstrate the effectiveness of our MAD framework. Extensive analyses suggest that the adaptive break of debate and the modest level of "tit for tat" state are required for MAD to obtain good performance. Moreover, we find that LLMs might not be a fair judge if different LLMs are used for agents. Code is available at https://github.com/Skytliang/Multi-Agents-Debate.

Related papers

Introspection of Thought Helps AI Agents [19.04968632268433]
Large Language Models (LLMs) and Multimodal-LLMs (MLLMs) play the most critical role and determine the initial ability and limitations of AI Agents.<n>We propose a novel AI Agent Reasoning Framework with Introspection of Thought (INoT) by designing a new LLM-Read code in prompt.<n>The effectiveness of INoT is verified, with an average improvement of 7.95% in performance, exceeding the baselines.
arXiv Detail & Related papers (2025-07-11T15:03:17Z)
CortexDebate: Debating Sparsely and Equally for Multi-Agent Debate [11.155092859033784]
Multi-Agent Debate (MAD) has emerged as an effective strategy to mitigate issues with single Large Language Model (LLM)<n>Existing MAD methods face two major issues: (a) too lengthy input contexts, which causes LLM agents to get lost in plenty of input information and experiences performance drop; and (b) the overconfidence dilemma, where self-assured LLM agents dominate the debate, leading to low debating effectiveness.<n>We propose a novel MAD method called "CortexDebate", inspired by the human brain's tendency to establish a sparse and dynamically optimized network among cortical areas governed by white matter.
arXiv Detail & Related papers (2025-07-05T07:23:15Z)
Computational Thinking Reasoning in Large Language Models [69.28428524878885]
Computational Thinking Model (CTM) is a novel framework that incorporates computational thinking paradigms into large language models (LLMs)<n>Live code execution is seamlessly integrated into the reasoning process, allowing CTM to think by computing.<n>CTM outperforms conventional reasoning models and tool-augmented baselines in terms of accuracy, interpretability, and generalizability.
arXiv Detail & Related papers (2025-06-03T09:11:15Z)
MARCO: Meta-Reflection with Cross-Referencing for Code Reasoning [37.0077348617437]
We explore how can an LLM agent become progressively smarter in code reasoning with each solution it proposes.<n>We adopt a cognitive-evolving perspective and propose a novel framework named Meta-Reflection with Cross-Referencing (MARCO)<n>We conduct experiments across various datasets in code reasoning, and the results demonstrate the effectiveness of MARCO.
arXiv Detail & Related papers (2025-05-23T05:21:11Z)
CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation [53.452699232071495]
CrossWordBench is a benchmark designed to evaluate the reasoning capabilities of Large Language Models (LLMs) and Large Vision-Language Models (LVLMs) through the medium of crossword puzzles. Our evaluation reveals that reasoning LLMs outperform non-reasoning models substantially by effectively leveraging crossing-letter constraints. Our findings offer insights into the limitations of the reasoning capabilities of current LLMs and LVLMs, and provide an effective approach for creating multimodal constrained tasks for future evaluations.
arXiv Detail & Related papers (2025-03-30T20:03:36Z)
Solving Situation Puzzles with Large Language Model and External Reformulation [6.793639595476304]
We show that large language models (LLMs) cannot perform well on reasoning that requires multiple rounds of dialogue. We propose a novel external reformulation methodology, where the situation puzzle will be reformulated after several rounds of Q&A. Experiments show superior performance (e.g., win rate, number of question/guess attempts) of our method than directly using LLMs for solving situation puzzles.
arXiv Detail & Related papers (2025-03-24T07:05:55Z)
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning [54.787341008881036]
We introduce Reinforced Meta-thinking Agents (ReMA), a novel framework that leverages Multi-Agent Reinforcement Learning (MARL) to elicit meta-thinking behaviors. ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions. Experimental results demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks.
arXiv Detail & Related papers (2025-03-12T16:05:31Z)
Enhancing LLM Reasoning with Multi-Path Collaborative Reactive and Reflection agents [26.645038049346255]
We propose the Reactive and Reflection agents with Multi-Path Reasoning (RR-MP) Framework. Our approach improves scientific reasoning accuracy by employing a multi-path reasoning mechanism. We conducted zero-shot and few-shot evaluations on tasks involving moral scenarios, college-level physics, and mathematics.
arXiv Detail & Related papers (2024-12-31T13:11:20Z)
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks. LLMs are prone to produce errors, hallucinations and inconsistent statements when performing multi-step reasoning. We introduce Q*, a framework for guiding LLMs decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z)
Counterfactual Debating with Preset Stances for Hallucination Elimination of LLMs [45.38821594541265]
Large Language Models (LLMs) excel in various natural language processing tasks but struggle with hallucination issues. We propose a CounterFactual Multi-Agent Debate (CFMAD) framework to override LLMs' inherent biases for answer inspection.
arXiv Detail & Related papers (2024-06-17T13:21:23Z)
CoMM: Collaborative Multi-Agent, Multi-Reasoning-Path Prompting for Complex Problem Solving [9.446546965008249]
We propose a collaborative multi-agent, multi-reasoning-path (CoMM) prompting framework. Specifically, we prompt LLMs to play different roles in a problem-solving team, and encourage different role-play agents to collaboratively solve the target task. Empirical results demonstrate the effectiveness of the proposed methods on two college-level science problems.
arXiv Detail & Related papers (2024-04-26T23:29:12Z)
Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning [68.83624133567213]
We show that most prevalent MLLMs can be easily fooled by the introduction of a presupposition into the question. We also propose a simple yet effective method, Active Deduction (AD), to encourage the model to actively perform composite deduction.
arXiv Detail & Related papers (2024-04-19T15:53:27Z)
How Far Are We from Intelligent Visual Deductive Reasoning? [41.4377002379162]
We dig into vision-based deductive reasoning, a more sophisticated but less explored realm. We find previously unexposed blindspots in the current SOTA VLMs. We find that certain standard strategies that are effective when applied to LLMs do not seamlessly translate to the challenges presented by visual reasoning tasks.
arXiv Detail & Related papers (2024-03-07T18:35:54Z)
Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key? [84.36332588191623]
We propose a novel group discussion framework to enrich the set of discussion mechanisms. We observe that the multi-agent discussion performs better than a single agent only when there is no demonstration in the prompt.
arXiv Detail & Related papers (2024-02-28T12:04:05Z)
Adversarial Math Word Problem Generation [6.92510069380188]
We propose a new paradigm for ensuring fair evaluation of large language models (LLMs) We generate adversarial examples which preserve the structure and difficulty of the original questions aimed for assessment, but are unsolvable by LLMs. We conduct experiments on various open- and closed-source LLMs, quantitatively and qualitatively demonstrating that our method significantly degrades their math problem-solving ability.
arXiv Detail & Related papers (2024-02-27T22:07:52Z)
Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools. Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions. Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z)
LatEval: An Interactive LLMs Evaluation Benchmark with Incomplete Information from Lateral Thinking Puzzles [22.119796373133298]
We propose a novel evaluation benchmark, LatEval, which assesses the model's lateral thinking within an interactive framework. In our benchmark, we challenge LLMs with 2 aspects: the quality of questions posed by the model and the model's capability to integrate information for problem-solving. For example, even the most advanced model, GPT-4, exhibits the advantage to some extent, yet still maintain a noticeable gap when compared to human.
arXiv Detail & Related papers (2023-08-21T16:49:40Z)
Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies [104.32199881187607]
Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks. A promising approach to rectify these flaws is self-correction, where the LLM itself is prompted or guided to fix problems in its own output. This paper presents a comprehensive review of this emerging class of techniques.
arXiv Detail & Related papers (2023-08-06T18:38:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.