ACC-Debate: An Actor-Critic Approach to Multi-Agent Debate
- URL: http://arxiv.org/abs/2411.00053v2
- Date: Mon, 04 Nov 2024 15:20:16 GMT
- Title: ACC-Debate: An Actor-Critic Approach to Multi-Agent Debate
- Authors: Andrew Estornell, Jean-Francois Ton, Yuanshun Yao, Yang Liu,
- Abstract summary: We propose ACC-Debate, an Actor-Critic based learning framework to produce a two-agent team specialized in debate.
We demonstrate that ACC-Debate outperforms SotA debate techniques on a wide array of benchmarks.
- Score: 20.040543142468344
- License:
- Abstract: Large language models (LLMs) have demonstrated a remarkable ability to serve as general-purpose tools for various language-based tasks. Recent works have demonstrated that the efficacy of such models can be improved through iterative dialog between multiple models, frequently referred to as multi-agent debate (MAD). While debate shows promise as a means of improving model efficacy, most works in this area treat debate as an emergent behavior, rather than a learned behavior. In doing so, current debate frameworks rely on collaborative behaviors to have been sufficiently trained into off-the-shelf models. To address this limitation, we propose ACC-Debate, an Actor-Critic based learning framework to produce a two-agent team specialized in debate. We demonstrate that ACC-Debate outperforms SotA debate techniques on a wide array of benchmarks.
Related papers
- Diversity of Thought Elicits Stronger Reasoning Capabilities in Multi-Agent Debate Frameworks [0.0]
Chain-of-thought prompting, self-verification, and multi-agent debate are proposed to improve the reasoning and factual accuracy of large language models.
We find that multi-agent debate helps at any model scale, and that diversity of thought elicits stronger reasoning in debating LLMs.
arXiv Detail & Related papers (2024-10-10T21:59:01Z) - Training Language Models to Win Debates with Self-Play Improves Judge Accuracy [8.13173791334223]
We test the robustness of debate as a method of scalable oversight by training models to debate with data generated via self-play.
We find that language model based evaluators answer questions more accurately when judging models optimized to win debates.
arXiv Detail & Related papers (2024-09-25T05:28:33Z) - GroupDebate: Enhancing the Efficiency of Multi-Agent Debate Using Group Discussion [8.948702488582583]
This paper proposes a method to significantly reduce token cost in multi-agent debates.
Our method significantly enhances the performance and efficiency of interactions in the multi-agent debate.
arXiv Detail & Related papers (2024-09-21T07:49:38Z) - MultiAgent Collaboration Attack: Investigating Adversarial Attacks in Large Language Model Collaborations via Debate [24.92465108034783]
Large Language Models (LLMs) have shown exceptional results on current benchmarks when working individually.
The advancement in their capabilities, along with a reduction in parameter size and inference times, has facilitated the use of these models as agents.
We evaluate the behavior of a network of models collaborating through debate under the influence of an adversary.
arXiv Detail & Related papers (2024-06-20T20:09:37Z) - Debatrix: Multi-dimensional Debate Judge with Iterative Chronological Analysis Based on LLM [51.43102092480804]
Debatrix is an automated debate judge based on Large Language Models (LLMs)
To align with real-world debate scenarios, we introduced the PanelBench benchmark, comparing our system's performance to actual debate outcomes.
The findings indicate a notable enhancement over directly using LLMs for debate evaluation.
arXiv Detail & Related papers (2024-03-12T18:19:47Z) - Combating Adversarial Attacks with Multi-Agent Debate [4.450536872346658]
We implement multi-agent debate between current state-of-the-art language models and evaluate models' susceptibility to red team attacks.
We find that multi-agent debate can reduce model toxicity when jailbroken or less capable models are forced to debate with non-jailbroken or more capable models.
arXiv Detail & Related papers (2024-01-11T15:57:38Z) - DialCLIP: Empowering CLIP as Multi-Modal Dialog Retriever [83.33209603041013]
We propose a parameter-efficient prompt-tuning method named DialCLIP for multi-modal dialog retrieval.
Our approach introduces a multi-modal context generator to learn context features which are distilled into prompts within the pre-trained vision-language model CLIP.
To facilitate various types of retrieval, we also design multiple experts to learn mappings from CLIP outputs to multi-modal representation space.
arXiv Detail & Related papers (2024-01-02T07:40:12Z) - SAIE Framework: Support Alone Isn't Enough -- Advancing LLM Training
with Adversarial Remarks [47.609417223514605]
This work introduces the SAIE framework, which facilitates supportive and adversarial discussions between learner and partner models.
Our empirical evaluation shows that models fine-tuned with the SAIE framework outperform those trained with conventional fine-tuning approaches.
arXiv Detail & Related papers (2023-11-14T12:12:25Z) - On the Discussion of Large Language Models: Symmetry of Agents and
Interplay with Prompts [51.3324922038486]
This paper reports the empirical results of the interplay of prompts and discussion mechanisms.
It also proposes a scalable discussion mechanism based on conquer and merge.
arXiv Detail & Related papers (2023-11-13T04:56:48Z) - Improving Factuality and Reasoning in Language Models through Multiagent
Debate [95.10641301155232]
We present a complementary approach to improve language responses where multiple language model instances propose and debate their individual responses and reasoning processes over multiple rounds to arrive at a common final answer.
Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks.
Our approach may be directly applied to existing black-box models and uses identical procedure and prompts for all tasks we investigate.
arXiv Detail & Related papers (2023-05-23T17:55:11Z) - Multimodal Chain-of-Thought Reasoning in Language Models [94.70184390935661]
We propose Multimodal-CoT that incorporates language (text) and vision (images) modalities into a two-stage framework.
Experimental results on ScienceQA and A-OKVQA benchmark datasets show the effectiveness of our proposed approach.
arXiv Detail & Related papers (2023-02-02T07:51:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.