Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate
- URL: http://arxiv.org/abs/2305.11595v3
- Date: Wed, 18 Oct 2023 06:32:15 GMT
- Title: Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate
- Authors: Kai Xiong, Xiao Ding, Yixin Cao, Ting Liu and Bing Qin
- Abstract summary: Large Language Models (LLMs) have shown impressive capabilities in various applications, but they still face various inconsistency issues.
To examine whether LLMs can collaborate effectively to achieve a consensus for a shared goal, we focus on commonsense reasoning.
Our work contributes to understanding the inter-consistency among LLMs and lays the foundation for developing future collaboration methods.
- Score: 41.949869545423375
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have shown impressive capabilities in various
applications, but they still face various inconsistency issues. Existing works
primarily focus on the inconsistency issues within a single LLM, while we
complementarily explore the inter-consistency among multiple LLMs for
collaboration. To examine whether LLMs can collaborate effectively to achieve a
consensus for a shared goal, we focus on commonsense reasoning, and introduce a
formal debate framework (FORD) to conduct a three-stage debate among LLMs,
aligned with real-world scenarios: fair debate, mismatched debate, and
roundtable debate. Extensive experiments on various datasets show that LLMs
can effectively collaborate to reach a consensus despite noticeable
inter-inconsistencies, but imbalances in their abilities can lead to
domination by superior LLMs.
Leveraging a more advanced LLM like GPT-4 as an authoritative judge can boost
collaboration performance. Our work contributes to understanding the
inter-consistency among LLMs and lays the foundation for developing future
collaboration methods. Codes and data are available at
https://github.com/Waste-Wood/FORD
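The abstract describes FORD's protocol only at a high level. As a rough illustration, here is a minimal Python sketch of the fair-debate setting, assuming a generic chat(model, prompt) helper; the helper, prompts, and model roles are hypothetical placeholders, not the API of the FORD repository.

# Hypothetical sketch of a FORD-style "fair debate": two debater LLMs of
# comparable ability alternate arguments over a shared transcript, then a
# judge model reads the debate and issues the consensus answer.
# chat() and the model names are placeholders, not the FORD repository's API.
from typing import Callable

ChatFn = Callable[[str, str], str]  # (model_name, prompt) -> reply text

def fair_debate(question: str, debater_a: str, debater_b: str,
                judge: str, chat: ChatFn, rounds: int = 3) -> str:
    transcript = [f"Question: {question}"]
    for _ in range(rounds):
        # Each debater sees the full transcript and defends its stance.
        for model in (debater_a, debater_b):
            prompt = ("\n".join(transcript)
                      + "\nState your answer and rebut the other debater.")
            transcript.append(f"{model}: {chat(model, prompt)}")
    # Per the paper's finding, a stronger model (e.g. GPT-4) works well
    # as an authoritative judge that produces the final consensus.
    return chat(judge, "\n".join(transcript)
                + "\nAs the judge, give the single consensus answer.")

Under the same assumptions, a mismatched debate would simply pair debaters of unequal ability, and a roundtable debate would extend the inner loop to more than two debaters.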
Related papers
- CIBench: Evaluating Your LLMs with a Code Interpreter Plugin [68.95137938214862]
We propose an interactive evaluation framework, named CIBench, to comprehensively assess LLMs' ability to utilize code interpreters for data science tasks.
The evaluation dataset is constructed using an LLM-human cooperative approach and simulates an authentic workflow by leveraging consecutive and interactive IPython sessions.
We conduct extensive experiments to analyze the ability of 24 LLMs on CIBench and provide valuable insights for future LLMs in code interpreter utilization.
arXiv Detail & Related papers (2024-07-15T07:43:55Z)
- Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models [32.336273322481276]
Despite their diverse capabilities, Large Language Models (LLMs) exhibit varying strengths and weaknesses.
To address these challenges, recent studies have explored collaborative strategies for LLMs.
This paper provides a comprehensive overview of this emerging research area, highlighting the motivation behind such collaborations.
arXiv Detail & Related papers (2024-07-08T16:29:08Z)
- LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play [43.55248812883912]
Large language models (LLMs) have shown exceptional proficiency in natural language processing but often fall short of generating creative and original responses to open-ended questions.
We propose LLM Discussion, a three-phase discussion framework that facilitates vigorous and diverging idea exchanges.
We evaluate the efficacy of the proposed framework with the Alternative Uses Test, Similarities Test, Instances Test, and Scientific Creativity Test.
arXiv Detail & Related papers (2024-05-10T10:19:14Z)
- Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key? [84.36332588191623]
We propose a novel group discussion framework to enrich the set of discussion mechanisms.
We observe that the multi-agent discussion performs better than a single agent only when there is no demonstration in the prompt.
arXiv Detail & Related papers (2024-02-28T12:04:05Z)
- Theory of Mind for Multi-Agent Collaboration via Large Language Models [5.2767999863286645]
This study evaluates Large Language Models (LLMs)-based agents in a multi-agent cooperative text game with Theory of Mind (ToM) inference tasks.
We observed evidence of emergent collaborative behaviors and high-order Theory of Mind capabilities among LLM-based agents.
arXiv Detail & Related papers (2023-10-16T07:51:19Z)
- LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models [23.092480882456048]
This study aims at a detailed analysis of Large Language Models (LLMs) within the context of Pure Coordination Games.
Our findings indicate that LLM agents equipped with GPT-4-turbo achieve comparable performance to state-of-the-art reinforcement learning methods.
Results on Coordination QA show a large room for improvement in the Theory of Mind reasoning and joint planning abilities of LLMs.
arXiv Detail & Related papers (2023-10-05T21:18:15Z)
- Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View [60.80731090755224]
This paper probes the collaboration mechanisms among contemporary NLP systems through practical experiments combined with theoretical insights.
We fabricate four unique "societies" composed of LLM agents, where each agent is characterized by a specific "trait" (easy-going or overconfident) and collaborates with a distinct "thinking pattern" (debate or reflection).
Our results further illustrate that LLM agents manifest human-like social behaviors, such as conformity and consensus reaching, mirroring social psychology theories.
arXiv Detail & Related papers (2023-10-03T15:05:52Z)
- Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration [83.4031923134958]
Corex is a suite of novel general-purpose strategies that transform Large Language Models into autonomous agents.
Inspired by human behaviors, Corex is constituted by diverse collaboration paradigms including Debate, Review, and Retrieve modes.
We demonstrate that orchestrating multiple LLMs to work in concert yields substantially better performance compared to existing methods.
arXiv Detail & Related papers (2023-09-30T07:11:39Z)
- Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate [85.3444184685235]
We propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in a "tit for tat" exchange while a judge manages the debate process to obtain a final solution (a minimal hypothetical sketch of such a loop follows this list).
Our framework encourages divergent thinking in LLMs, which is helpful for tasks that require deep contemplation.
arXiv Detail & Related papers (2023-05-30T15:25:45Z)
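As referenced in the MAD entry above, here is a minimal hypothetical sketch of a judge-managed "tit for tat" debate loop; the chat() interface and prompt wording are assumptions in the same spirit as the earlier sketch, not the MAD authors' implementation.

# Hypothetical sketch of a MAD-style loop: agents argue in a "tit for tat"
# exchange while a judge monitors each round and ends the debate once it
# can extract a final solution. chat() is the same placeholder interface
# as in the earlier sketch.
from typing import Callable, List

ChatFn = Callable[[str, str], str]  # (model_name, prompt) -> reply text

def mad_debate(question: str, agents: List[str], judge: str,
               chat: ChatFn, max_rounds: int = 5) -> str:
    transcript = [f"Question: {question}"]
    for _ in range(max_rounds):
        for agent in agents:
            prompt = ("\n".join(transcript)
                      + "\nArgue your position and counter the other agents.")
            transcript.append(f"{agent}: {chat(agent, prompt)}")
        # The judge manages the process: it either settles the debate or
        # requests another round of arguments.
        ruling = chat(judge, "\n".join(transcript)
                      + "\nReply 'FINAL: <answer>' if settled, else 'CONTINUE'.")
        if ruling.startswith("FINAL:"):
            return ruling[len("FINAL:"):].strip()
    # No consensus within the round budget: fall back to the judge's answer.
    return chat(judge, "\n".join(transcript) + "\nGive your best final answer.")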