Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration
- URL: http://arxiv.org/abs/2310.00280v3
- Date: Wed, 21 Aug 2024 05:11:10 GMT
- Title: Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration
- Authors: Qiushi Sun, Zhangyue Yin, Xiang Li, Zhiyong Wu, Xipeng Qiu, Lingpeng Kong
- Abstract summary: Corex is a suite of novel general-purpose strategies that transform Large Language Models into autonomous agents.
Inspired by human behaviors, Corex comprises diverse collaboration paradigms, including Debate, Review, and Retrieve modes.
We demonstrate that orchestrating multiple LLMs to work in concert yields substantially better performance compared to existing methods.
- Score: 83.4031923134958
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are evolving at an unprecedented pace and have exhibited considerable capability in the realm of natural language processing (NLP) with world knowledge. Benefiting from ultra-large-scale training corpora, a single LLM can manage typical NLP tasks competently. However, its performance in executing reasoning tasks is still confined by the limitations of its internal representations. To push this boundary further, we introduce Corex in this paper, a suite of novel general-purpose strategies that transform LLMs into autonomous agents pioneering multi-model collaborations for complex task-solving. Inspired by human behaviors, Corex comprises diverse collaboration paradigms, including Debate, Review, and Retrieve modes, which collectively work towards enhancing the factuality, faithfulness, and reliability of the reasoning process. These paradigms foster task-agnostic approaches that enable LLMs to "think outside the box," thereby overcoming hallucinations and providing better solutions. Through extensive experiments across four different types of reasoning tasks, we demonstrate that orchestrating multiple LLMs to work in concert yields substantially better performance compared to existing methods. Further results and in-depth analysis demonstrate the cost-effectiveness of our method, facilitating collaboration among different LLMs and promoting annotation efficiency.
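The abstract describes the collaboration paradigms only at a high level; as an illustration, the sketch below shows one way a debate-style round among multiple models could be orchestrated. The `ModelFn` interface, the round schedule, and the majority-vote aggregation are assumptions for illustration, not Corex's actual protocol.

```python
from collections import Counter
from typing import Callable, List

# A "model" here is just a function from prompt to text. In practice this
# would wrap an LLM API call; the signature is an assumption for illustration.
ModelFn = Callable[[str], str]

def debate(models: List[ModelFn], question: str, rounds: int = 2) -> str:
    """Hypothetical debate-mode sketch: each model answers, sees the other
    models' answers, may revise, and a majority vote picks the final answer."""
    answers = [m(question) for m in models]
    for _ in range(rounds - 1):
        revised = []
        for i, m in enumerate(models):
            others = "\n".join(a for j, a in enumerate(answers) if j != i)
            prompt = (
                f"Question: {question}\n"
                f"Other agents answered:\n{others}\n"
                "Considering these answers, give your final answer."
            )
            revised.append(m(prompt))
        answers = revised
    # Naive aggregation: return the most common final answer.
    return Counter(answers).most_common(1)[0][0]

if __name__ == "__main__":
    # Toy stand-ins for real LLM calls.
    agents = [lambda p: "42", lambda p: "42", lambda p: "24"]
    print(debate(agents, "What is 6 x 7?"))  # -> "42"
```

The paper's Review and Retrieve modes would replace the revision prompt and the voting step with, respectively, peer critique of a candidate solution and selection of the best answer from a pool; the loop structure above is only a generic scaffold.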
Related papers
- BloomWise: Enhancing Problem-Solving capabilities of Large Language Models using Bloom's-Taxonomy-Inspired Prompts [59.83547898874152]
We introduce BloomWise, a new prompting technique inspired by Bloom's taxonomy, to improve the performance of Large Language Models (LLMs).
The decision regarding the need to employ more sophisticated cognitive skills is based on self-evaluation performed by the LLM.
In extensive experiments across 4 popular math reasoning datasets, we have demonstrated the effectiveness of our proposed approach.
arXiv Detail & Related papers (2024-10-05T09:27:52Z)
- Cognitive LLMs: Towards Integrating Cognitive Architectures and Large Language Models for Manufacturing Decision-making [51.737762570776006]
LLM-ACTR is a novel neuro-symbolic architecture that provides human-aligned and versatile decision-making.
Our framework extracts and embeds knowledge of ACT-R's internal decision-making process as latent neural representations.
Our experiments on novel Design for Manufacturing tasks show both improved task performance and improved grounded decision-making capability.
arXiv Detail & Related papers (2024-08-17T11:49:53Z)
- Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models [32.336273322481276]
Despite their diverse capabilities, Large Language Models (LLMs) exhibit varying strengths and weaknesses.
To address these challenges, recent studies have explored collaborative strategies for LLMs.
This paper provides a comprehensive overview of this emerging research area, highlighting the motivation behind such collaborations.
arXiv Detail & Related papers (2024-07-08T16:29:08Z)
- Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to producing errors, hallucinations, and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding LLMs' decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z)
- Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning [79.38140606606126]
We propose an algorithmic framework that fine-tunes vision-language models (VLMs) with reinforcement learning (RL).
Our framework provides a task description and then prompts the VLM to generate chain-of-thought (CoT) reasoning.
We demonstrate that our proposed framework enhances the decision-making capabilities of VLM agents across various tasks.
arXiv Detail & Related papers (2024-05-16T17:50:19Z)
- Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning [14.635361844362794]
Smurfs is a cutting-edge multi-agent framework designed to revolutionize the application of large language models.
Smurfs can enhance the model's ability to solve complex tasks at no additional cost.
arXiv Detail & Related papers (2024-05-09T17:49:04Z)
- Small LLMs Are Weak Tool Learners: A Multi-LLM Agent [73.54562551341454]
Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs.
We propose a novel approach that decomposes the aforementioned capabilities into a planner, caller, and summarizer.
This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability.
arXiv Detail & Related papers (2024-01-14T16:17:07Z)
- CoF-CoT: Enhancing Large Language Models with Coarse-to-Fine Chain-of-Thought Prompting for Multi-domain NLU Tasks [46.862929778121675]
Chain-of-Thought prompting is popular in reasoning tasks, but its application to Natural Language Understanding (NLU) is under-explored.
Motivated by the multi-step reasoning of Large Language Models (LLMs), we propose the Coarse-to-Fine Chain-of-Thought (CoF-CoT) approach.
arXiv Detail & Related papers (2023-10-23T06:54:51Z)
- Theory of Mind for Multi-Agent Collaboration via Large Language Models [5.2767999863286645]
This study evaluates Large Language Models (LLMs)-based agents in a multi-agent cooperative text game with Theory of Mind (ToM) inference tasks.
We observed evidence of emergent collaborative behaviors and high-order Theory of Mind capabilities among LLM-based agents.
arXiv Detail & Related papers (2023-10-16T07:51:19Z)
- Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate [41.949869545423375]
Large Language Models (LLMs) have shown impressive capabilities in various applications, but they still face various inconsistency issues.
To examine whether LLMs can collaborate effectively to achieve a consensus for a shared goal, we focus on commonsense reasoning.
Our work contributes to understanding the inter-consistency among LLMs and lays the foundation for developing future collaboration methods.
arXiv Detail & Related papers (2023-05-19T11:15:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.