MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning
- URL: http://arxiv.org/abs/2409.12147v1
- Date: Wed, 18 Sep 2024 17:12:41 GMT
- Title: MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning
- Authors: Justin Chih-Yao Chen, Archiki Prasad, Swarnadeep Saha, Elias Stengel-Eskin, Mohit Bansal,
- Abstract summary: Large Language Models' (LLM) reasoning can be improved using test-time aggregation strategies, i.e., generating multiple samples and voting among generated samples.
Refinement offers an alternative by using LLM-generated feedback to improve solution quality.
We propose MAgICoRe, which avoids excessive refinement by categorizing problem difficulty as easy or hard.
- Score: 60.55556283848063
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models' (LLM) reasoning can be improved using test-time aggregation strategies, i.e., generating multiple samples and voting among generated samples. While these improve performance, they often reach a saturation point. Refinement offers an alternative by using LLM-generated feedback to improve solution quality. However, refinement introduces 3 key challenges: (1) Excessive refinement: Uniformly refining all instances can over-correct and reduce the overall performance. (2) Inability to localize and address errors: LLMs have a limited ability to self-correct and struggle to identify and correct their own mistakes. (3) Insufficient refinement: Deciding how many iterations of refinement are needed is non-trivial, and stopping too soon could leave errors unaddressed. To tackle these issues, we propose MAgICoRe, which avoids excessive refinement by categorizing problem difficulty as easy or hard, solving easy problems with coarse-grained aggregation and hard ones with fine-grained and iterative multi-agent refinement. To improve error localization, we incorporate external step-wise reward model (RM) scores. Moreover, to ensure effective refinement, we employ a multi-agent loop with three agents: Solver, Reviewer (which generates targeted feedback based on step-wise RM scores), and the Refiner (which incorporates feedback). To ensure sufficient refinement, we re-evaluate updated solutions, iteratively initiating further rounds of refinement. We evaluate MAgICoRe on Llama-3-8B and GPT-3.5 and show its effectiveness across 5 math datasets. Even one iteration of MAgICoRe beats Self-Consistency by 3.4%, Best-of-k by 3.2%, and Self-Refine by 4.0% while using less than half the samples. Unlike iterative refinement with baselines, MAgICoRe continues to improve with more iterations. Finally, our ablations highlight the importance of MAgICoRe's RMs and multi-agent communication.
Related papers
- Vision-Language Models Can Self-Improve Reasoning via Reflection [20.196406628954303]
Chain-of-thought (CoT) has proven to improve the reasoning capability of large language models (LLMs)
We propose a self-training framework, R3V, which iteratively enhances the model's Vision-language Reasoning by Reflecting on CoT Rationales.
Our approach supports self-reflection on generated solutions, further boosting performance through test-time computation.
arXiv Detail & Related papers (2024-10-30T14:45:00Z) - Learning to Refine with Fine-Grained Natural Language Feedback [81.70313509881315]
We propose looking at refinement with feedback as a composition of three distinct LLM competencies.
A key property of the proposed Detect, Critique, Refine ("DCR") method is that the step 2 critique model can give fine-grained feedback about errors.
We show that models of different capabilities benefit from refining with DCR on the task of improving factual consistency of document grounded summaries.
arXiv Detail & Related papers (2024-07-02T16:15:01Z) - V-STaR: Training Verifiers for Self-Taught Reasoners [71.53113558733227]
V-STaR trains a verifier using DPO that judges correctness of model-generated solutions.
Running V-STaR for multiple iterations results in progressively better reasoners and verifiers.
arXiv Detail & Related papers (2024-02-09T15:02:56Z) - Enhancing Large Language Model Performance To Answer Questions and
Extract Information More Accurately [2.1715455600756646]
Large Language Models (LLMs) generate responses to questions.
Their effectiveness is often hindered by sub-optimal quality of answers and occasional failures to provide accurate responses to questions.
To address these challenges, a fine-tuning process is employed, involving feedback and examples to refine models.
arXiv Detail & Related papers (2024-01-27T00:18:07Z) - LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback [65.84061725174269]
Recent large language models (LLM) are leveraging human feedback to improve their generation quality.
We propose LLMRefine, an inference time optimization method to refine LLM's output.
We conduct experiments on three text generation tasks, including machine translation, long-form question answering (QA), and topical summarization.
LLMRefine consistently outperforms all baseline approaches, achieving improvements up to 1.7 MetricX points on translation tasks, 8.1 ROUGE-L on ASQA, 2.2 ROUGE-L on topical summarization.
arXiv Detail & Related papers (2023-11-15T19:52:11Z) - Improving Large Language Model Fine-tuning for Solving Math Problems [20.417053742869403]
A large gap exists between large language models' pass-at-one and pass-at-N performance in solving math problems.
Using the challenging MATH dataset, we investigate three fine-tuning strategies.
We design a fine-tuning recipe that yields approximately 58.8% accuracy on the MATH dataset with fine-tuned PaLM 2-L models.
arXiv Detail & Related papers (2023-10-16T04:11:19Z) - Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency [127.97467912117652]
Large language models (LLMs) have exhibited remarkable ability in code generation.
However, generating the correct solution in a single attempt still remains a challenge.
We propose the Multi-Perspective Self-Consistency (MPSC) framework incorporating both inter- and intra-consistency.
arXiv Detail & Related papers (2023-09-29T14:23:26Z) - RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by
Reversing Chain-of-Thought [56.558892336235914]
Reversing Chain-of-Thought (RCoT) is a novel method to improve large language models' reasoning abilities.
RCoT automatically detects and rectifys factual inconsistency in generated solutions.
We show that manually written fine-grained feedback can dramatically improve LLMs' reasoning abilities.
arXiv Detail & Related papers (2023-05-19T08:02:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.