The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement
- URL: http://arxiv.org/abs/2503.16024v1
- Date: Thu, 20 Mar 2025 10:42:33 GMT
- Title: The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement
- Authors: Ruihan Yang, Fanghua Ye, Jian Li, Siyu Yuan, Yikai Zhang, Zhaopeng Tu, Xiaolong Li, Deqing Yang
- Abstract summary: Large language models (LLMs) have recently transformed from text-based assistants to autonomous agents capable of planning, reasoning, and iteratively improving their actions. In this work, we introduce Critique-Guided Improvement (CGI), a novel two-player framework, comprising an actor model that explores an environment and a critic model that generates detailed natural language feedback.
- Score: 49.687224320842105
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have recently transformed from text-based assistants to autonomous agents capable of planning, reasoning, and iteratively improving their actions. While numerical reward signals and verifiers can effectively rank candidate actions, they often provide limited contextual guidance. In contrast, natural language feedback better aligns with the generative capabilities of LLMs, providing richer and more actionable suggestions. However, parsing and implementing this feedback effectively can be challenging for LLM-based agents. In this work, we introduce Critique-Guided Improvement (CGI), a novel two-player framework, comprising an actor model that explores an environment and a critic model that generates detailed natural language feedback. By training the critic to produce fine-grained assessments and actionable revisions, and the actor to utilize these critiques, our approach promotes more robust exploration of alternative strategies while avoiding local optima. Experiments in three interactive environments show that CGI outperforms existing baselines by a substantial margin. Notably, even a small critic model surpasses GPT-4 in feedback quality. The resulting actor achieves state-of-the-art performance, demonstrating the power of explicit iterative guidance to enhance decision-making in LLM-based agents.
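The two-player loop the abstract describes (actor proposes, critic returns natural language feedback, actor revises) can be sketched as follows. This is a minimal illustration with hypothetical `actor_propose`, `critic_review`, and environment interfaces, not the paper's actual implementation.

```python
# Minimal sketch of the actor/critic loop described above. All names
# (actor_propose, critic_review, env.step, the llm callables) are
# hypothetical placeholders, not the paper's API.

def actor_propose(actor_llm, observation, critique=None):
    """Ask the actor for an action, optionally conditioned on a critique."""
    prompt = f"Observation: {observation}\n"
    if critique is not None:
        prompt += f"Critique of your previous attempt: {critique}\n"
    prompt += "Propose your next action."
    return actor_llm(prompt)

def critic_review(critic_llm, observation, action):
    """Ask the critic for a fine-grained assessment and an actionable revision."""
    prompt = (
        f"Observation: {observation}\n"
        f"Proposed action: {action}\n"
        "Assess this action step by step and suggest a concrete revision."
    )
    return critic_llm(prompt)

def cgi_step(actor_llm, critic_llm, env, observation, revisions=2):
    """One environment step: propose, critique, revise, then act."""
    action = actor_propose(actor_llm, observation)
    for _ in range(revisions):
        critique = critic_review(critic_llm, observation, action)
        action = actor_propose(actor_llm, observation, critique)
    return env.step(action)  # returns the next observation from the environment
```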
Related papers
- DeepCritic: Deliberate Critique with Large Language Models [77.5516314477878]
We focus on studying and enhancing the math critique ability of Large Language Models (LLMs).
Our developed critique model built on Qwen2.5-7B-Instruct significantly outperforms existing LLM critics on various error identification benchmarks.
arXiv Detail & Related papers (2025-05-01T17:03:17Z)
- RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques [59.861013614500024]
We introduce a new benchmark designed to assess the critique capabilities of Large Language Models (LLMs). Unlike existing benchmarks, which typically function in an open-loop fashion, our approach employs a closed-loop methodology that evaluates the quality of corrections generated from critiques.
arXiv Detail & Related papers (2025-01-24T13:48:10Z)
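The closed-loop protocol can be illustrated in a few lines: a critique is scored by whether the correction it induces recovers the reference answer. The helper names below (`extract_answer`, the `llm` callable) are illustrative assumptions, not the benchmark's API.

```python
# Illustrative closed-loop scoring: a critique is judged by whether the
# correction it induces recovers the reference answer. Not RealCritic's code.

def extract_answer(text):
    """Toy answer parser: take the last line (stand-in for a real extractor)."""
    return text.strip().splitlines()[-1]

def closed_loop_critique_score(llm, problem, flawed_solution, critique, reference):
    """Return 1.0 if the correction generated from the critique is correct."""
    correction = llm(
        f"Problem: {problem}\n"
        f"Flawed solution: {flawed_solution}\n"
        f"Critique: {critique}\n"
        "Rewrite the solution, following the critique."
    )
    return 1.0 if extract_answer(correction) == reference else 0.0
```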
- Enabling Scalable Oversight via Self-Evolving Critic [59.861013614500024]
SCRIT (Self-evolving CRITic) is a framework that enables genuine self-evolution of critique abilities.
It self-improves by training on synthetic data generated by a contrastive self-critic.
It achieves up to a 10.3% improvement on critique-correction and error identification benchmarks.
arXiv Detail & Related papers (2025-01-10T05:51:52Z)
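A rough sketch of the self-evolution idea as summarized above: a contrastive self-critic compares candidate solutions against references to produce synthetic critique data for fine-tuning. Function names and prompt formats are assumptions, not SCRIT's implementation.

```python
# Sketch of contrastive self-critique for synthetic training data.
# Prompt format and names are assumptions, not SCRIT's implementation.

def contrastive_self_critique(llm, problem, candidate, reference):
    """Critique a candidate solution by contrasting it with a reference."""
    return llm(
        f"Problem: {problem}\n"
        f"Reference solution: {reference}\n"
        f"Candidate solution: {candidate}\n"
        "Compare the candidate with the reference step by step and "
        "point out every error in the candidate."
    )

def build_critique_dataset(llm, triples):
    """Turn (problem, candidate, reference) triples into critic training pairs."""
    data = []
    for problem, candidate, reference in triples:
        critique = contrastive_self_critique(llm, problem, candidate, reference)
        data.append({"input": f"{problem}\n{candidate}", "target": critique})
    return data
```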
- Training Agents with Weakly Supervised Feedback from Large Language Models [19.216542820742607]
This paper introduces a novel training method for LLM-based agents using weakly supervised signals from a critic LLM. Our agents are trained in an iterative manner, where they initially generate trajectories through environmental interaction. Tests on the API-Bank dataset show consistent improvement in our agents' capabilities and comparable performance to GPT-4.
arXiv Detail & Related papers (2024-11-29T08:47:04Z)
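The iterative weak-supervision loop might look like the sketch below, where a critic LLM's yes/no verdict filters trajectories for fine-tuning. The `rollout`, `env`, and `fine_tune` interfaces are hypothetical stand-ins for the paper's setup.

```python
# Sketch of one iteration of critic-filtered training. The rollout/env/
# fine_tune interfaces are hypothetical stand-ins, not the paper's code.

def rollout(agent, env, task):
    """Collect one trajectory by letting the agent act until the episode ends."""
    observation, done, trajectory = env.reset(task), False, []
    while not done:
        action = agent.act(observation)
        observation, done = env.step(action)
        trajectory.append(action)
    return trajectory

def weakly_supervised_iteration(agent, critic_llm, env, tasks, fine_tune):
    """Generate trajectories, keep the critic-approved ones, fine-tune on them."""
    kept = []
    for task in tasks:
        trajectory = rollout(agent, env, task)
        verdict = critic_llm(
            f"Task: {task}\nTrajectory: {trajectory}\n"
            "Did the agent complete the task? Answer yes or no."
        )
        if verdict.strip().lower().startswith("yes"):  # weak supervision signal
            kept.append((task, trajectory))
    return fine_tune(agent, kept)
```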
- Training Language Models to Critique With Multi-agent Feedback [102.42751835338233]
The MultiCritique pipeline improves the critique ability of LLMs by utilizing multi-agent feedback.
It aggregates high-quality critiques from multiple agents instead of relying on a single model.
Our fine-tuned 7B model significantly surpasses other advanced 7B-13B open-source models.
arXiv Detail & Related papers (2024-10-20T04:57:45Z)
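The aggregation step can be sketched as querying several critics and merging their outputs. The LLM-based merge below is one plausible reading of the summary, not the pipeline's actual mechanism.

```python
# Sketch of multi-agent critique aggregation. The LLM-based merge is one
# plausible reading of the summary, not the pipeline's actual mechanism.

def gather_critiques(critic_llms, task, response):
    """Collect one critique per critic model for the same response."""
    prompt = f"Task: {task}\nResponse: {response}\nCritique this response."
    return [critic(prompt) for critic in critic_llms]

def aggregate_critiques(aggregator_llm, critiques):
    """Merge several critiques into a single higher-quality critique."""
    joined = "\n---\n".join(critiques)
    return aggregator_llm(
        "Merge the following critiques into one consistent critique, "
        f"keeping only well-supported points:\n{joined}"
    )
```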
- Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic [48.94340387130627]
Critic-CoT is a framework that pushes LLMs toward System-2-like critic capability.
It relies on a step-wise CoT reasoning paradigm and the automatic construction of distant-supervision data, without human annotation.
Experiments on GSM8K and MATH demonstrate that our enhanced model significantly boosts task-solving performance.
arXiv Detail & Related papers (2024-08-29T08:02:09Z)
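A step-wise critique, as the summary suggests, can be sketched by checking each reasoning step in turn. The prompt wording and verdict parsing are illustrative assumptions.

```python
# Sketch of a step-wise CoT critique: check each reasoning step in turn.
# Prompt wording and verdict parsing are illustrative assumptions.

def stepwise_critique(llm, problem, steps):
    """Return the index of the first incorrect step, or None if all pass."""
    for i in range(len(steps)):
        context = "\n".join(steps[: i + 1])
        verdict = llm(
            f"Problem: {problem}\n"
            f"Reasoning so far:\n{context}\n"
            f"Is step {i + 1} correct? Answer 'correct' or 'incorrect'."
        )
        if verdict.strip().lower().startswith("incorrect"):
            return i
    return None
```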
- StreamBench: Towards Benchmarking Continuous Improvement of Language Agents [63.54557575233165]
Large language model (LLM) agents are able to improve themselves from experience, which is an important ability for continuous enhancement post-deployment.
We introduce StreamBench, a benchmark designed to evaluate the continuous improvement of LLM agents over an input-feedback sequence.
Our work serves as a stepping stone towards developing effective online learning strategies for LLMs, paving the way for more adaptive AI systems in streaming scenarios.
arXiv Detail & Related papers (2024-06-13T02:08:28Z)
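An evaluation over an input-feedback sequence might look like the sketch below, where the agent answers, receives feedback, and updates after every instance. The `agent.act`/`agent.update` interface is a hypothetical stand-in for whatever online-learning strategy is plugged in.

```python
# Sketch of evaluation over an input-feedback sequence. The agent.act /
# agent.update interface is a hypothetical stand-in for an online learner.

def streaming_eval(agent, stream):
    """Run an agent over (input, feedback_fn) pairs, updating after each one."""
    correct = total = 0
    for x, feedback_fn in stream:
        y = agent.act(x)          # answer the current instance
        ok = feedback_fn(y)       # environment feedback, e.g. True/False
        agent.update(x, y, ok)    # the agent improves from the feedback
        correct += int(ok)
        total += 1
    return correct / max(total, 1)  # accuracy accumulated over the stream
```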
- CriticBench: Benchmarking LLMs for Critique-Correct Reasoning [26.45110574463893]
CriticBench is a benchmark designed to assess Large Language Models' abilities to critique and rectify their reasoning.
We evaluate and dissect the performance of 17 LLMs in generation, critique, and correction reasoning.
arXiv Detail & Related papers (2024-02-22T18:59:02Z)
- Enhancing Robotic Manipulation with AI Feedback from Multimodal Large Language Models [41.38520841504846]
Large language models (LLMs) can provide automated preference feedback solely from image inputs to guide decision-making.
In this study, we train a multimodal LLM, termed CriticGPT, capable of understanding trajectory videos in robot manipulation tasks.
Experimental evaluation of the algorithm's preference accuracy demonstrates its effective generalization ability to new tasks.
Performance on Meta-World tasks reveals that CriticGPT's reward model efficiently guides policy learning, surpassing rewards based on state-of-the-art pre-trained representation models.
arXiv Detail & Related papers (2024-02-22T03:14:03Z)
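A preference query of the kind described might be sketched as follows; the prompt format and A/B parsing are assumptions, not CriticGPT's actual interface.

```python
# Sketch of querying a multimodal critic for pairwise trajectory preferences,
# which can then train a reward model. Prompt and parsing are assumptions.

def preference_label(critic_llm, clip_a, clip_b, task):
    """Return 1 if the critic prefers trajectory A, else 0."""
    answer = critic_llm(
        f"Task: {task}\n"
        f"Trajectory A: {clip_a}\nTrajectory B: {clip_b}\n"
        "Which trajectory better accomplishes the task? Answer A or B."
    )
    return 1 if answer.strip().upper().startswith("A") else 0

def label_pairs(critic_llm, pairs, task):
    """Build (pair, preference) data for reward-model training."""
    return [((a, b), preference_label(critic_llm, a, b, task)) for a, b in pairs]
```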
- Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies [104.32199881187607]
Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks, yet their outputs can still contain errors.
A promising approach to rectify these flaws is self-correction, where the LLM itself is prompted or guided to fix problems in its own output.
This paper presents a comprehensive review of this emerging class of techniques.
arXiv Detail & Related papers (2023-08-06T18:38:52Z)
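A basic prompted self-correction loop, one of the strategy families such a survey covers, might look like this sketch (prompt wording is illustrative).

```python
# Sketch of basic prompted self-correction, one family of strategies such
# a survey covers. Prompt wording is illustrative.

def self_correct(llm, task, max_rounds=3):
    """Draft an answer, then let the model review and revise its own output."""
    answer = llm(f"Task: {task}\nGive your best answer.")
    for _ in range(max_rounds):
        review = llm(
            f"Task: {task}\nYour answer: {answer}\n"
            "List any problems, or reply 'OK' if the answer is correct."
        )
        if review.strip().upper().startswith("OK"):
            break  # the model judges its own answer acceptable
        answer = llm(
            f"Task: {task}\nPrevious answer: {answer}\n"
            f"Problems found: {review}\nProvide a corrected answer."
        )
    return answer
```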