Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision
- URL: http://arxiv.org/abs/2411.16579v1
- Date: Mon, 25 Nov 2024 17:11:54 GMT
- Title: Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision
- Authors: Zhiheng Xi, Dingwen Yang, Jixuan Huang, Jiafu Tang, Guanyu Li, Yiwen Ding, Wei He, Boyang Hong, Shihan Do, Wenyu Zhan, Xiao Wang, Rui Zheng, Tao Ji, Xiaowei Shi, Yitao Zhai, Rongxiang Weng, Jingang Wang, Xunliang Cai, Tao Gui, Zuxuan Wu, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Yu-Gang Jiang
- Abstract summary: We propose a two-player paradigm that separates the roles of reasoning and critique models.
We first propose AutoMathCritique, an automated and scalable framework for collecting critique data.
We demonstrate that the critique models consistently improve the actor's performance on difficult queries at test-time.
- Score: 120.40788744292739
- Abstract: Training large language models (LLMs) to spend more time thinking and reflecting before responding is crucial for effectively solving complex reasoning tasks in fields such as science, coding, and mathematics. However, the effectiveness of mechanisms like self-reflection and self-correction depends on the model's capacity to accurately assess its own performance, which can be limited by factors such as initial accuracy, question difficulty, and the lack of external feedback. In this paper, we delve into a two-player paradigm that separates the roles of reasoning and critique models, where the critique model provides step-level feedback to supervise the reasoning (actor) model at both test time and training time. We first propose AutoMathCritique, an automated and scalable framework for collecting critique data, resulting in a dataset of 76,321 responses paired with step-level feedback. Fine-tuning language models on this dataset enables them to generate natural language feedback for mathematical reasoning. We demonstrate that the critique models consistently improve the actor's performance on difficult queries at test time, especially when scaling up inference-time computation. Motivated by these findings, we introduce critique-based supervision into the actor's self-training process and propose a critique-in-the-loop self-improvement method. Experiments show that the method improves the actor's exploration efficiency and solution diversity, especially on challenging queries, leading to a stronger reasoning model. Lastly, we take a preliminary step toward training self-talk reasoning models via critique supervision and showcase their potential. Our code and datasets are available at https://mathcritique.github.io/.
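To make the two-player test-time protocol above concrete, here is a minimal sketch of an actor-critic refinement loop in the spirit of the abstract: the actor drafts a step-by-step solution, the critique model returns step-level feedback, and the actor revises until the critic accepts every step. The function bodies are toy stand-ins, and the interface is an assumption for illustration, not the paper's actual API.

```python
# Minimal sketch (not the paper's code): a critique model supervises an
# actor at test time via step-level feedback. The toy stubs stand in for
# calls to the reasoning (actor) LLM and the fine-tuned critique LLM.
from dataclasses import dataclass

@dataclass
class StepFeedback:
    step_index: int   # which reasoning step the critique targets
    is_correct: bool  # critic's verdict for that step
    comment: str      # natural-language critique

def actor_solve(question: str, feedback: list | None = None) -> list[str]:
    # Toy stand-in for the actor LLM; real code would condition on feedback.
    if feedback:
        return ["step 1: 17 * 24 = 408", "step 2: 408 + 3 = 411"]
    return ["step 1: 17 * 24 = 418", "step 2: 418 + 3 = 421"]

def critic_review(question: str, steps: list[str]) -> list[StepFeedback]:
    # Toy stand-in for the critique LLM: flags the arithmetic slip in step 1.
    ok = "408" in steps[0]
    return [StepFeedback(0, ok, "ok" if ok else "17 * 24 should be 408"),
            StepFeedback(1, ok, "depends on step 1")]

def critique_refine(question: str, max_rounds: int = 3) -> list[str]:
    steps = actor_solve(question)
    for _ in range(max_rounds):
        feedback = critic_review(question, steps)
        if all(f.is_correct for f in feedback):
            break  # critic accepts every step; stop refining
        steps = actor_solve(question, feedback=feedback)  # revise with critiques
    return steps

print(critique_refine("What is 17 * 24 + 3?"))  # converges to the corrected steps
```

In the real system, revision would be driven by the critic's natural language comments rather than a boolean flag; the loop structure is the point here.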
Related papers
- SRA-MCTS: Self-driven Reasoning Augmentation with Monte Carlo Tree Search for Code Generation [14.786100203787194]
Large language models demonstrate exceptional performance in simple code generation tasks but face challenges in tackling complex problems.
We propose a reasoning-augmented data generation process, SRA-MCTS, which guides the model to autonomously generate high-quality intermediate reasoning paths.
Our method operates entirely through the model itself without requiring additional supervision.
arXiv Detail & Related papers (2024-11-17T12:31:04Z)
- CriticAL: Critic Automation with Language Models [31.1575961776287]
CriticAL generates summary statistics that capture discrepancies between model predictions and data.
CriticAL reliably generates correct critiques without hallucinating incorrect ones; a toy sketch of the discrepancy idea follows this entry.
arXiv Detail & Related papers (2024-11-10T20:41:35Z)
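As a loose illustration of the summary-statistic idea (the statistic, data, and threshold below are assumptions, not CriticAL's implementation), this sketch scores how extreme an observed statistic is relative to datasets simulated from the model, flagging a discrepancy when it falls in the tail.

```python
# Toy discrepancy check in the spirit of the CriticAL summary: compare a
# summary statistic of the observed data against its distribution under
# model-simulated data. Statistic and threshold are assumptions.
import numpy as np

def tail_probability(observed, simulated, stat=np.mean) -> float:
    """Fraction of simulated datasets whose statistic exceeds the observed one."""
    sim_stats = np.array([stat(s) for s in simulated])
    return float(np.mean(sim_stats >= stat(observed)))

rng = np.random.default_rng(0)
observed = rng.normal(loc=1.0, size=200)          # data actually centered at 1
simulated = rng.normal(loc=0.0, size=(500, 200))  # model wrongly assumes mean 0
p = tail_probability(observed, simulated)
print(p, "discrepancy!" if min(p, 1 - p) < 0.05 else "fits")  # near 0.0 -> flagged
```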
- Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging Tasks [68.49251303172674]
State-of-the-art large language models (LLMs) exhibit impressive problem-solving capabilities but may struggle with complex reasoning and factual correctness.
Existing methods harness the strengths of chain-of-thought and retrieval-augmented generation (RAG) to decompose a complex problem into simpler steps and apply retrieval to improve factual correctness.
We introduce Critic-guided planning with Retrieval-augmentation, CR-Planner, a novel framework that leverages fine-tuned critic models to guide both reasoning and retrieval processes through planning; a minimal sketch follows this entry.
arXiv Detail & Related papers (2024-10-02T11:26:02Z)
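The following is a minimal sketch of critic-guided action selection in the spirit of the CR-Planner summary; the action set, state encoding, and scoring function are illustrative assumptions rather than the framework's actual interface.

```python
# Toy sketch of critic-guided planning: at each step, candidate actions
# (e.g., reason toward a sub-goal vs. issue a retrieval query) are scored
# by a critic and the best-scoring action is executed. The critic here is
# a stand-in for a fine-tuned critic model.
from typing import Callable

def toy_critic(state: str, action: str) -> float:
    # Stand-in scorer: prefers retrieval while the state lacks evidence.
    needs_facts = "evidence:" not in state
    return 1.0 if needs_facts == action.startswith("RETRIEVE") else 0.0

def plan_step(state: str, actions: list[str],
              critic: Callable[[str, str], float] = toy_critic) -> str:
    return max(actions, key=lambda a: critic(state, a))  # pick highest-rated action

state = "Question: Who proved the theorem? (no evidence gathered yet)"
actions = ["REASON: decompose the question", "RETRIEVE: search for supporting facts"]
print(plan_step(state, actions))  # -> the retrieval action, per the toy critic
```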
- Recursive Introspection: Teaching Language Model Agents How to Self-Improve [30.086494067593268]
We develop RISE: Recursive IntroSpEction, an approach for fine-tuning large language models.
Our experiments show that RISE enables Llama2, Llama3, and Mistral models to improve themselves over successive turns on math reasoning tasks.
arXiv Detail & Related papers (2024-07-25T17:35:59Z)
- Small Language Models Need Strong Verifiers to Self-Correct Reasoning [69.94251699982388]
Self-correction has emerged as a promising solution for boosting the reasoning performance of large language models (LLMs).
This work explores whether small (≤ 13B) language models (LMs) can self-correct on reasoning tasks with minimal input from stronger LMs.
arXiv Detail & Related papers (2024-04-26T03:41:28Z)
- Improving Language Model Reasoning with Self-motivated Learning [60.779625789039486]
The Self-motivated Learning framework motivates the model itself to automatically generate rationales on existing datasets.
We train a reward model on the resulting rationale rankings to evaluate their quality, and improve reasoning performance through reinforcement learning; a toy sketch of the rank-to-reward step follows this entry.
arXiv Detail & Related papers (2024-04-10T14:05:44Z)
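Here is a toy sketch of that rank-to-reward step, assuming (as an illustration only) a linear reward model trained on pairwise rank comparisons with a Bradley-Terry-style objective; the features, ranking source, and optimizer are not from the paper.

```python
# Toy sketch: fit a reward model to rationale rankings via pairwise
# comparisons, so higher-ranked rationales receive higher rewards. The
# linear model and synthetic features are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
ranks = np.array([0, 1, 2, 0, 1, 2])  # lower rank = better rationale
feats = rng.normal(size=(6, 4))
feats[:, 0] -= ranks                  # toy signal correlated with quality

w = np.zeros(4)                       # linear reward model
for _ in range(200):
    for i in range(6):
        for j in range(6):
            if ranks[i] < ranks[j]:   # i ranked above j
                margin = feats[i] @ w - feats[j] @ w
                sig = 1 / (1 + np.exp(-margin))
                # Gradient step on -log sigmoid(margin): widen the margin.
                w += 0.1 * (1 - sig) * (feats[i] - feats[j])

print(feats[0] @ w > feats[2] @ w)  # expect True: top-ranked rationale scores higher
```

An RL step (e.g., policy optimization against these reward scores) would follow; it is omitted here.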
- Critique Ability of Large Language Models [38.34144195927209]
This study explores the ability of large language models (LLMs) to deliver accurate critiques across various tasks.
We develop a benchmark called CriticBench, which comprises 3K high-quality natural language queries and corresponding model responses.
arXiv Detail & Related papers (2023-10-07T14:12:15Z)
- CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing [139.77117915309023]
CRITIC allows large language models to validate and amend their own outputs in a manner similar to human interaction with tools.
Comprehensive evaluations involving free-form question answering, mathematical program synthesis, and toxicity reduction demonstrate that CRITIC consistently enhances the performance of LLMs; a minimal sketch of tool-based verification follows this entry.
arXiv Detail & Related papers (2023-05-19T15:19:44Z)
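As a rough illustration of tool-interactive critiquing (using a tiny safe calculator as the external tool; the interface and the verify-then-amend rule are assumptions, not the CRITIC implementation), this sketch checks a model's arithmetic answer against the tool and amends it on mismatch.

```python
# Toy sketch of tool-interactive critiquing: validate a model's answer
# with an external tool (a tiny safe calculator here) and amend it when
# the tool disagrees. Not the CRITIC framework's actual interface.
import ast
import operator as op

_OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv}

def calc(expr: str) -> float:
    """Safe calculator tool: evaluates +, -, *, / over numeric literals."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return float(ev(ast.parse(expr, mode="eval")))

def verify_and_amend(expr: str, model_answer: float) -> tuple[bool, float]:
    tool_answer = calc(expr)  # ground the critique in the tool's output
    ok = abs(tool_answer - model_answer) < 1e-9
    return ok, model_answer if ok else tool_answer

print(verify_and_amend("17 * 24 + 3", 401.0))  # (False, 411.0): answer amended
```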
- Self-critiquing models for assisting human evaluators [11.1006983438712]
We fine-tune large language models to write natural language critiques using behavioral cloning.
On a topic-based summarization task, critiques written by our models help humans find flaws in summaries that they would have otherwise missed.
Larger models write more helpful critiques and, on most tasks, are better at self-critiquing, despite producing outputs that are harder to critique.
arXiv Detail & Related papers (2022-06-12T17:40:53Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that, for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase than the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.