Confidence v.s. Critique: A Decomposition of Self-Correction Capability for LLMs
- URL: http://arxiv.org/abs/2412.19513v1
- Date: Fri, 27 Dec 2024 08:09:11 GMT
- Title: Confidence v.s. Critique: A Decomposition of Self-Correction Capability for LLMs
- Authors: Zhe Yang, Yichang Zhang, Yudong Wang, Ziyao Xu, Junyang Lin, Zhifang Sui
- Abstract summary: Large Language Models (LLMs) can correct their self-generated responses, but a decline in accuracy after self-correction is also witnessed.
We decompose the self-correction capability into confidence (remaining confident in correct answers) and critique (turning wrong answers into correct ones) capabilities.
Our strategy outperforms vanilla SFT in both capabilities and achieves much higher accuracy after self-correction.
- Score: 34.203575667558454
- License:
- Abstract: Large Language Models (LLMs) can correct their self-generated responses, but a decline in accuracy after self-correction is also witnessed. To gain a deeper understanding of self-correction, we endeavor to decompose, evaluate, and analyze the self-correction behaviors of LLMs. By enumerating and analyzing answer correctness before and after self-correction, we decompose the self-correction capability into confidence (remaining confident in correct answers) and critique (turning wrong answers into correct ones) capabilities, and propose two metrics from a probabilistic perspective to measure these two capabilities, along with another metric for overall self-correction capability evaluation. Based on our decomposition and evaluation metrics, we conduct extensive experiments and draw some empirical conclusions. For example, we find that different models can exhibit distinct behaviors: some models are confident while others are more critical. We also find a trade-off between the two capabilities (i.e., improving one can lead to a decline in the other) when manipulating model self-correction behavior via prompts or in-context learning. Further, we find a simple yet efficient strategy to improve self-correction capability by transforming the Supervised Fine-Tuning (SFT) data format; our strategy outperforms vanilla SFT in both capabilities and achieves much higher accuracy after self-correction. Our code will be publicly available on GitHub.
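As a rough illustration of the decomposition described above, the sketch below estimates the two capabilities from answer-correctness labels recorded before and after a self-correction pass. The conditional-probability formulation and the helper name `self_correction_metrics` are assumptions made for illustration; the abstract does not give the paper's exact metric definitions.

```python
from typing import Sequence

def self_correction_metrics(before: Sequence[bool], after: Sequence[bool]) -> dict:
    """Estimate confidence / critique / overall self-correction scores.

    `before[i]` / `after[i]` indicate whether the i-th answer was correct
    before / after the model's self-correction pass.
    """
    paired = list(zip(before, after))
    was_correct = [a for b, a in paired if b]
    was_wrong = [a for b, a in paired if not b]

    # Confidence: P(still correct after self-correction | initially correct).
    confidence = sum(was_correct) / len(was_correct) if was_correct else float("nan")
    # Critique: P(correct after self-correction | initially wrong).
    critique = sum(was_wrong) / len(was_wrong) if was_wrong else float("nan")
    # Overall: plain accuracy after self-correction.
    overall = sum(a for _, a in paired) / len(paired)

    return {"confidence": confidence, "critique": critique, "accuracy_after": overall}

# Toy example: 3 answers stay correct, 1 correct answer is overturned,
# 1 wrong answer is fixed, 1 wrong answer stays wrong.
print(self_correction_metrics(
    before=[True, True, True, True, False, False],
    after=[True, True, True, False, True, False],
))
# -> confidence 0.75, critique 0.5, accuracy_after ~0.67
```

In this framing, a "confident" model scores high on the first metric (it rarely overturns answers that were already correct), while a "critical" model scores high on the second (it often repairs answers that were wrong), matching the distinct behaviors the abstract reports.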
Related papers
- S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning [51.84977135926156]
We introduce S$^2$R, an efficient framework that enhances LLM reasoning by teaching models to self-verify and self-correct during inference.
Our results demonstrate that Qwen2.5-math-7B achieves an accuracy improvement from 51.0% to 81.6%, outperforming models trained on an equivalent amount of long-CoT distilled data.
arXiv Detail & Related papers (2025-02-18T13:40:22Z)
- Is Moral Self-correction An Innate Capability of Large Language Models? A Mechanistic Analysis to Self-correction [5.271054803267951]
We aim to answer two fundamental questions for moral self-correction.
We examine how different self-correction components interact to intervene on the morality embedded within hidden states.
We propose a validation framework, self-distinguish, that requires effective self-correction.
arXiv Detail & Related papers (2024-10-27T16:52:21Z)
- Training Language Models to Self-Correct via Reinforcement Learning [98.35197671595343]
Self-correction has been found to be largely ineffective in modern large language models (LLMs).
We develop a multi-turn online reinforcement learning approach, SCoRe, that significantly improves an LLM's self-correction ability using entirely self-generated data.
We find that SCoRe achieves state-of-the-art self-correction performance, improving the base models' self-correction by 15.6% and 9.1% respectively on MATH and HumanEval.
arXiv Detail & Related papers (2024-09-19T17:16:21Z)
- Large Language Models have Intrinsic Self-Correction Ability [18.79203446847577]
Large language models (LLMs) have attracted significant attention for their exceptional abilities in various natural language processing tasks.
One promising solution to improve the LLMs' performance is to ask LLMs to revise their answer after generation.
Intrinsic self-correction is considered a promising direction because it does not utilize external knowledge.
arXiv Detail & Related papers (2024-06-21T22:29:40Z)
- On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept [36.27550578296276]
Large Language Models (LLMs) are able to improve their responses when instructed to do so, a capability known as self-correction.
Intrinsic self-correction is evident in various applications, but how and why it is effective remains unknown.
We show that intrinsic self-correction can be progressively improved, allowing it to approach a converged state.
arXiv Detail & Related papers (2024-06-04T14:55:43Z)
- A Theoretical Understanding of Self-Correction through In-context Alignment [51.622068973630796]
Large language models (LLMs) are capable of improving their abilities purely by self-correction.
We show that when LLMs give relatively accurate self-examinations as rewards, they are capable of refining responses in an in-context way.
Inspired by these findings, we also illustrate applications of self-correction, such as defending against LLM jailbreaks.
arXiv Detail & Related papers (2024-05-28T22:33:02Z)
- Small Language Models Need Strong Verifiers to Self-Correct Reasoning [69.94251699982388]
Self-correction has emerged as a promising solution to boost the reasoning performance of large language models (LLMs).
This work explores whether small (≤ 13B) language models (LMs) have the ability to self-correct on reasoning tasks with minimal input from stronger LMs.
arXiv Detail & Related papers (2024-04-26T03:41:28Z)
- Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of Large Language Models [23.42725642076256]
Large Language Models (LLMs) have catalyzed an increasing interest in their self-correction capabilities.
This paper presents a comprehensive investigation into the intrinsic self-correction of LLMs.
We develop an "If-or-Else" (IoE) prompting framework, designed to guide LLMs in assessing their own "confidence"
arXiv Detail & Related papers (2024-02-19T21:38:02Z)
- Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation [71.91287418249688]
Large language models (LLMs) often struggle with factual inaccuracies, even when they hold relevant knowledge.
We leverage the self-evaluation capability of an LLM to provide training signals that steer the model towards factuality.
We show that the proposed self-alignment approach substantially enhances factual accuracy over Llama family models across three key knowledge-intensive tasks.
arXiv Detail & Related papers (2024-02-14T15:52:42Z)
- How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering [80.82194311274694]
We examine the question "how can we know when language models know, with confidence, the answer to a particular query?"
We examine three strong generative models -- T5, BART, and GPT-2 -- and study whether their probabilities on QA tasks are well calibrated.
We then examine methods to calibrate such models to make their confidence scores correlate better with the likelihood of correctness.
arXiv Detail & Related papers (2020-12-02T03:53:13Z)
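As a loose illustration of the calibration question raised in the entry above, the snippet below computes a simple expected calibration error (ECE) from per-answer confidence scores and correctness labels; a well-calibrated model's average confidence in each bin tracks its accuracy there. The equal-width binning and the helper `expected_calibration_error` are illustrative assumptions, not the paper's evaluation code.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Simple ECE: average |accuracy - mean confidence| over equal-width bins,
    weighted by the fraction of predictions falling in each bin."""
    assert len(confidences) == len(correct)
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Assign each prediction to the bin (lo, hi]; the first bin also takes 0.0.
        in_bin = [i for i, c in enumerate(confidences)
                  if (c > lo or b == 0) and c <= hi]
        if not in_bin:
            continue
        bin_conf = sum(confidences[i] for i in in_bin) / len(in_bin)
        bin_acc = sum(1 for i in in_bin if correct[i]) / len(in_bin)
        ece += (len(in_bin) / n) * abs(bin_acc - bin_conf)
    return ece

# Toy example: a model that is over-confident on questions it gets wrong.
print(expected_calibration_error(
    confidences=[0.95, 0.9, 0.8, 0.85, 0.6],
    correct=[True, True, False, False, True],
))
```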