SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales
- URL: http://arxiv.org/abs/2405.20974v3
- Date: Fri, 04 Oct 2024 17:23:48 GMT
- Title: SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales
- Authors: Tianyang Xu, Shujin Wu, Shizhe Diao, Xiaoze Liu, Xingyao Wang, Yangyi Chen, Jing Gao
- Abstract summary: SaySelf is a training framework that teaches large language models to express more accurate fine-grained confidence estimates.
In addition, SaySelf directs LLMs to produce self-reflective rationales that clearly identify gaps in their parametric knowledge.
We show that the generated self-reflective rationales are reasonable and can further contribute to calibration.
- Abstract: Large language models (LLMs) often generate inaccurate or fabricated information and generally fail to indicate their confidence, which limits their broader applications. Previous work elicits confidence from LLMs by direct or self-consistency prompting, or by constructing specific datasets for supervised fine-tuning. The prompting-based approaches have inferior performance, and the training-based approaches are limited to binary or inaccurate group-level confidence estimates. In this work, we present SaySelf, a training framework that teaches LLMs to express more accurate fine-grained confidence estimates. Beyond the confidence scores, SaySelf directs LLMs to produce self-reflective rationales that clearly identify gaps in their parametric knowledge and explain their uncertainty. This is achieved by using an LLM to automatically summarize the uncertainties in specific knowledge via natural language. The summarization is based on an analysis of the inconsistency across multiple sampled reasoning chains, and the resulting data is used for supervised fine-tuning. In addition, we use reinforcement learning with a meticulously crafted reward function to calibrate the confidence estimates, motivating LLMs to deliver accurate, high-confidence predictions and penalizing overconfidence in erroneous outputs. Experimental results on both in-distribution and out-of-distribution datasets demonstrate the effectiveness of SaySelf in reducing confidence calibration error while maintaining task performance. We show that the generated self-reflective rationales are reasonable and can further contribute to calibration. The code is publicly available at https://github.com/xu1868/SaySelf.
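To make the pipeline described in the abstract concrete, below is a minimal, hypothetical Python sketch (not the authors' released implementation; the function names and the exact reward shape are assumptions). It illustrates two ideas from the abstract: deriving a confidence signal from the agreement among multiple sampled reasoning chains, and a calibration-style reward that favors accurate high-confidence answers while penalizing confident errors.

```python
from collections import Counter
from typing import List

# Illustrative sketch only; see the linked repository for the actual SaySelf code.

def agreement_confidence(sampled_answers: List[str]) -> float:
    """Fraction of sampled reasoning chains that agree with the majority answer."""
    counts = Counter(a.strip().lower() for a in sampled_answers)
    majority_count = counts.most_common(1)[0][1]
    return majority_count / len(sampled_answers)

def calibration_reward(is_correct: bool, confidence: float) -> float:
    """Hypothetical confidence-weighted reward.

    Correct answers earn more reward as the stated confidence rises;
    incorrect answers are penalized in proportion to the stated confidence,
    discouraging overconfident errors. The paper's exact reward may differ.
    """
    return confidence if is_correct else -confidence

# Example with made-up samples: 4 of 5 chains agree, so confidence is 0.8.
samples = ["Paris", "Paris", "Lyon", "Paris", "Paris"]
conf = agreement_confidence(samples)
print(calibration_reward(True, conf))   # 0.8
print(calibration_reward(False, conf))  # -0.8
```

Under this shaping, reward grows with stated confidence only when the answer is correct and shrinks with it otherwise, so a policy maximizing it is pushed to report confidence that tracks its actual accuracy; the reward used in the paper may differ in form and scaling.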
Related papers
- Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception [58.62352010928591]
Large language models (LLMs) exhibit impressive performance across diverse tasks but often struggle to accurately gauge their knowledge boundaries.
This paper explores leveraging LLMs' internal states to enhance their perception of knowledge boundaries from efficiency and risk perspectives.
arXiv Detail & Related papers (2025-02-17T11:11:09Z)
- Aligning Large Language Models for Faithful Integrity Against Opposing Argument [71.33552795870544]
Large Language Models (LLMs) have demonstrated impressive capabilities in complex reasoning tasks.
They can be easily misled by unfaithful arguments during conversations, even when their original statements are correct.
We propose a novel framework, named Alignment for Faithful Integrity with Confidence Estimation.
arXiv Detail & Related papers (2025-01-02T16:38:21Z)
- On Verbalized Confidence Scores for LLMs [25.160810008907397]
Uncertainty quantification for large language models (LLMs) can establish more human trust in their responses.
This work focuses on asking the LLM itself to verbalize its uncertainty with a confidence score as part of its output tokens.
We assess the reliability of verbalized confidence scores with respect to different datasets, models, and prompt methods.
arXiv Detail & Related papers (2024-12-19T11:10:36Z)
- Fact-Level Confidence Calibration and Self-Correction [64.40105513819272]
We propose a Fact-Level framework that calibrates confidence to relevance-weighted correctness at the fact level.
We also develop Confidence-Guided Fact-level Self-Correction (ConFix), which uses high-confidence facts within a response as additional knowledge to improve low-confidence ones.
arXiv Detail & Related papers (2024-11-20T14:15:18Z)
- Learning to Route LLMs with Confidence Tokens [43.63392143501436]
We study the extent to which large language models can reliably indicate confidence in their answers.
We propose Self-REF, a lightweight training strategy to teach LLMs to express confidence in a reliable manner.
Compared to conventional approaches such as verbalizing confidence and examining token probabilities, we demonstrate empirically that confidence tokens show significant improvements in downstream routing and rejection learning tasks.
arXiv Detail & Related papers (2024-10-17T07:28:18Z)
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- Enhancing Confidence Expression in Large Language Models Through Learning from Past Experience [41.06726400259579]
Large Language Models (LLMs) have exhibited remarkable performance across various downstream tasks.
We propose a method of Learning from Past experience (LePe) to enhance the capability for confidence expression.
arXiv Detail & Related papers (2024-04-16T06:47:49Z)
- Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of Large Language Models [23.42725642076256]
Large Language Models (LLMs) have catalyzed an increasing interest in their self-correction capabilities.
This paper presents a comprehensive investigation into the intrinsic self-correction of LLMs.
We develop an "If-or-Else" (IoE) prompting framework, designed to guide LLMs in assessing their own "confidence"
arXiv Detail & Related papers (2024-02-19T21:38:02Z)
- Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation [71.91287418249688]
Large language models (LLMs) often struggle with factual inaccuracies, even when they hold relevant knowledge.
We leverage the self-evaluation capability of an LLM to provide training signals that steer the model towards factuality.
We show that the proposed self-alignment approach substantially enhances factual accuracy over Llama family models across three key knowledge-intensive tasks.
arXiv Detail & Related papers (2024-02-14T15:52:42Z)
- What Large Language Models Know and What People Think They Know [13.939511057660013]
Large language models (LLMs) are increasingly integrated into decision-making processes.
To earn human trust, LLMs must be well calibrated so that they can accurately assess and communicate the likelihood of their predictions being correct.
Here we explore the calibration gap, which refers to the difference between human confidence in LLM-generated answers and the models' actual confidence, and the discrimination gap, which reflects how well humans and models can distinguish between correct and incorrect answers.
arXiv Detail & Related papers (2024-01-24T22:21:04Z)
- Assessing the Reliability of Large Language Model Knowledge [78.38870272050106]
Large language models (LLMs) have been treated as knowledge bases due to their strong performance in knowledge probing tasks.
How do we evaluate the capabilities of LLMs to consistently produce factually correct answers?
We propose MOdel kNowledge relIabiliTy scORe (MONITOR), a novel metric designed to directly measure LLMs' factual reliability.
arXiv Detail & Related papers (2023-10-15T12:40:30Z)