Related papers: Learning to Route with Confidence Tokens

Learning to Route with Confidence Tokens

URL: http://arxiv.org/abs/2410.13284v1
Date: Thu, 17 Oct 2024 07:28:18 GMT
Title: Learning to Route with Confidence Tokens
Authors: Yu-Neng Chuang, Helen Zhou, Prathusha Kameswara Sarma, Parikshit Gopalan, John Boccio, Sara Bolouki, Xia Hu,
Abstract summary: We study the extent to which large language models can reliably indicate confidence in their answers. We propose Self-REF, a lightweight training strategy to teach LLMs to express confidence in a reliable manner. Compared to conventional approaches such as verbalizing confidence and examining token probabilities, we demonstrate empirically that confidence tokens show significant improvements in downstream routing and rejection learning tasks.
Score: 43.63392143501436
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have demonstrated impressive performance on several tasks and are increasingly deployed in real-world applications. However, especially in high-stakes settings, it becomes vital to know when the output of an LLM may be unreliable. Depending on whether an answer is trustworthy, a system can then choose to route the question to another expert, or otherwise fall back on a safe default behavior. In this work, we study the extent to which LLMs can reliably indicate confidence in their answers, and how this notion of confidence can translate into downstream accuracy gains. We propose Self-REF, a lightweight training strategy to teach LLMs to express confidence in whether their answers are correct in a reliable manner. Self-REF introduces confidence tokens into the LLM, from which a confidence score can be extracted. Compared to conventional approaches such as verbalizing confidence and examining token probabilities, we demonstrate empirically that confidence tokens show significant improvements in downstream routing and rejection learning tasks.

Related papers

Influential Training Data Retrieval for Explaining Verbalized Confidence of LLMs [2.626100048563503]
Large language models (LLMs) can increase users' perceived trust by verbalizing confidence in their outputs.<n>We introduce TracVC, a method that builds on information retrieval and influence estimation to trace generated confidence expressions back to the training data.<n>Our analysis reveals that OLMo2-13B is frequently influenced by confidence-related data that is lexically unrelated to the query.
arXiv Detail & Related papers (2026-01-15T18:05:42Z)
ConfTuner: Training Large Language Models to Express Their Confidence Verbally [58.63318088243125]
Large Language Models (LLMs) are increasingly deployed in high-stakes domains such as science, law, and healthcare.<n>LLMs are often observed to generate incorrect answers with high confidence, a phenomenon known as "overconfidence"
arXiv Detail & Related papers (2025-08-26T09:25:32Z)
Aligning Large Language Models for Faithful Integrity Against Opposing Argument [71.33552795870544]
Large Language Models (LLMs) have demonstrated impressive capabilities in complex reasoning tasks. They can be easily misled by unfaithful arguments during conversations, even when their original statements are correct. We propose a novel framework, named Alignment for Faithful Integrity with Confidence Estimation.
arXiv Detail & Related papers (2025-01-02T16:38:21Z)
On Verbalized Confidence Scores for LLMs [25.160810008907397]
Uncertainty quantification for large language models (LLMs) can establish more human trust into their responses. This work focuses on asking the LLM itself to verbalize its uncertainty with a confidence score as part of its output tokens. We assess the reliability of verbalized confidence scores with respect to different datasets, models, and prompt methods.
arXiv Detail & Related papers (2024-12-19T11:10:36Z)
Can You Trust LLM Judgments? Reliability of LLM-as-a-Judge [0.3759936323189418]
Large Language Models (LLMs) have become increasingly powerful and ubiquitous, but their nature poses challenges to the reliability of their outputs. We introduce a novel framework for rigorously evaluating the reliability of LLM judgments, leveraging McDonald's omega.
arXiv Detail & Related papers (2024-12-17T03:37:31Z)
SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales [29.33581578047835]
SaySelf is a training framework that teaches large language models to express more accurate fine-grained confidence estimates. In addition, SaySelf directs LLMs to produce self-reflective rationales that clearly identify gaps in their parametric knowledge. We show that the generated self-reflective rationales are reasonable and can further contribute to the calibration.
arXiv Detail & Related papers (2024-05-31T16:21:16Z)
When to Trust LLMs: Aligning Confidence with Response Quality [49.371218210305656]
We propose CONfidence-Quality-ORDer-preserving alignment approach (CONQORD) It integrates quality reward and order-preserving alignment reward functions. Experiments demonstrate that CONQORD significantly improves the alignment performance between confidence and response accuracy.
arXiv Detail & Related papers (2024-04-26T09:42:46Z)
Enhancing Confidence Expression in Large Language Models Through Learning from Past Experience [41.06726400259579]
Large Language Models (LLMs) have exhibited remarkable performance across various downstream tasks. We propose a method of Learning from Past experience (LePe) to enhance the capability for confidence expression.
arXiv Detail & Related papers (2024-04-16T06:47:49Z)
Fact-and-Reflection (FaR) Improves Confidence Calibration of Large Language Models [84.94220787791389]
We propose Fact-and-Reflection (FaR) prompting, which improves the LLM calibration in two steps. Experiments show that FaR achieves significantly better calibration; it lowers the Expected Error by 23.5%. FaR even elicits the capability of verbally expressing concerns in less confident scenarios.
arXiv Detail & Related papers (2024-02-27T01:37:23Z)
Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of Large Language Models [23.42725642076256]
Large Language Models (LLMs) have catalyzed an increasing interest in their self-correction capabilities. This paper presents a comprehensive investigation into the intrinsic self-correction of LLMs. We develop an "If-or-Else" (IoE) prompting framework, designed to guide LLMs in assessing their own "confidence"
arXiv Detail & Related papers (2024-02-19T21:38:02Z)
TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness [58.721012475577716]
Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, prompting a surge in their practical applications. This paper introduces TrustScore, a framework based on the concept of Behavioral Consistency, which evaluates whether an LLMs response aligns with its intrinsic knowledge.
arXiv Detail & Related papers (2024-02-19T21:12:14Z)
The Calibration Gap between Model and Human Confidence in Large Language Models [14.539888672603743]
Large language models (LLMs) need to be well-calibrated in the sense that they can accurately assess and communicate how likely it is that their predictions are correct. Recent work has focused on the quality of internal LLM confidence assessments. This paper explores the disparity between external human confidence in an LLM's responses and the internal confidence of the model.
arXiv Detail & Related papers (2024-01-24T22:21:04Z)
TrustLLM: Trustworthiness in Large Language Models [446.5640421311468]
This paper introduces TrustLLM, a comprehensive study of trustworthiness in large language models (LLMs) We first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics.
arXiv Detail & Related papers (2024-01-10T22:07:21Z)
Assessing the Reliability of Large Language Model Knowledge [78.38870272050106]
Large language models (LLMs) have been treated as knowledge bases due to their strong performance in knowledge probing tasks. How do we evaluate the capabilities of LLMs to consistently produce factually correct answers? We propose MOdel kNowledge relIabiliTy scORe (MONITOR), a novel metric designed to directly measure LLMs' factual reliability.
arXiv Detail & Related papers (2023-10-15T12:40:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.