CoCoA: A Generalized Approach to Uncertainty Quantification by Integrating Confidence and Consistency of LLM Outputs
- URL: http://arxiv.org/abs/2502.04964v2
- Date: Tue, 11 Feb 2025 14:32:15 GMT
- Title: CoCoA: A Generalized Approach to Uncertainty Quantification by Integrating Confidence and Consistency of LLM Outputs
- Authors: Roman Vashurin, Maiya Goloburda, Preslav Nakov, Artem Shelmanov, Maxim Panov
- Abstract summary: Uncertainty quantification (UQ) methods for Large Language Models (LLMs) encompass a variety of approaches. We propose a new way of synthesizing model confidence and output consistency that leads to a family of efficient and robust UQ methods.
- Score: 35.74755307680801
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Uncertainty quantification (UQ) methods for Large Language Models (LLMs) encompass a variety of approaches, with two major types being particularly prominent: information-based, which focus on model confidence expressed as token probabilities, and consistency-based, which assess the semantic relationship between multiple outputs generated using repeated sampling. Several recent methods have combined these two approaches and shown impressive performance in various applications. However, they sometimes fail to outperform much simpler baseline methods. Our investigation reveals distinctive characteristics of LLMs as probabilistic models, which help to explain why these UQ methods underperform in certain tasks. Based on these findings, we propose a new way of synthesizing model confidence and output consistency that leads to a family of efficient and robust UQ methods. We evaluate our approach across a variety of tasks such as question answering, abstractive summarization, and machine translation, demonstrating sizable improvements over state-of-the-art UQ approaches.
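To make the confidence/consistency distinction concrete, here is a minimal Python sketch of one multiplicative way to combine the two signals. It illustrates the general idea, not the paper's exact CoCoA estimator, and it substitutes crude word-overlap (Jaccard) similarity for the semantic similarity a real implementation would compute:

```python
import math

def sequence_confidence(token_logprobs):
    # Length-normalized sequence probability: exp(mean token log-prob).
    return math.exp(sum(token_logprobs) / max(len(token_logprobs), 1))

def jaccard(a, b):
    # Lexical stand-in for semantic similarity (e.g., NLI- or embedding-based).
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0

def consistency(answer, samples, sim=jaccard):
    # Mean similarity of the answer to the other sampled outputs.
    others = [s for s in samples if s is not answer]
    if not others:
        return 1.0
    return sum(sim(answer, s) for s in others) / len(others)

def combined_uncertainty(answer, token_logprobs, samples):
    # Multiplicative combination: an output is flagged as uncertain only when
    # the model is BOTH unconfident in its token probabilities AND inconsistent
    # across repeated samples.
    conf = sequence_confidence(token_logprobs)
    cons = consistency(answer, samples)
    return (1.0 - conf) * (1.0 - cons)

samples = ["paris is the capital", "the capital is paris", "it is lyon"]
print(combined_uncertainty(samples[0], [-0.1, -0.2, -0.05], samples))
```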
Related papers
- Scalable Best-of-N Selection for Large Language Models via Self-Certainty [65.31658824274894]
Best-of-N selection is a key technique for improving the reasoning performance of Large Language Models.
We propose self-certainty, a novel and efficient metric to estimate response quality without requiring external reward models.
Our findings establish self-certainty as a practical and efficient way to improve LLM reasoning capabilities.
arXiv Detail & Related papers (2025-02-25T19:08:07Z)
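A minimal sketch of best-of-N selection driven purely by the model's own confidence, with no external reward model. It uses length-normalized log-probability as the internal confidence proxy; the paper's actual self-certainty metric is derived from the model's token distributions, so treat this as an illustrative stand-in:

```python
def avg_logprob(token_logprobs):
    # Length-normalized log-probability: a simple internal confidence proxy.
    return sum(token_logprobs) / max(len(token_logprobs), 1)

def best_of_n(candidates):
    # candidates: list of (text, token_logprobs) pairs sampled from the model.
    # Keep the response the model itself is most confident in.
    return max(candidates, key=lambda c: avg_logprob(c[1]))[0]

cands = [
    ("answer A", [-0.9, -1.2, -0.7]),
    ("answer B", [-0.2, -0.3, -0.1]),  # highest mean log-prob -> selected
]
print(best_of_n(cands))  # -> "answer B"
```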
- Uncertainty Quantification of Large Language Models through Multi-Dimensional Responses [4.505944978127014]
We introduce a multi-dimensional UQ framework that integrates semantic and knowledge-aware similarity analysis.
This approach disentangles overlapping information from both semantic and knowledge dimensions, capturing both semantic variations and factual consistency.
Our empirical evaluations demonstrate that our method outperforms existing techniques in identifying uncertain responses.
arXiv Detail & Related papers (2025-02-24T04:05:08Z)
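A rough sketch of the multi-dimensional idea: score pairwise agreement among sampled responses along a semantic dimension and a knowledge dimension, then blend them. Both similarity functions below, and the weighting scheme, are crude lexical stand-ins assumed for illustration, not the paper's method:

```python
import re

def semantic_sim(a, b):
    # Stand-in for embedding/NLI similarity: plain word overlap.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0

def knowledge_sim(a, b):
    # Stand-in for knowledge-aware similarity: overlap of named entities and
    # numbers, approximated here by capitalized tokens and digit strings.
    ents = lambda t: set(re.findall(r"[A-Z][a-z]+|\d+", t))
    ea, eb = ents(a), ents(b)
    return len(ea & eb) / len(ea | eb) if ea | eb else 1.0

def multi_dim_uncertainty(samples, w_sem=0.5, w_know=0.5):
    # Low combined pairwise similarity across samples -> high uncertainty.
    pairs = [(a, b) for i, a in enumerate(samples) for b in samples[i + 1:]]
    if not pairs:
        return 0.0
    score = sum(w_sem * semantic_sim(a, b) + w_know * knowledge_sim(a, b)
                for a, b in pairs) / len(pairs)
    return 1.0 - score

samples = ["Einstein was born in 1879", "He was born in 1879", "Born in 1955"]
print(round(multi_dim_uncertainty(samples), 3))
```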
- Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models [76.17975723711886]
Uncertainty quantification (UQ) is a prominent approach for eliciting truthful answers from large language models (LLMs).
In this work, we adapt Mahalanobis Distance (MD) - a well-established UQ technique in classification tasks - for text generation.
Our method extracts token embeddings from multiple layers of LLMs, computes MD scores for each token, and uses linear regression trained on these features to provide robust uncertainty scores.
arXiv Detail & Related papers (2025-02-20T10:25:13Z)
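A minimal sketch of Mahalanobis-distance scoring over token embeddings: fit a Gaussian to training-time hidden states, then score each generated sequence by the average distance of its token embeddings. The random arrays stand in for real hidden states, and the paper's final step of regressing over multi-layer MD features is only noted in a comment:

```python
import numpy as np

def fit_gaussian(train_embs, eps=1e-3):
    # Fit a Gaussian over training token embeddings (regularized covariance).
    mu = train_embs.mean(axis=0)
    cov = np.cov(train_embs, rowvar=False) + eps * np.eye(train_embs.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis(emb, mu, cov_inv):
    d = emb - mu
    return float(np.sqrt(d @ cov_inv @ d))

def sequence_md_score(token_embs, mu, cov_inv):
    # One density-based uncertainty score per generated sequence: average MD of
    # its token embeddings. The paper goes further and feeds per-layer MD
    # features into a learned linear regression for the final score.
    return float(np.mean([mahalanobis(e, mu, cov_inv) for e in token_embs]))

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 8))            # stand-in for hidden states
mu, cov_inv = fit_gaussian(train)
in_dist = rng.normal(size=(12, 8))           # tokens of a "typical" answer
off_dist = rng.normal(loc=3.0, size=(12, 8)) # tokens far from training density
print(sequence_md_score(in_dist, mu, cov_inv) <
      sequence_md_score(off_dist, mu, cov_inv))  # -> True
```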
- BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning [78.63421517563056]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks. We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model. We introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps.
arXiv Detail & Related papers (2025-01-31T02:39:07Z)
- Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [83.90988015005934]
Uncertainty quantification is a key element of machine learning applications. We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines. We conduct a large-scale empirical investigation of UQ and normalization techniques across eleven tasks, identifying the most effective approaches.
arXiv Detail & Related papers (2024-06-21T20:06:31Z)
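Benchmarks in this space commonly rank UQ methods with the Prediction Rejection Ratio (PRR): how much average output quality improves as the most uncertain predictions are rejected, normalized between random and oracle rejection. A compact sketch of this standard formulation (not code from the benchmark itself):

```python
import numpy as np

def rejection_curve_auc(uncertainty, quality):
    # Area under the rejection curve: mean quality of the retained set as the
    # most uncertain predictions are rejected first, averaged over all rates.
    order = np.argsort(uncertainty)                 # most certain first
    q = np.asarray(quality, dtype=float)[order]
    keep_means = np.cumsum(q) / np.arange(1, len(q) + 1)
    return float(keep_means.mean())

def prr(uncertainty, quality):
    # Prediction Rejection Ratio: 1.0 = oracle-quality ranking, 0.0 = random.
    unc_auc = rejection_curve_auc(uncertainty, quality)
    oracle_auc = rejection_curve_auc(-np.asarray(quality, float), quality)
    rand_auc = float(np.mean(quality))              # random rejection baseline
    return (unc_auc - rand_auc) / (oracle_auc - rand_auc)

unc = [0.9, 0.1, 0.5, 0.2]
qual = [0.0, 1.0, 0.5, 1.0]
print(round(prr(unc, qual), 3))  # -> 1.0: uncertainty perfectly tracks errors
```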
- Provably Better Explanations with Optimized Aggregation of Feature Attributions [36.22433695108499]
Using feature attributions for post-hoc explanations is a common practice to understand and verify the predictions of opaque machine learning models.
We propose a novel approach to derive optimal convex combinations of feature attributions that yield provable improvements of desired quality criteria.
arXiv Detail & Related papers (2024-06-07T17:03:43Z)
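A toy sketch of the aggregation idea: given two attribution vectors for the same prediction, search over convex mixtures and keep the one scoring best on a faithfulness criterion. The toy model, the deletion-based criterion, and the grid search are all illustrative assumptions; the paper derives optimal combination weights with provable guarantees rather than grid search:

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=6)
x = rng.normal(size=6)
f = lambda v: float(w @ v + v[0] * v[1])      # toy nonlinear model

grad = w.copy(); grad[0] += x[1]; grad[1] += x[0]
grad_attr = grad * x                           # gradient-times-input attribution
occl_attr = np.array([f(x) - f(np.where(np.arange(6) == i, 0.0, x))
                      for i in range(6)])      # occlusion attribution

def deletion_score(attr, k=3):
    # Faithfulness proxy: drop in f when the top-k attributed features are
    # zeroed out -- a bigger drop means the attribution found what matters.
    x_del = x.copy()
    x_del[np.argsort(-np.abs(attr))[:k]] = 0.0
    return abs(f(x) - f(x_del))

# Grid-search convex mixtures a*grad + (1-a)*occlusion under the criterion.
alphas = np.linspace(0.0, 1.0, 21)
best = max(alphas,
           key=lambda a: deletion_score(a * grad_attr + (1 - a) * occl_attr))
print("best mixing weight:", best)
```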
- Combining Confidence Elicitation and Sample-based Methods for Uncertainty Quantification in Misinformation Mitigation [6.929834518749884]
Large Language Models have emerged as prime candidates to tackle misinformation mitigation.
Existing approaches struggle with hallucinations and overconfident predictions.
We propose an uncertainty quantification framework that leverages both direct confidence elicitation and sample-based consistency methods.
arXiv Detail & Related papers (2024-01-13T16:36:58Z)
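A minimal sketch of blending the two signals: a verbalized confidence elicited by prompting, and the agreement rate among sampled answers. The linear blend and the weight `w` are assumptions for illustration, not the paper's calibrated combination:

```python
from collections import Counter

def sample_agreement(answers):
    # Fraction of sampled answers that agree with the majority answer.
    counts = Counter(a.strip().lower() for a in answers)
    return counts.most_common(1)[0][1] / len(answers)

def combined_confidence(elicited, answers, w=0.5):
    # elicited: the model's own verbalized confidence in [0, 1], obtained by
    # prompting (e.g., "State your confidence from 0 to 100").
    # Blends stated confidence with empirical consistency across samples.
    return w * elicited + (1 - w) * sample_agreement(answers)

answers = ["False", "false", "False", "True"]
print(combined_confidence(0.9, answers))  # 0.5*0.9 + 0.5*0.75 = 0.825
```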
- Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals [67.64770842323966]
Causal explanations of predictions of NLP systems are essential to ensure safety and establish trust.
Existing methods often fall short of explaining model predictions effectively or efficiently.
We propose two approaches for counterfactual (CF) approximation.
arXiv Detail & Related papers (2023-10-01T07:31:04Z)
- Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement [50.62461749446111]
Self-Polish (SP) is a novel method that facilitates the model's reasoning by guiding it to progressively refine the given problems to be more comprehensible and solvable.
SP is orthogonal to answer/reasoning-side prompting methods such as CoT, allowing for seamless integration with state-of-the-art techniques for further improvement.
arXiv Detail & Related papers (2023-05-23T19:58:30Z)
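A minimal sketch of the refinement loop, with `llm(prompt)` as a hypothetical text-completion callable and illustrative prompts that are not the paper's. Because SP only rewrites the problem, the final answering call can use any answer-side strategy such as CoT:

```python
def self_polish_solve(problem, llm, max_rounds=3):
    # Progressively rewrite the problem until the model judges it clear, then
    # answer the final refined version.
    for _ in range(max_rounds):
        refined = llm(
            "Rewrite the following problem to be clearer and easier to solve, "
            f"preserving its meaning:\n{problem}"
        )
        verdict = llm("Is this problem clear and self-contained? "
                      f"Answer yes or no:\n{refined}")
        problem = refined
        if verdict.strip().lower().startswith("yes"):
            break
    # Answer-side prompting (e.g., CoT) plugs in here unchanged.
    return llm(f"Solve the problem step by step:\n{problem}")
```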
This list is automatically generated from the titles and abstracts of the papers on this site.