CoCoA: A Generalized Approach to Uncertainty Quantification by Integrating Confidence and Consistency of LLM Outputs
- URL: http://arxiv.org/abs/2502.04964v2
- Date: Tue, 11 Feb 2025 14:32:15 GMT
- Title: CoCoA: A Generalized Approach to Uncertainty Quantification by Integrating Confidence and Consistency of LLM Outputs
- Authors: Roman Vashurin, Maiya Goloburda, Preslav Nakov, Artem Shelmanov, Maxim Panov,
- Abstract summary: Uncertainty quantification (UQ) methods for Large Language Models (LLMs) encompasses a variety of approaches.
We propose a new way of synthesizing model confidence and output consistency that leads to a family of efficient and robust UQ methods.
- Score: 35.74755307680801
- License:
- Abstract: Uncertainty quantification (UQ) methods for Large Language Models (LLMs) encompasses a variety of approaches, with two major types being particularly prominent: information-based, which focus on model confidence expressed as token probabilities, and consistency-based, which assess the semantic relationship between multiple outputs generated using repeated sampling. Several recent methods have combined these two approaches and shown impressive performance in various applications. However, they sometimes fail to outperform much simpler baseline methods. Our investigation reveals distinctive characteristics of LLMs as probabilistic models, which help to explain why these UQ methods underperform in certain tasks. Based on these findings, we propose a new way of synthesizing model confidence and output consistency that leads to a family of efficient and robust UQ methods. We evaluate our approach across a variety of tasks such as question answering, abstractive summarization, and machine translation, demonstrating sizable improvements over state-of-the-art UQ approaches.
Related papers
- Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models [76.17975723711886]
Uncertainty quantification (UQ) is a prominent approach for eliciting truthful answers from large language models (LLMs)
In this work, we adapt Mahalanobis Distance (MD) - a well-established UQ technique in classification tasks - for text generation.
Our method extracts token embeddings from multiple layers of LLMs, computes MD scores for each token, and uses linear regression trained on these features to provide robust uncertainty scores.
arXiv Detail & Related papers (2025-02-20T10:25:13Z) - BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning [78.63421517563056]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks.
We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model.
We introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps.
arXiv Detail & Related papers (2025-01-31T02:39:07Z) - Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [83.90988015005934]
Uncertainty quantification is a key element of machine learning applications.
We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines.
We conduct a large-scale empirical investigation of UQ and normalization techniques across eleven tasks, identifying the most effective approaches.
arXiv Detail & Related papers (2024-06-21T20:06:31Z) - Provably Better Explanations with Optimized Aggregation of Feature Attributions [36.22433695108499]
Using feature attributions for post-hoc explanations is a common practice to understand and verify the predictions of opaque machine learning models.
We propose a novel approach to derive optimal convex combinations of feature attributions that yield provable improvements of desired quality criteria.
arXiv Detail & Related papers (2024-06-07T17:03:43Z) - Combining Confidence Elicitation and Sample-based Methods for
Uncertainty Quantification in Misinformation Mitigation [6.929834518749884]
Large Language Models have emerged as prime candidates to tackle misinformation mitigation.
Existing approaches struggle with hallucinations and overconfident predictions.
We propose an uncertainty quantification framework that leverages both direct confidence elicitation and sampled-based consistency methods.
arXiv Detail & Related papers (2024-01-13T16:36:58Z) - Faithful Explanations of Black-box NLP Models Using LLM-generated
Counterfactuals [67.64770842323966]
Causal explanations of predictions of NLP systems are essential to ensure safety and establish trust.
Existing methods often fall short of explaining model predictions effectively or efficiently.
We propose two approaches for counterfactual (CF) approximation.
arXiv Detail & Related papers (2023-10-01T07:31:04Z) - Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement [50.62461749446111]
Self-Polish (SP) is a novel method that facilitates the model's reasoning by guiding it to progressively refine the given problems to be more comprehensible and solvable.
SP is to all other prompting methods of answer/reasoning side like CoT, allowing for seamless integration with state-of-the-art techniques for further improvement.
arXiv Detail & Related papers (2023-05-23T19:58:30Z) - Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches, is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.