Multi-Perspective Consistency Enhances Confidence Estimation in Large Language Models
- URL: http://arxiv.org/abs/2402.11279v1
- Date: Sat, 17 Feb 2024 13:37:39 GMT
- Title: Multi-Perspective Consistency Enhances Confidence Estimation in Large Language Models
- Authors: Pei Wang, Yejie Wang, Muxi Diao, Keqing He, Guanting Dong, Weiran Xu
- Abstract summary: This work focuses on improving the confidence estimation of large language models.
Considering the fragility of self-awareness in language models, we introduce a Multi-Perspective Consistency (MPC) method.
The experimental results on eight publicly available datasets show that our MPC achieves state-of-the-art performance.
- Score: 27.63938857490995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the deployment of large language models (LLMs), accurate confidence
estimation is critical for assessing the credibility of model predictions.
However, existing methods often fail to overcome the issue of overconfidence on
incorrect answers. In this work, we focus on improving the confidence
estimation of large language models. Considering the fragility of
self-awareness in language models, we introduce a Multi-Perspective Consistency
(MPC) method. We leverage complementary insights from different perspectives
within models (MPC-Internal) and across different models (MPC-Across) to
mitigate the issue of overconfidence arising from a singular viewpoint. The
experimental results on eight publicly available datasets show that our MPC
achieves state-of-the-art performance. Further analyses indicate that MPC can
mitigate the problem of overconfidence and is effectively scalable to other
models.
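The abstract sketches MPC at a high level: agreement under rephrasing within one model (MPC-Internal) and agreement across different models (MPC-Across) are combined to temper overconfident answers. Below is a minimal Python sketch of that general idea, assuming hypothetical `ask` and model callables and an equal-weight combination of the two perspectives; it illustrates consistency-based confidence estimation in general, not the authors' exact MPC procedure.

```python
# Minimal, illustrative sketch of consistency-based confidence estimation.
# NOTE: this is an assumption-laden approximation of the idea in the abstract,
# not the authors' exact MPC method. The `ask` helper, the paraphrase-based
# internal check, and the equal-weight combination are illustrative choices.

from typing import Callable, List, Tuple


def ask(model: Callable[[str], str], prompt: str) -> str:
    """Query a model (a plain prompt -> answer callable) and normalize the reply."""
    return model(prompt).strip().lower()


def mpc_style_confidence(
    target_model: Callable[[str], str],
    peer_models: List[Callable[[str], str]],
    question: str,
    paraphrases: List[str],
) -> Tuple[str, float]:
    """Score the target model's answer from two perspectives:
    agreement with its own answers to paraphrased prompts (internal),
    and agreement with other models on the original question (across)."""
    answer = ask(target_model, question)

    # Internal perspective: does the same model stay consistent under rephrasing?
    internal_votes = [ask(target_model, p) for p in paraphrases]
    internal = sum(v == answer for v in internal_votes) / max(len(internal_votes), 1)

    # Across perspective: do independent models agree with the answer?
    across_votes = [ask(m, question) for m in peer_models]
    across = sum(v == answer for v in across_votes) / max(len(across_votes), 1)

    # Equal-weight combination of the two perspectives (an assumption).
    return answer, 0.5 * internal + 0.5 * across
```

Under this sketch, an answer that survives paraphrasing and is corroborated by peer models is scored near 1, while an answer supported only by a single viewpoint receives a low confidence score.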
Related papers
- The Craft of Selective Prediction: Towards Reliable Case Outcome Classification -- An Empirical Study on European Court of Human Rights Cases [1.9570703832723582]
This paper conducts an empirical investigation into how various design choices affect the reliability of COC models within the framework of selective prediction.
Our experiments on the multi-label COC task, focusing on European Court of Human Rights (ECtHR) cases, highlight the importance of a diverse yet domain-specific pre-training corpus for better calibration.
arXiv Detail & Related papers (2024-09-27T11:25:10Z)
- Finetuning Language Models to Emit Linguistic Expressions of Uncertainty [5.591074369497796]
Large language models (LLMs) are increasingly employed in information-seeking and decision-making tasks.
LLMs tend to generate information that conflicts with real-world facts, and their persuasive style can make these inaccuracies appear confident and convincing.
In this work, we explore supervised finetuning on uncertainty-augmented predictions as a method to develop models that produce linguistic expressions of uncertainty.
arXiv Detail & Related papers (2024-09-18T17:52:53Z)
- Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models [14.5291643644017]
We introduce the concept of Confidence-Probability Alignment.
We probe the alignment between models' internal and expressed confidence.
Among the models analyzed, OpenAI's GPT-4 showed the strongest confidence-probability alignment.
arXiv Detail & Related papers (2024-05-25T15:42:04Z)
- Uncertainty-Aware Evaluation for Vision-Language Models [0.0]
Current evaluation methods overlook an essential component: uncertainty.
We show that models with the highest accuracy may also have the highest uncertainty.
Our empirical findings also reveal a correlation between model uncertainty and its language model part.
arXiv Detail & Related papers (2024-02-22T10:04:17Z)
- Calibrating Large Language Models with Sample Consistency [76.23956851098598]
We explore the potential of deriving confidence from the distribution of multiple randomly sampled model generations, via three measures of consistency.
Results show that consistency-based calibration methods outperform existing post-hoc approaches.
We offer practical guidance on choosing suitable consistency metrics for calibration, tailored to the characteristics of various LMs (see the illustrative sketch after this list).
arXiv Detail & Related papers (2024-02-21T16:15:20Z)
- Evaluating Concurrent Robustness of Language Models Across Diverse Challenge Sets [46.19529338280716]
Language models, characterized by their black-box nature, often hallucinate and display sensitivity to input perturbations.
We introduce a methodology designed to examine how input perturbations affect language models across various scales.
We present three distinct fine-tuning strategies to address robustness against multiple perturbations.
arXiv Detail & Related papers (2023-11-15T02:59:10Z)
- Measuring and Modeling Uncertainty Degree for Monocular Depth Estimation [50.920911532133154]
The intrinsic ill-posedness and ordinal-sensitive nature of monocular depth estimation (MDE) models pose major challenges to the estimation of uncertainty degree.
We propose to model the uncertainty of MDE models from the perspective of the inherent probability distributions.
By simply introducing additional training regularization terms, our model, with surprisingly simple formations and without requiring extra modules or multiple inferences, can provide uncertainty estimations with state-of-the-art reliability.
arXiv Detail & Related papers (2023-07-19T12:11:15Z)
- Calibrating Multimodal Learning [94.65232214643436]
We propose a novel regularization technique, i.e., Calibrating Multimodal Learning (CML) regularization, to calibrate the predictive confidence of previous methods.
This technique could be flexibly equipped by existing models and improve the performance in terms of confidence calibration, classification accuracy, and model robustness.
arXiv Detail & Related papers (2023-06-02T04:29:57Z)
- Trusted Multi-View Classification with Dynamic Evidential Fusion [73.35990456162745]
We propose a novel multi-view classification algorithm, termed trusted multi-view classification (TMC).
TMC provides a new paradigm for multi-view learning by dynamically integrating different views at an evidence level.
Both theoretical and experimental results validate the effectiveness of the proposed model in accuracy, robustness and trustworthiness.
arXiv Detail & Related papers (2022-04-25T03:48:49Z)
- Trusted Multi-View Classification [76.73585034192894]
We propose a novel multi-view classification method, termed trusted multi-view classification.
It provides a new paradigm for multi-view learning by dynamically integrating different views at an evidence level.
The proposed algorithm jointly utilizes multiple views to promote both classification reliability and robustness.
arXiv Detail & Related papers (2021-02-03T13:30:26Z)
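The "Calibrating Large Language Models with Sample Consistency" entry above derives confidence from how often repeated, randomly sampled generations agree. The sketch below shows only the simplest agreement-based measure, with a placeholder `sample_answer` callable standing in for temperature-based sampling; the paper itself studies three consistency measures.

```python
# Illustrative agreement-based consistency confidence. The `sample_answer`
# callable is an assumed placeholder returning one sampled answer per call.

from collections import Counter
from typing import Callable, Tuple


def sample_consistency_confidence(
    sample_answer: Callable[[str], str],
    question: str,
    n_samples: int = 10,
) -> Tuple[str, float]:
    """Sample the model repeatedly and use the majority answer's frequency
    as the confidence estimate."""
    answers = [sample_answer(question).strip().lower() for _ in range(n_samples)]
    majority, count = Counter(answers).most_common(1)[0]
    return majority, count / n_samples
```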
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.