Related papers: Benchmarking Large Language Model Uncertainty for Prompt Optimization

Benchmarking Large Language Model Uncertainty for Prompt Optimization

URL: http://arxiv.org/abs/2409.10044v2
Date: Wed, 25 Dec 2024 03:12:44 GMT
Title: Benchmarking Large Language Model Uncertainty for Prompt Optimization
Authors: Pei-Fu Guo, Yun-Da Tsai, Shou-De Lin,
Abstract summary: This paper introduces a benchmark dataset to evaluate uncertainty metrics.<n>We show that current metrics align more with Answer Uncertainty, which reflects output confidence and diversity, rather than Correctness Uncertainty.
Score: 4.151658495779136
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Prompt optimization algorithms for Large Language Models (LLMs) excel in multi-step reasoning but still lack effective uncertainty estimation. This paper introduces a benchmark dataset to evaluate uncertainty metrics, focusing on Answer, Correctness, Aleatoric, and Epistemic Uncertainty. Through analysis of models like GPT-3.5-Turbo and Meta-Llama-3.1-8B-Instruct, we show that current metrics align more with Answer Uncertainty, which reflects output confidence and diversity, rather than Correctness Uncertainty, highlighting the need for improved metrics that are optimization-objective-aware to better guide prompt optimization. Our code and dataset are available at https://github.com/0Frett/PO-Uncertainty-Benchmarking.

Related papers

Learning Conformal Abstention Policies for Adaptive Risk Management in Large Language and Vision-Language Models [3.958317527488534]
Large Language and Vision-Language Models (LLMs/VLMs) are increasingly used in safety-critical applications. Uncertainty quantification helps assess prediction confidence and enables abstention when uncertainty is high. We propose learnable abstention, integrating reinforcement learning (RL) with Conformal Prediction (CP) to optimize abstention thresholds.
arXiv Detail & Related papers (2025-02-08T21:30:41Z)
Uncertainty-Penalized Direct Preference Optimization [52.387088396044206]
We develop a pessimistic framework for DPO by introducing preference uncertainty penalization schemes. The penalization serves as a correction to the loss which attenuates the loss gradient for uncertain samples. We show improved overall performance compared to vanilla DPO, as well as better completions on prompts from high-uncertainty chosen/rejected responses.
arXiv Detail & Related papers (2024-10-26T14:24:37Z)
UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation [93.38604803625294]
We present UncertaintyRAG, a novel approach for long-context Retrieval-Augmented Generation (RAG) We use Signal-to-Noise Ratio (SNR)-based span uncertainty to estimate similarity between text chunks. UncertaintyRAG outperforms baselines by 2.03% on LLaMA-2-7B, achieving state-of-the-art results.
arXiv Detail & Related papers (2024-10-03T17:39:38Z)
End-to-End Conformal Calibration for Optimization Under Uncertainty [32.844953018302874]
This paper develops an end-to-end framework to learn the uncertainty estimates for conditional optimization. In addition, we propose to represent arbitrary convex uncertainty sets with partially convex neural networks. Our approach consistently improves upon two-stage-then-optimize.
arXiv Detail & Related papers (2024-09-30T17:38:27Z)
Self-Evolutionary Large Language Models through Uncertainty-Enhanced Preference Optimization [9.618391485742968]
Iterative preference optimization has recently become one of the de-facto training paradigms for large language models (LLMs) We present an uncertainty-enhanced textbfPreference textbfOptimization framework to make the LLM self-evolve with reliable feedback. Our framework substantially alleviates the noisy problem and improves the performance of iterative preference optimization.
arXiv Detail & Related papers (2024-09-17T14:05:58Z)
Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization [78.82586283794886]
We present a new offline alignment algorithm, $chi2$-Preference Optimization ($chi$PO) $chi$PO implements the principle of pessimism in the face of uncertainty via regularization. It is provably robust to overoptimization and achieves sample-complexity guarantees based on single-policy concentrability.
arXiv Detail & Related papers (2024-07-18T11:08:40Z)
Robust portfolio optimization for recommender systems considering uncertainty of estimated statistics [2.928964540437144]
We propose a robust portfolio optimization model that copes with the uncertainty of estimated statistics based on the cardinality-based uncertainty sets. Our method has the potential to improve the recommendation quality of various rating prediction algorithms.
arXiv Detail & Related papers (2024-06-09T15:42:54Z)
End-to-End Learning for Fair Multiobjective Optimization Under Uncertainty [55.04219793298687]
The Predict-Then-Forecast (PtO) paradigm in machine learning aims to maximize downstream decision quality. This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives. It shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty.
arXiv Detail & Related papers (2024-02-12T16:33:35Z)
Optimal Learning via Moderate Deviations Theory [4.6930976245638245]
We develop a systematic construction of highly accurate confidence intervals by using a moderate deviation principle-based approach. It is shown that the proposed confidence intervals are statistically optimal in the sense that they satisfy criteria regarding exponential accuracy, minimality, consistency, mischaracterization probability, and eventual uniformly most accurate (UMA) property.
arXiv Detail & Related papers (2023-05-23T19:57:57Z)
Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory. We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures. We then show how alternative choices for the loss yield a flexible family of acquisition functions.
arXiv Detail & Related papers (2022-10-04T04:43:58Z)
Data-Driven Robust Optimization using Unsupervised Deep Learning [0.0]
We show that a trained neural network can be integrated into a robust optimization model by formulating the adversarial problem as a convex mixed-integer program. We find that this approach outperforms a similar approach using kernel-based support vector sets.
arXiv Detail & Related papers (2020-11-19T11:06:54Z)
Robust Policy Search for Robot Navigation [3.130722489512822]
Complex robot navigation and control problems can be framed as policy search problems. In this work, we incorporate both robust optimization and statistical robustness, showing that both types of robustness are synergistic. We present results in several benchmarks and robot tasks to have convergence guarantees and improved performance even with surrogate modeling errors.
arXiv Detail & Related papers (2020-03-02T16:30:59Z)
Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches, is to update the prototype of each class with the mean of the most confident query examples. We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries. We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.