Related papers: Simple Buehler-optimal confidence intervals on the average success probability of independent Bernoulli trials

Related papers

STaR-Bets: Sequential Target-Recalculating Bets for Tighter Confidence Intervals [9.319818839579137]
We propose a betting-based algorithm to compute confidence intervals that empirically outperforms the competitors. Our strategy uses the optimal strategy in every step, whereas the standard betting methods choose a constant strategy in advance. We also prove that the width of our confidence intervals is optimal up to a $1+o(1)$ factor diminishing with $n$.
arXiv Detail & Related papers (2025-05-28T14:48:07Z)
A new and flexible class of sharp asymptotic time-uniform confidence sequences [0.0]
As in classical statistics, confidence sequences are a nonparametric tool showing under which high-level assumptions coverage is achieved. We propose a new flexible class of confidence sequences yielding sharp time-uniform confidence sequences under mild assumptions.
arXiv Detail & Related papers (2025-02-14T18:57:16Z)
Language Models Prefer What They Know: Relative Confidence Estimation via Confidence Preferences [62.52739672949452]
Language models (LMs) should provide reliable confidence estimates to help users detect mistakes in their outputs and defer to human experts when necessary. We propose relative confidence estimation, where we match up questions against each other and ask the model to make relative judgments of confidence. Treating each question as a "player" in a series of matchups against other questions and the model's preferences as match outcomes, we can use rank aggregation methods like Elo rating and Bradley-Terry to translate the model's confidence preferences into confidence scores.
arXiv Detail & Related papers (2025-02-03T07:43:27Z)
Robust Confidence Intervals in Stereo Matching using Possibility Theory [2.522402937703098]
We propose a method for estimating disparity confidence intervals in stereo matching problems. To the best of our knowledge, this is the first method creating disparity confidence intervals based on the cost volume. The accuracy and size of confidence intervals are validated using the Middlebury stereo datasets as well as a dataset of satellite images.
arXiv Detail & Related papers (2024-04-09T12:48:24Z)
Mitigating LLM Hallucinations via Conformal Abstention [70.83870602967625]
We develop a principled procedure for determining when a large language model should abstain from responding in a general domain. We leverage conformal prediction techniques to develop an abstention procedure that benefits from rigorous theoretical guarantees on the hallucination rate (error rate). Experimentally, our resulting conformal abstention method reliably bounds the hallucination rate on various closed-book, open-domain generative question answering datasets.
arXiv Detail & Related papers (2024-04-04T11:32:03Z)
Show Your Work with Confidence: Confidence Bands for Tuning Curves [51.12106543561089]
Tuning curves plot validation performance as a function of tuning effort. We present the first method to construct valid confidence bands for tuning curves. We validate our design with ablations, analyze the effect of sample size, and provide guidance on comparing models with our method.
arXiv Detail & Related papers (2023-11-16T00:50:37Z)
Binary Classification with Confidence Difference [100.08818204756093]
This paper delves into a novel weakly supervised binary classification problem called confidence-difference (ConfDiff) classification. We propose a risk-consistent approach to tackle this problem and show that the estimation error bound achieves the optimal convergence rate. We also introduce a risk correction approach to mitigate overfitting problems, whose consistency and convergence rate are also proven.
arXiv Detail & Related papers (2023-10-09T11:44:50Z)
Huber-Robust Confidence Sequences [37.16361789841549]
Confidence sequences are confidence intervals that can be sequentially tracked, and are valid at arbitrary data-dependent stopping times. We show that the resulting confidence sequences attain the optimal width achieved in the nonsequential setting. Since confidence sequences are a common tool used within A/B/n testing and bandits, these results open the door to sequential experimentation that is robust to outliers and adversarial corruptions.
arXiv Detail & Related papers (2023-01-23T17:29:26Z)
Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-To-End Automatic Speech Recognition [86.21889574126878]
We show how per-frame entropy values can be normalized and aggregated to obtain a confidence measure per unit and per word. We evaluate the proposed confidence measures on LibriSpeech test sets, and show that they are up to 2 and 4 times better than confidence estimation based on the maximum per-frame probability.
arXiv Detail & Related papers (2022-12-16T20:27:40Z)
Catoni-style Confidence Sequences under Infinite Variance [19.61346221428679]
We provide an extension of confidence sequences for settings where the variance of the data-generating distribution does not exist or is infinite. Confidence sequences furnish confidence intervals that are valid at arbitrary data-dependent stopping times. The derived results are shown to be better than confidence sequences obtained using the Dubins-Savage inequality.
arXiv Detail & Related papers (2022-08-05T14:11:06Z)
Tight Concentrations and Confidence Sequences from the Regret of Universal Portfolio [30.750408480772027]
Jun and Orabona [COLT'19] have shown how to easily convert the regret guarantee of an online betting algorithm into a time-uniform concentration inequality. We show that we can go even further: We show that the regret of a minimax betting algorithm gives rise to a new implicit empirical time-uniform concentration.
arXiv Detail & Related papers (2021-10-27T00:44:32Z)
An evaluation of word-level confidence estimation for end-to-end automatic speech recognition [70.61280174637913]
We investigate confidence estimation for end-to-end automatic speech recognition (ASR). We provide an extensive benchmark of popular confidence methods on four well-known speech datasets. Our results suggest a strong baseline can be obtained by scaling the logits by a learnt temperature.
arXiv Detail & Related papers (2021-01-14T09:51:59Z)
CoinDICE: Off-Policy Confidence Interval Estimation [107.86876722777535]
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning. We show in a variety of benchmarks that the confidence interval estimates are tighter and more accurate than existing methods.
arXiv Detail & Related papers (2020-10-22T12:39:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.