Simple Buehler-optimal confidence intervals on the average success
probability of independent Bernoulli trials
- URL: http://arxiv.org/abs/2212.12558v1
- Date: Fri, 23 Dec 2022 19:22:51 GMT
- Title: Simple Buehler-optimal confidence intervals on the average success
probability of independent Bernoulli trials
- Authors: Jean-Daniel Bancal, Pavel Sekatski
- Abstract summary: One-sided confidence intervals are presented for the average of non-identical Bernoulli parameters.
A simple interval valid for all confidence levels is also provided with a tightness guarantee.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One-sided confidence intervals are presented for the average of non-identical
Bernoulli parameters. These confidence intervals are expressed as analytical
functions of the total number of Bernoulli games won, the number of rounds and
the confidence level. Tightness of these bounds in the sense of Buehler, i.e.
as the strictest possible monotonic intervals, is demonstrated for all
confidence levels. A simple interval valid for all confidence levels is also
provided with a tightness guarantee. Finally, an application of the proposed
confidence intervals to sequential sampling is discussed.
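The abstract does not reproduce the paper's analytical expressions, so they are not restated here. As a rough illustration of what a one-sided confidence bound on the average success probability looks like as a function of the number of games won, the number of rounds and the confidence level, the sketch below uses Hoeffding's inequality instead. This is a swapped-in, looser textbook bound, not the Buehler-optimal interval of the paper; it is chosen only because it also holds for independent, non-identical Bernoulli trials. The function name and its parameters are assumptions made for this example.

```python
import math


def hoeffding_lower_bound(wins: int, rounds: int, confidence: float) -> float:
    """One-sided lower confidence bound on the average success probability.

    Illustrative only: this is the Hoeffding bound, NOT the interval derived
    in the paper. It assumes independent Bernoulli trials whose success
    probabilities may differ from round to round.
    """
    if rounds <= 0:
        raise ValueError("rounds must be positive")
    if not 0 <= wins <= rounds:
        raise ValueError("wins must lie between 0 and rounds")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")

    alpha = 1.0 - confidence  # allowed probability that the bound fails
    # Hoeffding: P(wins/rounds - mean(p) >= t) <= exp(-2 * rounds * t**2)
    slack = math.sqrt(math.log(1.0 / alpha) / (2.0 * rounds))
    # A probability cannot be negative, so clip the bound at zero.
    return max(0.0, wins / rounds - slack)


if __name__ == "__main__":
    # Example: 80 games won out of 100 rounds at the 95% confidence level.
    print(round(hoeffding_lower_bound(80, 100, 0.95), 4))
```

With 80 wins in 100 rounds at 95% confidence, the sketch reports a lower bound of roughly 0.68 on the average success probability. A tight interval such as the one the paper describes would be expected to give a larger lower bound for the same data.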
Related papers
- A new and flexible class of sharp asymptotic time-uniform confidence sequences [0.0]
As in classical statistics, confidence sequences are a nonparametric tool showing under which high-level assumptions coverage is achieved.
We propose a new flexible class of confidence sequences yielding sharp time-uniform confidence sequences under mild assumptions.
arXiv Detail & Related papers (2025-02-14T18:57:16Z)
- Language Models Prefer What They Know: Relative Confidence Estimation via Confidence Preferences [62.52739672949452]
Language models (LMs) should provide reliable confidence estimates to help users detect mistakes in their outputs and defer to human experts when necessary.
We propose relative confidence estimation, where we match up questions against each other and ask the model to make relative judgments of confidence.
Treating each question as a "player" in a series of matchups against other questions and the model's preferences as match outcomes, we can use rank aggregation methods like Elo rating and Bradley-Terry to translate the model's confidence preferences into confidence scores.
arXiv Detail & Related papers (2025-02-03T07:43:27Z)
- Robust Confidence Intervals in Stereo Matching using Possibility Theory [2.522402937703098]
We propose a method for estimating disparity confidence intervals in stereo matching problems.
To the best of our knowledge, this is the first method creating disparity confidence intervals based on the cost volume.
The accuracy and size of confidence intervals are validated using the Middlebury stereo datasets as well as a dataset of satellite images.
arXiv Detail & Related papers (2024-04-09T12:48:24Z)
- Show Your Work with Confidence: Confidence Bands for Tuning Curves [51.12106543561089]
Tuning curves plot validation performance as a function of tuning effort.
We present the first method to construct valid confidence bands for tuning curves.
We validate our design with ablations, analyze the effect of sample size, and provide guidance on comparing models with our method.
arXiv Detail & Related papers (2023-11-16T00:50:37Z)
- Binary Classification with Confidence Difference [100.08818204756093]
This paper delves into a novel weakly supervised binary classification problem called confidence-difference (ConfDiff) classification.
We propose a risk-consistent approach to tackle this problem and show that the estimation error bound achieves the optimal convergence rate.
We also introduce a risk correction approach to mitigate overfitting problems, whose consistency and convergence rate are also proven.
arXiv Detail & Related papers (2023-10-09T11:44:50Z)
- Huber-Robust Confidence Sequences [37.16361789841549]
Confidence sequences are confidence intervals that can be sequentially tracked, and are valid at arbitrary data-dependent stopping times.
We show that the resulting confidence sequences attain the optimal width achieved in the nonsequential setting.
Since confidence sequences are a common tool used within A/B/n testing and bandits, these results open the door to sequential experimentation that is robust to outliers and adversarial corruptions.
arXiv Detail & Related papers (2023-01-23T17:29:26Z)
- Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-To-End Automatic Speech Recognition [86.21889574126878]
We show how per-frame entropy values can be normalized and aggregated to obtain a confidence measure per unit and per word.
We evaluate the proposed confidence measures on LibriSpeech test sets, and show that they are up to 2 and 4 times better than confidence estimation based on the maximum per-frame probability.
arXiv Detail & Related papers (2022-12-16T20:27:40Z)
- Catoni-style Confidence Sequences under Infinite Variance [19.61346221428679]
We provide an extension of confidence sequences for settings where the variance of the data-generating distribution does not exist or is infinite.
Confidence sequences furnish confidence intervals that are valid at arbitrary data-dependent stopping times.
The derived results are shown to be better than confidence sequences obtained using the Dubins-Savage inequality.
arXiv Detail & Related papers (2022-08-05T14:11:06Z)
- An evaluation of word-level confidence estimation for end-to-end automatic speech recognition [70.61280174637913]
We investigate confidence estimation for end-to-end automatic speech recognition (ASR).
We provide an extensive benchmark of popular confidence methods on four well-known speech datasets.
Our results suggest a strong baseline can be obtained by scaling the logits by a learnt temperature.
arXiv Detail & Related papers (2021-01-14T09:51:59Z)
- CoinDICE: Off-Policy Confidence Interval Estimation [107.86876722777535]
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning.
We show in a variety of benchmarks that the confidence interval estimates are tighter and more accurate than existing methods.
arXiv Detail & Related papers (2020-10-22T12:39:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.