Simple Buehler-optimal confidence intervals on the average success
probability of independent Bernoulli trials
- URL: http://arxiv.org/abs/2212.12558v1
- Date: Fri, 23 Dec 2022 19:22:51 GMT
- Title: Simple Buehler-optimal confidence intervals on the average success
probability of independent Bernoulli trials
- Authors: Jean-Daniel Bancal, Pavel Sekatski
- Abstract summary: One-sided confidence intervals are presented for the average of non-identical Bernoulli parameters.
A simple interval valid for all confidence levels is also provided with a tightness guarantee.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One-sided confidence intervals are presented for the average of non-identical
Bernoulli parameters. These confidence intervals are expressed as analytical
functions of the total number of Bernoulli games won, the number of rounds and
the confidence level. Tightness of these bounds in the sense of Buehler, i.e.
as the strictest possible monotonic intervals, is demonstrated for all
confidence levels. A simple interval valid for all confidence levels is also
provided with a tightness guarantee. Finally, an application of the proposed
confidence intervals to sequential sampling is discussed.
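For orientation only, here is a minimal sketch of a one-sided lower confidence bound on the average success probability, computed from the number of games won, the number of rounds, and the confidence level. It uses Hoeffding's inequality, which holds for independent but non-identical Bernoulli trials; it is a generic, looser baseline and not the Buehler-optimal interval derived in the paper. The function name `hoeffding_lower_bound` and its parameters are hypothetical names chosen for this illustration.

```python
import math


def hoeffding_lower_bound(wins: int, rounds: int, alpha: float) -> float:
    """Lower bound on the average success probability, valid with
    confidence at least 1 - alpha.

    Assumption: the trials are independent Bernoulli variables, possibly
    with different parameters. This is the Hoeffding baseline, NOT the
    paper's Buehler-optimal interval.
    """
    if rounds <= 0:
        raise ValueError("rounds must be positive")
    if not 0 <= wins <= rounds:
        raise ValueError("wins must lie between 0 and rounds")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")

    # Hoeffding: P(wins/rounds - p_avg >= t) <= exp(-2 * rounds * t**2).
    # Setting the right-hand side equal to alpha and solving for t:
    slack = math.sqrt(math.log(1.0 / alpha) / (2.0 * rounds))

    # A probability cannot be negative, so clip the bound at zero.
    return max(0.0, wins / rounds - slack)


if __name__ == "__main__":
    # 80 games won out of 100 rounds, at confidence level 95%.
    print(hoeffding_lower_bound(80, 100, 0.05))
```

Because the Hoeffding slack depends only on the number of rounds and the confidence level, this bound ignores how close the observed frequency is to 0 or 1, which is one reason tighter intervals such as those in the paper are of interest.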
Related papers
- Robust Confidence Intervals in Stereo Matching using Possibility Theory [2.522402937703098]
We propose a method for estimating disparity confidence intervals in stereo matching problems.
To the best of our knowledge, this is the first method creating disparity confidence intervals based on the cost volume.
The accuracy and size of confidence intervals are validated using the Middlebury stereo datasets as well as a dataset of satellite images.
arXiv Detail & Related papers (2024-04-09T12:48:24Z)
- Mitigating LLM Hallucinations via Conformal Abstention [70.83870602967625]
We develop a principled procedure for determining when a large language model should abstain from responding in a general domain.
We leverage conformal prediction techniques to develop an abstention procedure that benefits from rigorous theoretical guarantees on the hallucination rate (error rate).
Experimentally, our resulting conformal abstention method reliably bounds the hallucination rate on various closed-book, open-domain generative question answering datasets.
arXiv Detail & Related papers (2024-04-04T11:32:03Z) - Show Your Work with Confidence: Confidence Bands for Tuning Curves [51.12106543561089]
Tuning curves plot validation performance as a function of tuning effort.
We present the first method to construct valid confidence bands for tuning curves.
We validate our design with ablations, analyze the effect of sample size, and provide guidance on comparing models with our method.
arXiv Detail & Related papers (2023-11-16T00:50:37Z) - Binary Classification with Confidence Difference [100.08818204756093]
This paper delves into a novel weakly supervised binary classification problem called confidence-difference (ConfDiff) classification.
We propose a risk-consistent approach to tackle this problem and show that the estimation error bound achieves the optimal convergence rate.
We also introduce a risk correction approach to mitigate overfitting problems, whose consistency and convergence rate are also proven.
arXiv Detail & Related papers (2023-10-09T11:44:50Z) - Huber-Robust Confidence Sequences [37.16361789841549]
Confidence sequences are confidence intervals that can be sequentially tracked, and are valid at arbitrary data-dependent stopping times.
We show that the resulting confidence sequences attain the optimal width achieved in the nonsequential setting.
Since confidence sequences are a common tool used within A/B/n testing and bandits, these results open the door to sequential experimentation that is robust to outliers and adversarial corruptions.
arXiv Detail & Related papers (2023-01-23T17:29:26Z) - Fast Entropy-Based Methods of Word-Level Confidence Estimation for
End-To-End Automatic Speech Recognition [86.21889574126878]
We show how per-frame entropy values can be normalized and aggregated to obtain a confidence measure per unit and per word.
We evaluate the proposed confidence measures on LibriSpeech test sets, and show that they are up to 2 and 4 times better than confidence estimation based on the maximum per-frame probability.
arXiv Detail & Related papers (2022-12-16T20:27:40Z) - Catoni-style Confidence Sequences under Infinite Variance [19.61346221428679]
We provide an extension of confidence sequences for settings where the variance of the data-generating distribution does not exist or is infinite.
Confidence sequences furnish confidence intervals that are valid at arbitrary data-dependent stopping times.
The derived results are shown to be better than confidence sequences obtained using the Dubins-Savage inequality.
arXiv Detail & Related papers (2022-08-05T14:11:06Z) - Tight Concentrations and Confidence Sequences from the Regret of
Universal Portfolio [30.750408480772027]
Jun and Orabona [COLT'19] have shown how to easily convert the regret guarantee of an online betting algorithm into a time-uniform concentration inequality.
We show that we can go even further: the regret of a minimax betting algorithm gives rise to a new implicit empirical time-uniform concentration.
arXiv Detail & Related papers (2021-10-27T00:44:32Z) - An evaluation of word-level confidence estimation for end-to-end
automatic speech recognition [70.61280174637913]
We investigate confidence estimation for end-to-end automatic speech recognition (ASR).
We provide an extensive benchmark of popular confidence methods on four well-known speech datasets.
Our results suggest a strong baseline can be obtained by scaling the logits by a learnt temperature.
arXiv Detail & Related papers (2021-01-14T09:51:59Z) - CoinDICE: Off-Policy Confidence Interval Estimation [107.86876722777535]
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning.
We show in a variety of benchmarks that the confidence interval estimates are tighter and more accurate than existing methods.
arXiv Detail & Related papers (2020-10-22T12:39:11Z)