Confidence sequences for sampling without replacement
- URL: http://arxiv.org/abs/2006.04347v4
- Date: Fri, 8 Jan 2021 15:45:00 GMT
- Title: Confidence sequences for sampling without replacement
- Authors: Ian Waudby-Smith, Aaditya Ramdas
- Abstract summary: We present a suite of tools for designing confidence sequences for $\theta^\star$.
A CS is a sequence of confidence sets $(C_n)_{n=1}^N$ that shrink in size, and all contain $\theta^\star$ simultaneously with high probability.
We then present Hoeffding- and empirical-Bernstein-type time-uniform CSs and fixed-time confidence intervals for sampling WoR.
- Score: 39.98103047898974
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many practical tasks involve sampling sequentially without replacement (WoR)
from a finite population of size $N$, in an attempt to estimate some parameter
$\theta^\star$. Accurately quantifying uncertainty throughout this process is a
nontrivial task, but is necessary because it often determines when we stop
collecting samples and confidently report a result. We present a suite of tools
for designing confidence sequences (CS) for $\theta^\star$. A CS is a sequence
of confidence sets $(C_n)_{n=1}^N$ that shrink in size, and all contain
$\theta^\star$ simultaneously with high probability. We present a generic
approach to constructing a frequentist CS using Bayesian tools, based on the
fact that the ratio of a prior to the posterior at the ground truth is a
martingale. We then present Hoeffding- and empirical-Bernstein-type
time-uniform CSs and fixed-time confidence intervals for sampling WoR, which
improve on previous bounds in the literature and explicitly quantify the
benefit of WoR sampling.
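The prior-posterior-ratio (PPR) idea is concrete enough to sketch in the simplest discrete case: a binary population of size $N$ whose number of ones $\theta$ is the target. The sketch below is ours, not the authors' code; the function names and the uniform prior are illustrative choices. It includes $\theta$ in $C_n$ whenever the prior-to-posterior ratio stays below $1/\alpha$; since the PPR at the true $\theta^\star$ is a nonnegative martingale with initial value one, Ville's inequality gives simultaneous $1-\alpha$ coverage over all $n$.

```python
import numpy as np
from scipy.special import gammaln

def log_falling(a, k):
    """log of the falling factorial a*(a-1)*...*(a-k+1); -inf if k > a."""
    a = np.asarray(a, dtype=float)
    return np.where(a >= k, gammaln(a + 1) - gammaln(np.maximum(a - k, 0) + 1), -np.inf)

def ppr_confidence_set(x, N, alpha=0.05):
    """Confidence set for theta = number of ones in a binary population of
    size N, after observing the ordered without-replacement sample x (0/1).
    Hypothetical sketch: include theta while prior/posterior < 1/alpha."""
    x = np.asarray(x)
    n, s = len(x), int(x.sum())
    thetas = np.arange(N + 1)
    # log P(observed ordered WoR sample | theta): falling factorials of ones and zeros
    loglik = (log_falling(thetas, s)
              + log_falling(N - thetas, n - s)
              - log_falling(N, n))
    log_prior = np.full(N + 1, -np.log(N + 1))               # uniform prior on {0, ..., N}
    log_evidence = np.logaddexp.reduce(log_prior + loglik)    # log marginal likelihood
    log_ppr = log_evidence - loglik                           # log [prior(theta) / posterior(theta)]
    return thetas[log_ppr < np.log(1.0 / alpha)]
```

Called after each new draw, the returned set shrinks as evidence accumulates, and stopping at any data-dependent time preserves the coverage guarantee.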
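For the bounded-mean setting, a deliberately crude baseline makes the time-uniform requirement concrete: run Hoeffding's fixed-time interval at level $\alpha/N$ and union-bound over the $N$ possible sample sizes (Hoeffding's 1963 bound also holds under WoR sampling). This is not the paper's Hoeffding- or empirical-Bernstein-type construction and takes no credit for the shrinking population; it is only a conservative reference point.

```python
import numpy as np

def hoeffding_union_cs(x, N, alpha=0.05):
    """Conservative confidence sequence for the mean of a [0, 1]-valued
    population of size N: Hoeffding intervals at level alpha/N, valid
    jointly over all sample sizes by a union bound. Illustrative baseline only."""
    x = np.asarray(x, dtype=float)
    n = np.arange(1, len(x) + 1)
    running_mean = np.cumsum(x) / n
    width = np.sqrt(np.log(2 * N / alpha) / (2 * n))   # per-time Hoeffding width at level alpha/N
    lower = np.clip(running_mean - width, 0.0, 1.0)
    upper = np.clip(running_mean + width, 0.0, 1.0)
    return lower, upper
```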
Related papers
- Treatment of Statistical Estimation Problems in Randomized Smoothing for Adversarial Robustness [0.0]
We review the statistical estimation problems for randomized smoothing to find out if the computational burden is necessary.
We present estimation procedures employing confidence sequences enjoying the same statistical guarantees as the standard methods.
We provide a randomized version of Clopper-Pearson confidence intervals resulting in strictly stronger certificates.
arXiv Detail & Related papers (2024-06-25T14:00:55Z) - Favour: FAst Variance Operator for Uncertainty Rating [0.034530027457862]
Bayesian Neural Networks (BNN) have emerged as a crucial approach for interpreting ML predictions.
By sampling from the posterior distribution, data scientists may estimate the uncertainty of an inference.
Previous work proposed propagating the first and second moments of the posterior directly through the network.
This method is even slower than sampling, so the propagated variance needs to be approximated.
Our contribution is a more principled variance propagation framework.
arXiv Detail & Related papers (2023-11-21T22:53:20Z) - A Specialized Semismooth Newton Method for Kernel-Based Optimal
Transport [92.96250725599958]
Kernel-based optimal transport (OT) estimators offer an alternative, functional estimation procedure to address OT problems from samples.
We show that our SSN method achieves a global convergence rate of $O(1/\sqrt{k})$, and a local quadratic convergence rate under standard regularity conditions.
arXiv Detail & Related papers (2023-10-21T18:48:45Z) - Conformal Nucleus Sampling [67.5232384936661]
We assess whether a top-$p$ set is indeed aligned with its probabilistic meaning in various linguistic contexts.
We find that OPT models are overconfident, and that calibration shows a moderate inverse scaling with model size.
arXiv Detail & Related papers (2023-05-04T08:11:57Z) - Tighter PAC-Bayes Bounds Through Coin-Betting [31.148069991567215]
We show how to refine the proof strategy of the PAC-Bayes bounds and achieve even tighter guarantees.
We derive the first PAC-Bayes concentration inequality based on the coin-betting approach that holds simultaneously for all sample sizes.
arXiv Detail & Related papers (2023-02-12T01:16:27Z) - Active Sampling of Multiple Sources for Sequential Estimation [92.37271004438406]
The objective is to design an active sampling algorithm for sequentially estimating parameters in order to form reliable estimates.
This paper adopts conditional estimation cost functions, leading to a sequential estimation approach that was recently shown to render tractable analysis.
arXiv Detail & Related papers (2022-08-10T15:58:05Z) - Convergence for score-based generative modeling with polynomial
complexity [9.953088581242845]
We prove the first convergence guarantees for the core mechanic behind Score-based generative modeling.
Compared to previous works, we do not incur error that grows exponentially in time or that suffers from a curse of dimensionality.
We show that a predictor-corrector gives better convergence than using either portion alone.
arXiv Detail & Related papers (2022-06-13T14:57:35Z) - Consistent Estimation for PCA and Sparse Regression with Oblivious
Outliers [13.244654316770815]
We develop machinery to design efficiently computable and consistent estimators.
For sparse regression, we achieve consistency for optimal sample size $n \gtrsim (k \log d)/\alpha^2$.
In the context of PCA, we attain optimal error guarantees under broad spikiness assumptions on the parameter matrix.
arXiv Detail & Related papers (2021-11-04T15:59:44Z) - Improved, Deterministic Smoothing for L1 Certified Robustness [119.86676998327864]
We propose a non-additive and deterministic smoothing method, Deterministic Smoothing with Splitting Noise (DSSN)
In contrast to uniform additive smoothing, the SSN certification does not require the random noise components used to be independent.
This is the first work to provide deterministic "randomized smoothing" for a norm-based adversarial threat model.
arXiv Detail & Related papers (2021-03-17T21:49:53Z) - The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean Gaussian $\mathcal{N}(0, \Sigma)$ with unknown covariance.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma$
arXiv Detail & Related papers (2020-12-31T18:24:41Z)