Sharp Constants in Uniformity Testing via the Huber Statistic
- URL: http://arxiv.org/abs/2206.10722v1
- Date: Tue, 21 Jun 2022 20:43:53 GMT
- Title: Sharp Constants in Uniformity Testing via the Huber Statistic
- Authors: Shivam Gupta, Eric Price
- Abstract summary: Uniformity testing is one of the most well-studied problems in property testing.
It is known that the optimal sample complexity to distinguish the uniform distribution on $m$ elements from any $\epsilon$-far distribution with probability $1-\delta$ is $n = \Theta\left(\frac{\sqrt{m \log (1/\delta)}}{\epsilon^2} + \frac{\log (1/\delta)}{\epsilon^2}\right)$.
We show that the collisions tester achieves a sharp maximal constant in the number of standard deviations of separation between uniform and non-uniform inputs.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Uniformity testing is one of the most well-studied problems in property
testing, with many known test statistics, including ones based on counting
collisions, singletons, and the empirical TV distance. It is known that the
optimal sample complexity to distinguish the uniform distribution on $m$
elements from any $\epsilon$-far distribution with $1-\delta$ probability is $n
= \Theta\left(\frac{\sqrt{m \log (1/\delta)}}{\epsilon^2} + \frac{\log
(1/\delta)}{\epsilon^2}\right)$, which is achieved by the empirical TV tester.
Yet in simulation, these theoretical analyses are misleading: in many cases,
they do not correctly rank order the performance of existing testers, even in
an asymptotic regime of all parameters tending to $0$ or $\infty$.
We explain this discrepancy by studying the \emph{constant factors} required
by the algorithms. We show that the collisions tester achieves a sharp maximal
constant in the number of standard deviations of separation between uniform and
non-uniform inputs. We then introduce a new tester based on the Huber loss, and
show that it not only matches this separation, but also has tails corresponding
to a Gaussian with this separation. This leads to a sample complexity of $(1 +
o(1))\frac{\sqrt{m \log (1/\delta)}}{\epsilon^2}$ in the regime where this term
is dominant, unlike all other existing testers.
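As a rough illustration of the statistics discussed above, here is a minimal Python sketch of a collision-based uniformity tester together with the textbook Huber function. The acceptance threshold, the `huber` normalization, and all names are assumptions for illustration only; they do not reproduce the paper's exact estimator or its sharp constants.

```python
import random
from collections import Counter

def collision_rate(samples):
    # Fraction of sample pairs that collide. Its expectation is the
    # collision probability sum_i p_i^2, which equals 1/m for the uniform
    # distribution on m elements and is at least (1 + 4*eps^2)/m for any
    # distribution that is eps-far from uniform in TV distance.
    n = len(samples)
    counts = Counter(samples)
    colliding_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return colliding_pairs / (n * (n - 1) / 2)

def collisions_tester(samples, m, eps):
    # Accept "uniform" iff the collision rate falls below the midpoint
    # between the uniform value 1/m and the eps-far bound (1 + 4*eps^2)/m.
    return collision_rate(samples) < (1 + 2 * eps ** 2) / m

def huber(x, beta):
    # Textbook Huber function: quadratic near zero, linear in the tails.
    # The paper's tester applies a Huber-type loss to the per-bin counts
    # to obtain Gaussian-like tails; this is only the generic function,
    # not the paper's statistic.
    return x * x if abs(x) <= beta else 2 * beta * abs(x) - beta * beta
```

With enough samples the collision rate concentrates, so the tester separates uniform inputs from inputs supported on a small fraction of the domain.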
Related papers
- Replicable Uniformity Testing [1.5883812630616523]
This work revisits uniformity testing under the framework of algorithmic replicability.
We obtain a replicable tester using only $\tilde{O}(\sqrt{n}\,\varepsilon^{-2}\rho^{-1})$ samples.
arXiv Detail & Related papers (2024-10-12T02:55:17Z) - The Sample Complexity of Simple Binary Hypothesis Testing [7.127829790714167]
The sample complexity of simple binary hypothesis testing is the smallest number of i.i.d. samples required to distinguish between two distributions $p$ and $q$ in either setting.
This problem has only been studied when $\alpha = \beta$ (prior-free) or $\alpha = 1/2$ (Bayesian).
arXiv Detail & Related papers (2024-03-25T17:42:32Z) - Testing with Non-identically Distributed Samples [20.74768558932617]
We examine the extent to which sublinear-sample property testing and estimation applies to settings where samples are independently but not identically distributed.
With $\Theta(k/\varepsilon^2)$ samples from each distribution, this is sufficient to learn $\mathbf{p}_{\mathrm{avg}}$ to within error $\varepsilon$ in TV distance.
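To make concrete what "learn $\mathbf{p}_{\mathrm{avg}}$ to within error $\varepsilon$ in TV distance" means, here is a small hypothetical sketch: average the per-source empirical distributions and measure total variation distance. All function names are assumptions for illustration.

```python
from collections import Counter

def empirical_distribution(samples, k):
    # Empirical distribution over a domain of size k.
    counts = Counter(samples)
    n = len(samples)
    return [counts.get(i, 0) / n for i in range(k)]

def tv_distance(p, q):
    # Total variation distance: half the L1 distance between the vectors.
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def average_distribution(sample_sets, k):
    # Estimate of p_avg: average the empirical distributions of the
    # independent (but not identically distributed) sources.
    dists = [empirical_distribution(s, k) for s in sample_sets]
    return [sum(d[i] for d in dists) / len(dists) for i in range(k)]
```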
arXiv Detail & Related papers (2023-11-19T01:25:50Z) - Statistical Learning under Heterogeneous Distribution Shift [71.8393170225794]
The ground-truth predictor is additive: $\mathbb{E}[\mathbf{z} \mid \mathbf{x}, \mathbf{y}] = f_\star(\mathbf{x}) + g_\star(\mathbf{y})$.
arXiv Detail & Related papers (2023-02-27T16:34:21Z) - Random quantum circuits transform local noise into global white noise [118.18170052022323]
We study the distribution over measurement outcomes of noisy random quantum circuits in the low-fidelity regime.
For local noise that is sufficiently weak and unital, correlations (measured by the linear cross-entropy benchmark) between the output distribution $p_{\text{noisy}}$ of a generic noisy circuit instance and its ideal noiseless counterpart shrink exponentially.
If the noise is incoherent, the output distribution approaches the uniform distribution $p_{\text{unif}}$ at precisely the same rate.
arXiv Detail & Related papers (2021-11-29T19:26:28Z) - The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean Gaussian $\mathcal{N}(0, \Sigma)$ with unknown covariance $\Sigma$.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant.
arXiv Detail & Related papers (2020-12-31T18:24:41Z) - Optimal Testing of Discrete Distributions with High Probability [49.19942805582874]
We study the problem of testing discrete distributions with a focus on the high probability regime.
We provide the first algorithms for closeness and independence testing that are sample-optimal, within constant factors.
arXiv Detail & Related papers (2020-09-14T16:09:17Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP)
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
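For readers unfamiliar with the setting, a single asynchronous Q-learning step can be sketched as follows: along one Markovian trajectory, only the visited state-action entry is updated. This is a generic textbook update, not the analyzed algorithm's exact form; the function name and tabular representation are assumptions.

```python
def async_q_update(Q, s, a, r, s_next, alpha, gamma):
    # One asynchronous Q-learning step on a tabular Q (list of lists):
    # only the visited (s, a) entry moves toward the bootstrapped target.
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q
```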
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - Locally Private Hypothesis Selection [96.06118559817057]
We output a distribution from $\mathcal{Q}$ whose total variation distance to $p$ is comparable to the best such distribution.
We show that the constraint of local differential privacy incurs an exponential increase in cost.
Our algorithms result in exponential improvements on the round complexity of previous methods.
arXiv Detail & Related papers (2020-02-21T18:30:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.