The Minimax Risk in Testing Uniformity of Poisson Data under Missing Ball Alternatives within a Hypercube
- URL: http://arxiv.org/abs/2305.18111v6
- Date: Mon, 15 Jul 2024 19:30:00 GMT
- Title: The Minimax Risk in Testing Uniformity of Poisson Data under Missing Ball Alternatives within a Hypercube
- Authors: Alon Kipnis
- Abstract summary: We study the problem of testing the goodness of fit of occurrences of items from many categories to an identical Poisson distribution over the categories.
As a class of alternative hypotheses, we consider the removal of an $\ell_p$ ball, $p \leq 2$, of radius $\epsilon$ from a hypercube.
When the expected number of samples $n$ and number of categories $N$ go to infinity while $\epsilon$ is small, the minimax risk asymptotes to $2\Phi(-n N^{2-2/p} \epsilon^2/\sqrt{8N})$, where $\Phi$ is the standard normal CDF.
- Score: 8.285441115330944
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of testing the goodness of fit of occurrences of items from many categories to an identical Poisson distribution over the categories. As a class of alternative hypotheses, we consider the removal of an $\ell_p$ ball, $p \leq 2$, of radius $\epsilon$ from a hypercube around the sequence of uniform Poisson rates. When the expected number of samples $n$ and number of categories $N$ go to infinity while $\epsilon$ is small, the minimax risk asymptotes to $2\Phi(-n N^{2-2/p} \epsilon^2/\sqrt{8N})$; $\Phi(x)$ is the normal CDF. This result allows the comparison of the many estimators previously proposed for this problem at the constant level, rather than at the rate of convergence of the risk or the scaling order of the sample complexity. The minimax test mostly relies on collisions in the small sample limit but behaves like the chi-squared test. Empirical studies over a range of problem parameters show that the asymptotic risk estimate is accurate in finite samples and that the minimax test is significantly better than the chi-squared test or a test that only uses collisions. Our analysis combines standard ideas from non-parametric hypothesis testing with new results in the low count limit of multiple Poisson distributions, including the convexity of certain kernels and a central limit theorem of linear test statistics.
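As a quick numerical illustration, the asymptotic risk expression from the abstract, $2\Phi(-n N^{2-2/p} \epsilon^2/\sqrt{8N})$, can be evaluated directly. The sketch below uses only the Python standard library ($\Phi$ via the error function); the function names are our own, not from the paper.

```python
import math


def normal_cdf(x):
    # Standard normal CDF, expressed via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def asymptotic_minimax_risk(n, N, eps, p):
    """Asymptotic minimax risk 2*Phi(-n * N^(2 - 2/p) * eps^2 / sqrt(8N)),
    as stated in the abstract (valid as n, N -> infinity, eps small, p <= 2)."""
    assert p <= 2, "the result is stated for p <= 2"
    arg = -n * N ** (2.0 - 2.0 / p) * eps ** 2 / math.sqrt(8.0 * N)
    return 2.0 * normal_cdf(arg)
```

As expected, the risk decreases toward 0 as $n$ grows and equals 1 (type I plus type II error of a trivial test) when $\epsilon = 0$.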
Related papers
- Wasserstein F-tests for Fréchet regression on Bures-Wasserstein manifolds [0.9514940899499753]
Fréchet regression on the Bures-Wasserstein manifold is developed.
A test for the null hypothesis of no association is proposed.
Results show that the proposed test has the desired significance level asymptotically.
arXiv Detail & Related papers (2024-04-05T04:01:51Z) - Optimal minimax rate of learning interaction kernels [7.329333373512536]
We introduce a tamed least squares estimator (tLSE) that attains the optimal convergence rate for a broad class of exchangeable distributions.
Our findings reveal that provided the inverse problem in the large sample limit satisfies a coercivity condition, the left tail probability does not alter the bias-variance tradeoff.
arXiv Detail & Related papers (2023-11-28T15:01:58Z) - Sharp Constants in Uniformity Testing via the Huber Statistic [16.384142529375435]
Uniformity testing is one of the most well-studied problems in property testing.
It is known that the optimal sample complexity to distinguish the uniform distribution on $m$ elements from any $\epsilon$-far distribution with probability $1-\delta$ is $n = \Theta\left(\frac{\sqrt{m \log(1/\delta)}}{\epsilon^2} + \frac{\log(1/\delta)}{\epsilon^2}\right)$.
We show that the collisions tester achieves a sharp maximal constant in the number of standard deviations of separation between uniform and non-uniform inputs.
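For context, a generic collision statistic for uniformity testing can be sketched as follows (this is a plain collision counter, not the specific Huber statistic of the paper; all names are our own). Under the uniform distribution on $m$ elements, the expected number of colliding pairs among $n$ samples is $\binom{n}{2}/m$, so a significant excess of collisions indicates non-uniformity.

```python
from collections import Counter


def collision_pairs(samples):
    # Number of unordered pairs of samples falling in the same category.
    counts = Counter(samples)
    return sum(c * (c - 1) // 2 for c in counts.values())


def expected_collisions_uniform(n, m):
    # E[collision pairs] under the uniform distribution on m elements.
    return n * (n - 1) / (2 * m)
```

A collision tester rejects uniformity when `collision_pairs` exceeds `expected_collisions_uniform` by a suitably chosen threshold.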
arXiv Detail & Related papers (2022-06-21T20:43:53Z) - Polyak-Ruppert Averaged Q-Leaning is Statistically Efficient [90.14768299744792]
We study synchronous Q-learning with Polyak-Ruppert averaging (a.k.a., averaged Q-learning) in a $\gamma$-discounted MDP.
We establish asymptotic normality for the iteration-averaged estimator $\bar{\boldsymbol{Q}}_T$.
In short, our theoretical analysis shows averaged Q-learning is statistically efficient.
arXiv Detail & Related papers (2021-12-29T14:47:56Z) - The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean Gaussian with unknown covariance, $\mathcal{N}(0, \Sigma)$.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma$ a fixed constant.
arXiv Detail & Related papers (2020-12-31T18:24:41Z) - Optimal Testing of Discrete Distributions with High Probability [49.19942805582874]
We study the problem of testing discrete distributions with a focus on the high probability regime.
We provide the first algorithms for closeness and independence testing that are sample-optimal, within constant factors.
arXiv Detail & Related papers (2020-09-14T16:09:17Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP)
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5 \varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
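The bound above is a sum of two terms; a small helper (our own naming, ignoring the logarithmic factor) makes its dependence on the minimum state-action occupancy $\mu_{\min}$, discount $\gamma$, accuracy $\varepsilon$, and mixing time $t_{\mathrm{mix}}$ concrete:

```python
def async_q_learning_sample_bound(mu_min, gamma, eps, t_mix):
    """Order-level sample bound from the summary, ignoring log factors:
    1/(mu_min * (1-gamma)^5 * eps^2) + t_mix / (mu_min * (1-gamma))."""
    return (1.0 / (mu_min * (1.0 - gamma) ** 5 * eps ** 2)
            + t_mix / (mu_min * (1.0 - gamma)))
```

The $(1-\gamma)^{-5}$ factor dominates as the discount approaches 1, so the bound grows sharply with the effective horizon.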
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - Breaking the Sample Size Barrier in Model-Based Reinforcement Learning
with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator)
We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$.
We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z) - Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \to \infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.