The Minimax Risk in Testing Uniformity of Poisson Data under Missing Ball Alternatives
- URL: http://arxiv.org/abs/2305.18111v8
- Date: Fri, 30 May 2025 07:52:31 GMT
- Title: The Minimax Risk in Testing Uniformity of Poisson Data under Missing Ball Alternatives
- Authors: Alon Kipnis
- Abstract summary: We study the problem of testing the goodness of fit of occurrences of items from many categories to a Poisson distribution uniform over the categories. We characterize the minimax risk for this problem as the expected number of samples $n$ and the number of categories $N$ go to infinity.
- Score: 8.285441115330944
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of testing the goodness of fit of occurrences of items from many categories to a Poisson distribution uniform over the categories, against a class of alternative hypotheses obtained by the removal of an $\ell_p$ ball, $p \leq 2$, of radius $\epsilon$ around the sequence of uniform Poisson rates. We characterize the minimax risk for this problem as the expected number of samples $n$ and the number of categories $N$ go to infinity. Our result enables the comparison at the constant level of the many estimators previously proposed for this problem, rather than at the rate of convergence of the risk or the scaling order of the sample complexity. The minimax test relies exclusively on collisions in the small sample limit but behaves like the chi-squared test otherwise. Empirical studies over a range of problem parameters show that the asymptotic risk estimate is accurate in finite samples and that the minimax test is significantly better than the chi-squared test or a test that only uses collisions. Our analysis involves the reduction to a structured subset of alternatives, establishing asymptotic normality for linear statistics, and solving an optimization problem over $N$-dimensional sequences that parallels classical results from Gaussian white noise models.
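The abstract contrasts two statistics: a collision count, which dominates in the small-sample limit, and the chi-squared statistic, which dominates otherwise. The following is a minimal illustrative sketch, not the paper's minimax test: it simulates Poisson counts over $N$ categories under the uniform null and under a crudely perturbed alternative, and evaluates both statistics. The function names and the perturbation scheme are illustration choices, not taken from the paper.

```python
import numpy as np

def collision_stat(counts):
    """Total number of colliding pairs within categories: sum_i C(x_i, 2)."""
    return float(np.sum(counts * (counts - 1))) / 2.0

def chisq_stat(counts, n):
    """Pearson chi-squared against the uniform Poisson rate lambda = n/N."""
    lam = n / counts.size  # expected count per category under the null
    return float(np.sum((counts - lam) ** 2 / lam))

rng = np.random.default_rng(0)
N, n = 10_000, 5_000  # many categories, relatively few expected samples

# Null: each category count is Poisson with the uniform rate n/N.
null_counts = rng.poisson(n / N, size=N)

# Alternative: perturb each rate by a factor (1 +/- eps), a simple
# stand-in for a rate sequence outside an l_p ball around uniform.
eps = 0.5
signs = rng.choice([-1.0, 1.0], size=N)
alt_counts = rng.poisson((n / N) * (1 + eps * signs), size=N)

for name, x in [("null", null_counts), ("alternative", alt_counts)]:
    print(f"{name}: collisions={collision_stat(x):.1f}, chi2={chisq_stat(x, n):.1f}")
```

In the regime the abstract describes, the collision count carries essentially all of the discriminating power when samples are scarce, while the chi-squared behavior takes over as $n$ grows.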
Related papers
- The Polynomial Stein Discrepancy for Assessing Moment Convergence [1.0835264351334324]
We propose a novel method for measuring the discrepancy between a set of samples and a desired posterior distribution for Bayesian inference. We show that the test has higher power than its competitors in several examples, and at a lower computational cost.
arXiv Detail & Related papers (2024-12-06T15:51:04Z) - Wasserstein F-tests for Fréchet regression on Bures-Wasserstein manifolds [0.9514940899499753]
Fréchet regression on the Bures-Wasserstein manifold is developed.
A test for the null hypothesis of no association is proposed.
Results show that the proposed test has the desired level of significance asymptotically.
arXiv Detail & Related papers (2024-04-05T04:01:51Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Optimal minimax rate of learning interaction kernels [7.329333373512536]
We introduce a tamed least squares estimator (tLSE) that attains the optimal convergence rate for a broad class of exchangeable distributions.
Our findings reveal that provided the inverse problem in the large sample limit satisfies a coercivity condition, the left tail probability does not alter the bias-variance tradeoff.
arXiv Detail & Related papers (2023-11-28T15:01:58Z) - Precise Error Rates for Computationally Efficient Testing [67.30044609837749]
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity. An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv Detail & Related papers (2023-11-01T04:41:16Z) - On High dimensional Poisson models with measurement error: hypothesis
testing for nonlinear nonconvex optimization [13.369004892264146]
We study estimation and testing in regression models with high-dimensional covariates, which have wide applications in data analysis.
We propose to estimate the regression parameter by minimizing a penalized loss function.
The proposed method is applied to data from the Alzheimer's Disease Neuroimaging Initiative.
arXiv Detail & Related papers (2022-12-31T06:58:42Z) - Sharp Constants in Uniformity Testing via the Huber Statistic [16.384142529375435]
Uniformity testing is one of the most well-studied problems in property testing.
It is known that the optimal sample complexity to distinguish the uniform distribution on $m$ elements from any $\epsilon$-far distribution with probability $1-\delta$ is characterized up to constant factors.
We show that the collision tester achieves a sharp maximal constant in the number of standard deviations of separation between uniform and non-uniform inputs; a standardized version of this collision statistic is sketched after the related-papers list below.
arXiv Detail & Related papers (2022-06-21T20:43:53Z) - Polyak-Ruppert Averaged Q-Learning is Statistically Efficient [90.14768299744792]
We study synchronous Q-learning with Polyak-Ruppert averaging (a.k.a., averaged Q-learning) in a $\gamma$-discounted MDP.
We establish asymptotic normality for the iteration-averaged $\bar{\boldsymbol{Q}}_T$.
In short, our theoretical analysis shows that averaged Q-learning is statistically efficient.
arXiv Detail & Related papers (2021-12-29T14:47:56Z) - Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z) - The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean Gaussian $\mathcal{N}(0, \Sigma)$ with unknown covariance $\Sigma$.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and fixed $\gamma$.
arXiv Detail & Related papers (2020-12-31T18:24:41Z) - Optimal Testing of Discrete Distributions with High Probability [49.19942805582874]
We study the problem of testing discrete distributions with a focus on the high probability regime.
We provide the first algorithms for closeness and independence testing that are sample-optimal, within constant factors.
arXiv Detail & Related papers (2020-09-14T16:09:17Z) - Minimax Estimation of Conditional Moment Models [40.95498063465325]
We introduce a min-max criterion function, under which the estimation problem can be thought of as solving a zero-sum game.
We analyze the statistical estimation rate of the resulting estimator for arbitrary hypothesis spaces.
We show how our modified mean squared error rate, combined with conditions that bound the ill-posedness of the inverse problem, leads to mean squared error rates.
arXiv Detail & Related papers (2020-06-12T14:02:38Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - Breaking the Sample Size Barrier in Model-Based Reinforcement Learning
with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator).
We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$.
We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z) - Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \to \infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
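As referenced from the "Sharp Constants in Uniformity Testing via the Huber Statistic" entry above, the collision tester admits a simple standardization under the uniform null. The sketch below is a textbook computation assuming multinomial sampling, not that paper's Huber statistic; the function names and parameter values are illustrative.

```python
import numpy as np
from math import comb

def standardized_collisions(samples, m):
    """Z-score of the pairwise collision count under the uniform null on m elements."""
    n = samples.size
    counts = np.bincount(samples, minlength=m)
    c = float(np.sum(counts * (counts - 1))) / 2.0  # number of colliding pairs
    pairs = comb(n, 2)
    mean = pairs / m  # E[C] under uniformity: each pair collides w.p. 1/m
    # Under the uniform null the pair-collision indicators are pairwise
    # uncorrelated, so Var[C] = C(n,2) * (1/m) * (1 - 1/m).
    var = pairs * (1.0 / m) * (1.0 - 1.0 / m)
    return (c - mean) / np.sqrt(var)

rng = np.random.default_rng(1)
m, n = 1_000, 200  # small-sample regime: n much smaller than m

uniform = rng.integers(0, m, size=n)      # samples from the uniform distribution
skewed = rng.integers(0, m // 2, size=n)  # a far-from-uniform example
print("uniform Z:", standardized_collisions(uniform, m))
print("skewed  Z:", standardized_collisions(skewed, m))
```

A non-uniform source inflates the collision count, so the Z-score grows with the distance from uniformity; the sharp-constant analysis in the entry above concerns exactly how many such standard deviations of separation the tester attains.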