Optimal rates for independence testing via $U$-statistic permutation
tests
- URL: http://arxiv.org/abs/2001.05513v2
- Date: Fri, 6 Nov 2020 11:50:28 GMT
- Title: Optimal rates for independence testing via $U$-statistic permutation
tests
- Authors: Thomas B. Berrett, Ioannis Kontoyiannis, Richard J. Samworth
- Abstract summary: We study the problem of independence testing given independent and identically distributed pairs taking values in a $sigma$-finite, separable measure space.
We first show that there is no valid test of independence that is uniformly consistent against alternatives of the form $f: D(f) geq rho2 $.
- Score: 7.090165638014331
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of independence testing given independent and
identically distributed pairs taking values in a $\sigma$-finite, separable
measure space. Defining a natural measure of dependence $D(f)$ as the squared
$L^2$-distance between a joint density $f$ and the product of its marginals, we
first show that there is no valid test of independence that is uniformly
consistent against alternatives of the form $\{f: D(f) \geq \rho^2 \}$. We
therefore restrict attention to alternatives that impose additional
Sobolev-type smoothness constraints, and define a permutation test based on a
basis expansion and a $U$-statistic estimator of $D(f)$ that we prove is
minimax optimal in terms of its separation rates in many instances. Finally,
for the case of a Fourier basis on $[0,1]^2$, we provide an approximation to
the power function that offers several additional insights. Our methodology is
implemented in the R package USP.
Related papers
- Testing Identity of Distributions under Kolmogorov Distance in Polylogarithmic Space [1.2277343096128712]
In this paper, we show that much less space suffices -- we give an algorithm that uses space $O(log4 varepsilon-1)$ in the streaming setting.
We also state 9 related open problems that we hope will spark interest in this and related problems.
arXiv Detail & Related papers (2024-10-29T15:24:27Z) - A Conditional Independence Test in the Presence of Discretization [14.917729593550199]
Existing test methods can't work when only discretized observations are available.
We propose a conditional independence test specifically designed to accommodate the presence of such discretization.
arXiv Detail & Related papers (2024-04-26T18:08:15Z) - On Ranking-based Tests of Independence [0.0]
We develop a novel nonparametric framework to test the independence of two random variables $mathbfX$ and $mathbfY$.
We consider a wide class of rank statistics encompassing many ways of deviating from the diagonal in the ROC space to build tests of independence.
arXiv Detail & Related papers (2024-03-12T10:00:00Z) - Near Sample-Optimal Reduction-based Policy Learning for Average Reward
MDP [58.13930707612128]
This work considers the sample complexity of obtaining an $varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP)
We prove an upper bound of $widetilde O(H varepsilon-3 ln frac1delta)$ samples per state-action pair, where $H := sp(h*)$ is the span of bias of any optimal policy, $varepsilon$ is the accuracy and $delta$ is the failure probability.
arXiv Detail & Related papers (2022-12-01T15:57:58Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best identification in discounted linear Markov+Delta Decision in the fixed confidence setting under a generative model.
The lower bound as the solution of an intricate non- optimization program can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - An $\ell^p$-based Kernel Conditional Independence Test [21.689461247198388]
We propose a new computationally efficient test for conditional independence based on the $Lp$ distance between two kernel-based representatives of well suited distributions.
We conduct a series of experiments showing that the performance of our new tests outperforms state-of-the-art methods both in term of statistical power and type-I error even in the high dimensional setting.
arXiv Detail & Related papers (2021-10-28T03:18:27Z) - The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-epsilon) X + epsilon B$, where $X$ is a zero-mean and unknown covariance Gaussian $mathcalN(0, Sigma)$.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $Omega(d2)$ for $epsilon$ an arbitrarily small constant and $gamma
arXiv Detail & Related papers (2020-12-31T18:24:41Z) - Dimension-agnostic inference using cross U-statistics [33.17951971728784]
We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization.
The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks.
arXiv Detail & Related papers (2020-11-10T12:21:34Z) - Optimal Testing of Discrete Distributions with High Probability [49.19942805582874]
We study the problem of testing discrete distributions with a focus on the high probability regime.
We provide the first algorithms for closeness and independence testing that are sample-optimal, within constant factors.
arXiv Detail & Related papers (2020-09-14T16:09:17Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP)
We show that the number of samples needed to yield an entrywise $varepsilon$-accurate estimate of the Q-function is at most on the order of $frac1mu_min (1-gamma)5varepsilon2+ fract_mixmu_min (1-gamma)$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - Locally Private Hypothesis Selection [96.06118559817057]
We output a distribution from $mathcalQ$ whose total variation distance to $p$ is comparable to the best such distribution.
We show that the constraint of local differential privacy incurs an exponential increase in cost.
Our algorithms result in exponential improvements on the round complexity of previous methods.
arXiv Detail & Related papers (2020-02-21T18:30:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.