A Permutation-Free Kernel Independence Test
- URL: http://arxiv.org/abs/2212.09108v1
- Date: Sun, 18 Dec 2022 15:28:16 GMT
- Title: A Permutation-Free Kernel Independence Test
- Authors: Shubhanshu Shekhar, Ilmun Kim, Aaditya Ramdas
- Abstract summary: In nonparametric independence testing, we observe i.i.d. data $(X_i,Y_i)_i=1n$, where $X in mathcalX, Y in mathcalY$ lie in any general spaces.
Modern test statistics such as the kernel Hilbert-Schmidt Independence Criterion (HSIC) and Distance Covariance (dCov) have intractable null distributions due to the degeneracy of the underlying U-statistics.
This paper provides a simple but nontrivial modification of HSIC
- Score: 36.50719125230106
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In nonparametric independence testing, we observe i.i.d.\ data
$\{(X_i,Y_i)\}_{i=1}^n$, where $X \in \mathcal{X}, Y \in \mathcal{Y}$ lie in
any general spaces, and we wish to test the null that $X$ is independent of
$Y$. Modern test statistics such as the kernel Hilbert-Schmidt Independence
Criterion (HSIC) and Distance Covariance (dCov) have intractable null
distributions due to the degeneracy of the underlying U-statistics. Thus, in
practice, one often resorts to using permutation testing, which provides a
nonasymptotic guarantee at the expense of recalculating the quadratic-time
statistics (say) a few hundred times. This paper provides a simple but
nontrivial modification of HSIC and dCov (called xHSIC and xdCov, pronounced
``cross'' HSIC/dCov) so that they have a limiting Gaussian distribution under
the null, and thus do not require permutations. This requires building on the
newly developed theory of cross U-statistics by Kim and Ramdas (2020), and in
particular developing several nontrivial extensions of the theory in Shekhar et
al. (2022), which developed an analogous permutation-free kernel two-sample
test. We show that our new tests, like the originals, are consistent against
fixed alternatives, and minimax rate optimal against smooth local alternatives.
Numerical simulations demonstrate that compared to the full dCov or HSIC, our
variants have the same power up to a $\sqrt 2$ factor, giving practitioners a
new option for large problems or data-analysis pipelines where computation, not
sample size, could be the bottleneck.
Related papers
- The Minimax Rate of HSIC Estimation for Translation-Invariant Kernels [0.0]
We prove that the minimax optimal rate of HSIC estimation on $mathbb Rd$ for Borel measures containing the Gaussians with continuous bounded translation-invariant characteristic kernels is $mathcal O!left(n-1/2right)$.
arXiv Detail & Related papers (2024-03-12T15:13:21Z) - Collaborative non-parametric two-sample testing [55.98760097296213]
The goal is to identify nodes where the null hypothesis $p_v = q_v$ should be rejected.
We propose the non-parametric collaborative two-sample testing (CTST) framework that efficiently leverages the graph structure.
Our methodology integrates elements from f-divergence estimation, Kernel Methods, and Multitask Learning.
arXiv Detail & Related papers (2024-02-08T14:43:56Z) - Differentially Private Conditional Independence Testing [35.376975903797444]
Conditional independence (CI) tests are widely used in statistical data analysis.
In this work, we investigate conditional independence testing under the constraint of differential privacy.
arXiv Detail & Related papers (2023-06-11T16:46:00Z) - Sharp Constants in Uniformity Testing via the Huber Statistic [16.384142529375435]
Uniformity testing is one of the most well-studied problems in property testing.
It is known that the optimal sample complexity to distinguish the uniform distribution on $m$ elements from any $epsilon$-far distribution with $1-delta probability is $n.
We show that the collisions tester achieves a sharp maximal constant in the number of standard deviations of separation between uniform and non-uniform inputs.
arXiv Detail & Related papers (2022-06-21T20:43:53Z) - Exact Paired-Permutation Testing for Structured Test Statistics [67.71280539312536]
We provide an efficient exact algorithm for the paired-permutation test for a family of structured test statistics.
Our exact algorithm was $10$x faster than the Monte Carlo approximation with $20000$ samples on a common dataset.
arXiv Detail & Related papers (2022-05-03T11:00:59Z) - A Kernel Test for Causal Association via Noise Contrastive Backdoor Adjustment [18.791409397894835]
We develop a non-parametric method to test the textitdo-null hypothesis $H_0:; p(y|textit do(X=x))=p(y)$ against the general alternative.
We demonstrate that backdoor-HSIC (bd-HSIC) is calibrated and has power for both binary and continuous treatments under a large number of confounders.
arXiv Detail & Related papers (2021-11-25T19:12:37Z) - The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-epsilon) X + epsilon B$, where $X$ is a zero-mean and unknown covariance Gaussian $mathcalN(0, Sigma)$.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $Omega(d2)$ for $epsilon$ an arbitrarily small constant and $gamma
arXiv Detail & Related papers (2020-12-31T18:24:41Z) - Dimension-agnostic inference using cross U-statistics [33.17951971728784]
We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization.
The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks.
arXiv Detail & Related papers (2020-11-10T12:21:34Z) - Optimal Testing of Discrete Distributions with High Probability [49.19942805582874]
We study the problem of testing discrete distributions with a focus on the high probability regime.
We provide the first algorithms for closeness and independence testing that are sample-optimal, within constant factors.
arXiv Detail & Related papers (2020-09-14T16:09:17Z) - Locally Private Hypothesis Selection [96.06118559817057]
We output a distribution from $mathcalQ$ whose total variation distance to $p$ is comparable to the best such distribution.
We show that the constraint of local differential privacy incurs an exponential increase in cost.
Our algorithms result in exponential improvements on the round complexity of previous methods.
arXiv Detail & Related papers (2020-02-21T18:30:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.