The Chi-Square Test of Distance Correlation
- URL: http://arxiv.org/abs/1912.12150v5
- Date: Fri, 14 May 2021 18:09:51 GMT
- Title: The Chi-Square Test of Distance Correlation
- Authors: Cencheng Shen, Sambit Panda, Joshua T. Vogelstein
- Abstract summary: The chi-square test is non-parametric, extremely fast, and applicable to bias-corrected distance correlation using any strong negative type metric or characteristic kernel.
We show that the underlying chi-square distribution well approximates and dominates the limiting null distribution in the upper tail, and prove that the chi-square test can be valid and consistent for testing independence.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distance correlation has gained much recent attention in the data science
community: the sample statistic is straightforward to compute and
asymptotically equals zero if and only if the variables are independent,
making it an ideal choice to discover any type of dependency structure given
sufficient sample size. One major bottleneck is the testing process: because
the null distribution of distance correlation depends on the underlying random
variables and metric choice, it typically requires a permutation test to
estimate the null and compute the p-value, which is very costly for large
amounts of data. To overcome this difficulty, in this paper we propose a
chi-square test for distance correlation. Method-wise, the chi-square test is
non-parametric, extremely fast, and applicable to bias-corrected distance
correlation using any strong negative type metric or characteristic kernel.
The test exhibits testing power similar to that of the standard permutation
test, and can be utilized for K-sample and partial testing. Theory-wise, we
show that the underlying chi-square distribution well approximates and
dominates the limiting null distribution in the upper tail, prove that the
chi-square test can be valid and universally consistent for testing
independence, and establish a testing power inequality with respect to the
permutation test.
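A minimal sketch contrasting the two testing routes described above, under stated assumptions: Euclidean distance as the metric, U-centering (the unbiased distance covariance estimator) as the bias correction, and, as we read the paper, a chi-square test that compares n·Dcor + 1 against a chi-square distribution with one degree of freedom. The paper's exact estimator may differ in details, and the function names (`dcorr_unbiased`, `permutation_pvalue`, `chi_square_pvalue`) are hypothetical; this is not the authors' implementation. The permutation route re-evaluates the statistic once per replicate, while the chi-square route needs a single evaluation.

```python
import math

import numpy as np


def _pairwise_euclidean(x: np.ndarray) -> np.ndarray:
    """n x n Euclidean distance matrix; 1-D input is treated as one feature."""
    x = x.reshape(len(x), -1)
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def _u_center(d: np.ndarray) -> np.ndarray:
    """U-centre a distance matrix: subtract row/column sums over (n - 2), add
    the grand sum over (n - 1)(n - 2), and zero the diagonal."""
    n = d.shape[0]
    row = d.sum(axis=1, keepdims=True) / (n - 2)
    col = d.sum(axis=0, keepdims=True) / (n - 2)
    grand = d.sum() / ((n - 1) * (n - 2))
    c = d - row - col + grand
    np.fill_diagonal(c, 0.0)
    return c


def dcorr_unbiased(x, y) -> float:
    """Bias-corrected (U-centred) sample distance correlation; needs n > 3."""
    a = _u_center(_pairwise_euclidean(np.asarray(x, dtype=float)))
    b = _u_center(_pairwise_euclidean(np.asarray(y, dtype=float)))
    # The common 1 / (n (n - 3)) factor of the unbiased dCov cancels in the ratio.
    cov_xy = float((a * b).sum())
    denom = math.sqrt(float((a * a).sum()) * float((b * b).sum()))
    return cov_xy / denom if denom > 0 else 0.0


def permutation_pvalue(x, y, reps: int = 1000, seed: int = 0) -> float:
    """Permutation test: re-evaluates the statistic on `reps` shuffles of y,
    so the cost is roughly reps times that of a single statistic."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    observed = dcorr_unbiased(x, y)
    exceed = sum(
        dcorr_unbiased(x, y[rng.permutation(len(y))]) >= observed
        for _ in range(reps)
    )
    return (1 + exceed) / (1 + reps)


def chi_square_pvalue(stat: float, n: int) -> float:
    """Chi-square route (our reading of the paper): compare n * stat + 1 with a
    chi-square distribution with 1 degree of freedom, whose survival function
    is erfc(sqrt(t / 2))."""
    t = n * stat + 1.0
    return 1.0 if t <= 0 else math.erfc(math.sqrt(t / 2.0))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 100
    x = rng.normal(size=n)
    cases = {
        "nonlinear dependence": x ** 2 + 0.5 * rng.normal(size=n),
        "independent": rng.normal(size=n),
    }
    for name, y in cases.items():
        stat = dcorr_unbiased(x, y)
        print(f"{name}: Dcor={stat:.4f}, "
              f"chi-square p={chi_square_pvalue(stat, n):.4g}, "
              f"permutation p={permutation_pvalue(x, y, reps=200):.4g}")
```

Writing the chi-square survival function through `math.erfc` keeps the sketch free of SciPy; with a statistic of exactly zero the p-value is P(χ²₁ > 1) ≈ 0.317, and rejection at the 5% level requires n·Dcor + 1 to exceed about 3.84.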
Related papers
- Revisiting the Dataset Bias Problem from a Statistical Perspective
We study the "dataset bias" problem from a statistical standpoint.
We identify the main cause of the problem as the strong correlation between a class attribute u and a non-class attribute b.
We propose to mitigate dataset bias either by weighting the objective of each sample n by 1/p(u_n|b_n) or by sampling it with a weight proportional to 1/p(u_n|b_n).
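The summary does not say how p(u_n|b_n) is obtained. As an illustration only, the sketch below assumes discrete attributes u and b and a plug-in count estimate of p(u|b); the helper names are hypothetical and this is not the paper's procedure.

```python
import random
from collections import Counter


def inverse_conditional_weights(u, b):
    """Per-sample weights 1 / p_hat(u_n | b_n), where p_hat is the plug-in
    estimate count(u_n, b_n) / count(b_n) for discrete u and b (assumption)."""
    joint = Counter(zip(u, b))
    marginal_b = Counter(b)
    return [marginal_b[bn] / joint[(un, bn)] for un, bn in zip(u, b)]


def resample_indices(u, b, k, seed=0):
    """Draw k sample indices with probability proportional to the weights."""
    weights = inverse_conditional_weights(u, b)
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=k)


# Class u = 0 dominates when b = "x", so the rare (u = 1, b = "x") sample is up-weighted.
u = [0, 0, 0, 1, 1, 0]
b = ["x", "x", "x", "x", "y", "y"]
print(inverse_conditional_weights(u, b))  # → [1.333..., 1.333..., 1.333..., 4.0, 2.0, 2.0]
```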
Date: 2024-02-05
- Precise Error Rates for Computationally Efficient Testing
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity.
An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
Date: 2023-11-01
- Sequential Kernelized Independence Testing
We design sequential kernelized independence tests inspired by kernelized dependence measures.
We demonstrate the power of our approaches on both simulated and real data.
Date: 2022-12-14
- Nonparametric Conditional Local Independence Testing
Conditional local independence is an independence relation among continuous time processes.
No nonparametric test of conditional local independence has been available.
We propose such a nonparametric test based on double machine learning.
Date: 2022-03-25
- Universal Inference Meets Random Projections: A Scalable Test for Log-concavity
We present the first test of log-concavity that is provably valid in finite samples in any dimension.
We find that a random projections approach that converts the d-dimensional testing problem into many one-dimensional problems can yield high power.
Date: 2021-11-17
- An $\ell^p$-based Kernel Conditional Independence Test
We propose a new computationally efficient test for conditional independence based on the $L^p$ distance between two kernel-based representatives of well-suited distributions.
We conduct a series of experiments showing that our new tests outperform state-of-the-art methods in terms of both statistical power and type-I error, even in the high-dimensional setting.
Date: 2021-10-28
- Optimal Testing of Discrete Distributions with High Probability
We study the problem of testing discrete distributions with a focus on the high probability regime.
We provide the first algorithms for closeness and independence testing that are sample-optimal, within constant factors.
Date: 2020-09-14
- Cross-validation Confidence Intervals for Test Error
This work develops central limit theorems for cross-validation and consistent estimators of its variance under weak stability conditions on the learning algorithm.
Results are the first of their kind for the popular choice of leave-one-out cross-validation.
Date: 2020-07-24
- High-Dimensional Independence Testing via Maximum and Average Distance Correlations
We characterize consistency properties in high-dimensional settings with respect to the number of marginally dependent dimensions.
We examine the advantages of each test statistic and their respective null distributions, and present a fast chi-square-based testing procedure.
Date: 2020-01-04
- Asymptotic Validity and Finite-Sample Properties of Approximate Randomization Tests
Our key theoretical contribution is a non-asymptotic bound on the discrepancy between the size of an approximate randomization test and the size of the original randomization test using noiseless data.
We illustrate our theory through several examples, including tests of significance in linear regression.
Date: 2019-08-12