High-dimensional and universally consistent k-sample tests
- URL: http://arxiv.org/abs/1910.08883v4
- Date: Wed, 11 Oct 2023 17:14:41 GMT
- Title: High-dimensional and universally consistent k-sample tests
- Authors: Sambit Panda, Cencheng Shen, Ronan Perry, Jelle Zorn, Antoine Lutz,
Carey E. Priebe, Joshua T. Vogelstein
- Abstract summary: The k-sample testing problem involves determining whether $k$ groups of data points are each drawn from the same distribution.
Independence tests achieve universally consistent k-sample testing.
- Score: 18.327837489069907
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The k-sample testing problem involves determining whether $k$ groups of data
points are each drawn from the same distribution. The standard method for
k-sample testing in biomedicine is multivariate analysis of variance (MANOVA),
despite its dependence on strong, and often unsuitable, parametric
assumptions. Moreover, independence testing and k-sample testing are closely
related, and several universally consistent high-dimensional independence tests
such as distance correlation (Dcorr) and Hilbert-Schmidt-Independence-Criterion
(Hsic) enjoy solid theoretical and empirical properties. In this paper, we
prove that independence tests achieve universally consistent k-sample testing
and that k-sample statistics such as Energy and Maximum Mean Discrepancy (MMD)
are precisely equivalent to Dcorr. An empirical evaluation of nonparametric
independence tests showed that they generally perform better than the popular
MANOVA test, even in Gaussian-distributed scenarios. The evaluation included
several popular independence statistics and covered a comprehensive set of
simulations. Additionally, the testing approach was extended to perform
multiway and multilevel tests, which were demonstrated in a simulation study as
well as on real-world fMRI brain scans with a set of attributes.
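The reduction from k-sample testing to independence testing described in the abstract can be sketched as follows: concatenate the $k$ groups into a single data matrix X, pair each point with a one-hot label of its group Y, and run a permutation test on Dcorr(X, Y). This is a minimal NumPy illustration under that idea, not the paper's reference implementation; the V-statistic form of Dcorr, function names, and permutation count are illustrative choices.

```python
import numpy as np

def _centered_dist(x):
    # Pairwise Euclidean distance matrix, double-centered as in Dcorr.
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(-1))
    return d - d.mean(0, keepdims=True) - d.mean(1, keepdims=True) + d.mean()

def dcorr(x, y):
    # Sample distance correlation (V-statistic form) between paired rows of x and y.
    a, b = _centered_dist(x), _centered_dist(y)
    dcov = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return 0.0 if denom == 0 else dcov / denom

def ksample_dcorr(groups, n_perms=500, seed=0):
    # k-sample test via independence: stack the groups, encode group membership
    # as one-hot labels, and permute the labels to build a null distribution.
    x = np.vstack(groups)
    y = np.repeat(np.eye(len(groups)), [len(g) for g in groups], axis=0)
    stat = dcorr(x, y)
    rng = np.random.default_rng(seed)
    null = [dcorr(x, y[rng.permutation(len(x))]) for _ in range(n_perms)]
    pvalue = (1 + sum(s >= stat for s in null)) / (1 + n_perms)
    return stat, pvalue
```

With two well-separated Gaussian groups the permutation p-value is small; with identically distributed groups the statistic hovers near zero.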
Related papers
- Testing multivariate normality by testing independence [0.0]
We propose a simple multivariate normality test based on Kac-Bernstein's characterization, which can be conducted by utilising existing statistical independence tests for sums and differences of data samples.
We also perform its empirical investigation, which reveals that for high-dimensional data, the proposed approach may be more efficient than the alternative ones.
arXiv Detail & Related papers (2023-11-20T07:19:52Z) - Conditional Independence Testing with Heteroskedastic Data and
Applications to Causal Discovery [7.493779672689531]
Conditional independence (CI) testing is frequently used in data analysis and machine learning for various scientific fields.
We present an adaptation of the partial correlation CI test that works well in the presence of heteroskedastic noise.
Numerical causal discovery experiments demonstrate that the adapted partial correlation CI test outperforms the standard test in the presence of heteroskedasticity.
arXiv Detail & Related papers (2023-06-20T12:36:38Z) - Detecting Adversarial Data by Probing Multiple Perturbations Using
Expected Perturbation Score [62.54911162109439]
Adversarial detection aims to determine whether a given sample is an adversarial one based on the discrepancy between natural and adversarial distributions.
We propose a new statistic called expected perturbation score (EPS), which is essentially the expected score of a sample after various perturbations.
We develop EPS-based maximum mean discrepancy (MMD) as a metric to measure the discrepancy between the test sample and natural samples.
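As background for the EPS-based MMD metric above, the standard unbiased estimator of squared MMD with a Gaussian kernel can be sketched as follows; this is the generic estimator, not the paper's EPS-specific statistic, and the fixed bandwidth is an illustrative assumption.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    # RBF kernel matrix between the rows of a and the rows of b.
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * bandwidth ** 2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    # Unbiased estimator of squared MMD between samples x and y:
    # within-sample kernel means (diagonals excluded) minus twice the
    # cross-sample kernel mean.
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    term_x = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * kxy.mean()
```

The estimator is near zero when both samples come from the same distribution and grows with the discrepancy between them, which is what makes it usable as a detection score.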
arXiv Detail & Related papers (2023-05-25T13:14:58Z) - Nonparametric Conditional Local Independence Testing [69.31200003384122]
Conditional local independence is an independence relation among continuous time processes.
No nonparametric test of conditional local independence has been available.
We propose such a nonparametric test based on double machine learning.
arXiv Detail & Related papers (2022-03-25T10:31:02Z) - Cross-validation Confidence Intervals for Test Error [83.67415139421448]
This work develops central limit theorems for cross-validation and consistent estimators of its variance under weak stability conditions on the learning algorithm.
Results are the first of their kind for the popular choice of leave-one-out cross-validation.
arXiv Detail & Related papers (2020-07-24T17:40:06Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Two-Sample Testing on Ranked Preference Data and the Role of Modeling
Assumptions [57.77347280992548]
In this paper, we design two-sample tests for pairwise comparison data and ranking data.
Our test requires essentially no assumptions on the distributions.
By applying our two-sample test on real-world pairwise comparison data, we conclude that ratings and rankings provided by people are indeed distributed differently.
arXiv Detail & Related papers (2020-06-21T20:51:09Z) - Learning Kernel Tests Without Data Splitting [18.603394415852765]
We propose an approach that enables learning the hyperparameters and testing on the full sample without data splitting.
Our approach's test power is empirically larger than that of the data-splitting approach, regardless of its split proportion.
arXiv Detail & Related papers (2020-06-03T14:07:39Z) - Noisy Adaptive Group Testing using Bayesian Sequential Experimental
Design [63.48989885374238]
When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing groups of people can prove more efficient than testing people individually.
Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting.
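Dorfman's efficiency gain mentioned above can be quantified directly: with prevalence $p$ and pool size $g$, a noise-free two-stage scheme costs $1/g + (1 - (1-p)^g)$ expected tests per person. A small sketch of that calculation (the noiseless setting, unlike the paper's noisy extension; function names are illustrative):

```python
def dorfman_tests_per_person(prevalence, group_size):
    # One pooled test per group (1/g tests per person), plus a full
    # individual retest of the group whenever the pool is positive.
    p_pool_positive = 1.0 - (1.0 - prevalence) ** group_size
    return 1.0 / group_size + p_pool_positive

def best_group_size(prevalence, max_size=100):
    # Brute-force search for the pool size minimizing expected tests per person.
    return min(range(2, max_size + 1),
               key=lambda g: dorfman_tests_per_person(prevalence, g))
```

At 1% prevalence, pooling roughly ten people cuts the expected cost to about a fifth of a test per person, illustrating why group testing pays off when prevalence is low.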
arXiv Detail & Related papers (2020-04-26T23:41:33Z) - High-Dimensional Independence Testing via Maximum and Average Distance
Correlations [5.756296617325109]
We characterize consistency properties in high-dimensional settings with respect to the number of marginally dependent dimensions.
We examine the advantages of each test statistic, examine their respective null distributions, and present a fast chi-square-based testing procedure.
arXiv Detail & Related papers (2020-01-04T16:21:50Z) - The Chi-Square Test of Distance Correlation [7.748852202364896]
The chi-square test is non-parametric, extremely fast, and applicable to bias-corrected distance correlation using any strong negative type metric or characteristic kernel.
We show that the underlying chi-square distribution well approximates and dominates the limiting null distribution in the upper tail, and prove that the chi-square test is valid and consistent for testing independence.
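The fast p-value computation this abstract describes can be sketched as follows, assuming the approximation that, under the null, $n \cdot \mathrm{Dcorr}$ (bias-corrected) is dominated in the upper tail by a $\chi^2_1 - 1$ distribution; the exact calibration is in the paper, so treat this as an illustrative sketch.

```python
import math

def dcorr_chisq_pvalue(dcorr_stat, n):
    # Approximate upper-tail p-value for bias-corrected distance correlation,
    # assuming n * Dcorr is dominated by chi2(1) - 1 under the null.
    x = n * dcorr_stat + 1.0
    if x <= 0:
        return 1.0
    # Survival function of chi-square with 1 df: P(chi2_1 > x) = erfc(sqrt(x/2)).
    return math.erfc(math.sqrt(x / 2.0))
```

Because it avoids permutations entirely, this approximation is what makes the test "extremely fast" relative to resampling-based calibration.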
arXiv Detail & Related papers (2019-12-27T15:16:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.