Testing Independence of Exchangeable Random Variables
- URL: http://arxiv.org/abs/2210.12392v1
- Date: Sat, 22 Oct 2022 08:55:48 GMT
- Title: Testing Independence of Exchangeable Random Variables
- Authors: Marcus Hutter
- Abstract summary: Given well-shuffled data, can we determine whether the data items are statistically (in)dependent?
We will show that this is possible and develop tests that can confidently reject the null hypothesis that data is independent and identically distributed.
One potential application is in Deep Learning, where data is often scraped from the whole internet and duplicates abound.
- Score: 19.973896010415977
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given well-shuffled data, can we determine whether the data items are
statistically (in)dependent? Formally, we consider the problem of testing
whether a set of exchangeable random variables are independent. We will show
that this is possible and develop tests that can confidently reject the null
hypothesis that data is independent and identically distributed and have high
power for (some) exchangeable distributions. We will make no structural
assumptions on the underlying sample space. One potential application is in
Deep Learning, where data is often scraped from the whole internet and
duplicates abound, which can render data non-iid and test-set evaluation
prone to giving wrong answers.
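To make the duplication concern concrete, here is a minimal Python sketch of a collision-based diagnostic. It is an illustration only and is not taken from the paper: the function names `collision_rates` and `iid_gap` are hypothetical. It relies on the fact that for iid draws from any discrete distribution p, the triple-collision probability sum_i p_i^3 is at least the square of the pair-collision probability sum_i p_i^2 (Jensen's inequality). Data in which items repeat in pairs but almost never in triples therefore looks suspicious under the iid hypothesis. The returned gap is a heuristic signal, not a p-value.

```python
from collections import Counter


def collision_rates(items):
    """Estimate pair and triple collision probabilities from item counts.

    Under iid sampling these are unbiased estimates of sum_i p_i^2 and
    sum_i p_i^3. Items must be hashable.
    """
    n = len(items)
    if n < 3:
        raise ValueError("need at least 3 items")
    counts = Counter(items).values()
    # Fraction of ordered pairs (triples) of distinct positions holding equal items.
    pair = sum(c * (c - 1) for c in counts) / (n * (n - 1))
    triple = sum(c * (c - 1) * (c - 2) for c in counts) / (n * (n - 1) * (n - 2))
    return pair, triple


def iid_gap(items):
    """Heuristic gap: estimated sum p^3 minus (estimated sum p^2) squared.

    For iid data the population value is non-negative; a clearly negative
    value hints at dependence such as systematic duplication.
    """
    pair, triple = collision_rates(items)
    return triple - pair ** 2


if __name__ == "__main__":
    # Hypothetical data: every item appears exactly twice, as after a crude
    # duplication of a scraped dataset.
    doubled = [i // 2 for i in range(1000)]
    print(iid_gap(doubled))  # negative: pairs occur, triples never do
```

A calibrated test would still need a rejection threshold that holds for every iid distribution, which this sketch does not attempt.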
Related papers
- Federated Causal Discovery from Heterogeneous Data [70.31070224690399]
We propose a novel FCD method attempting to accommodate arbitrary causal models and heterogeneous data.
These approaches involve constructing summary statistics as a proxy of the raw data to protect data privacy.
We conduct extensive experiments on synthetic and real datasets to show the efficacy of our method.
arXiv Detail & Related papers (2024-02-20T18:53:53Z)
- Testing multivariate normality by testing independence [0.0]
We propose a simple multivariate normality test based on Kac-Bernstein's characterization, which can be conducted by utilising existing statistical independence tests for sums and differences of data samples.
We also perform its empirical investigation, which reveals that for high-dimensional data, the proposed approach may be more efficient than the alternative ones.
arXiv Detail & Related papers (2023-11-20T07:19:52Z)
- Sequential Predictive Two-Sample and Independence Testing [114.4130718687858]
We study the problems of sequential nonparametric two-sample and independence testing.
We build upon the principle of (nonparametric) testing by betting.
arXiv Detail & Related papers (2023-04-29T01:30:33Z)
- Sequential Kernelized Independence Testing [101.22966794822084]
We design sequential kernelized independence tests inspired by kernelized dependence measures.
We demonstrate the power of our approaches on both simulated and real data.
arXiv Detail & Related papers (2022-12-14T18:08:42Z)
- Non-Parametric Inference of Relational Dependence [17.76905154531867]
This work examines the problem of estimating independence in data drawn from relational systems.
We propose a consistent, non-parametric, scalable kernel test to operationalize the relational independence test for non-i.i.d. observational data.
arXiv Detail & Related papers (2022-06-30T03:42:20Z)
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
- Safe Tests and Always-Valid Confidence Intervals for contingency tables and beyond [69.25055322530058]
We develop E variables for testing whether two data streams come from the same source or not.
These E variables lead to tests that remain safe under flexible sampling scenarios such as optional stopping and continuation.
arXiv Detail & Related papers (2021-06-04T20:12:13Z)
- Stable Prediction via Leveraging Seed Variable [73.9770220107874]
Previous machine learning methods might exploit subtly spurious correlations in training data induced by non-causal variables for prediction.
We propose an algorithm based on conditional independence tests that separates causal variables, given a seed variable as prior knowledge, and uses them for stable prediction.
Our algorithm outperforms state-of-the-art methods for stable prediction.
arXiv Detail & Related papers (2020-06-09T06:56:31Z)
- Asymptotic Validity and Finite-Sample Properties of Approximate Randomization Tests [2.28438857884398]
Our key theoretical contribution is a non-asymptotic bound on the discrepancy between the size of an approximate randomization test and the size of the original randomization test using noiseless data.
We illustrate our theory through several examples, including tests of significance in linear regression.
arXiv Detail & Related papers (2019-08-12T16:09:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.