Data-Driven Representations for Testing Independence: Modeling, Analysis
and Connection with Mutual Information Estimation
- URL: http://arxiv.org/abs/2110.14122v1
- Date: Wed, 27 Oct 2021 02:06:05 GMT
- Title: Data-Driven Representations for Testing Independence: Modeling, Analysis
and Connection with Mutual Information Estimation
- Authors: Mauricio E. Gonzalez, Jorge F. Silva, Miguel Videla, and Marcos E.
Orchard
- Abstract summary: This work addresses testing the independence of two continuous and finite-dimensional random variables through the design of a data-driven partition.
It is shown that approximating the sufficient statistics of an oracle test offers a learning criterion for designing a data-driven partition.
Some experimental analyses provide evidence regarding our scheme's advantage for testing independence compared with some strategies that do not use data-driven representations.
- Score: 3.9023554886892433
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This work addresses testing the independence of two continuous and
finite-dimensional random variables through the design of a data-driven partition.
The empirical log-likelihood statistic is adopted to approximate the sufficient
statistics of an oracle test against independence (that knows the two
hypotheses). It is shown that approximating the sufficient statistics of the
oracle test offers a learning criterion for designing a data-driven partition
that connects with the problem of mutual information estimation. Applying these
ideas in the context of a data-dependent tree-structured partition (TSP), we
derive conditions on the TSP's parameters to achieve a strongly consistent
distribution-free test of independence over the family of probabilities
equipped with a density. Complementing this result, we present finite-length
results that show our TSP scheme's capacity to detect the scenario of
independence structurally with the data-driven partition as well as new
sampling complexity bounds for this detection. Finally, some experimental
analyses provide evidence regarding our scheme's advantage for testing
independence compared with some strategies that do not use data-driven
representations.
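
The link the abstract draws between a partition-based log-likelihood statistic and mutual information estimation can be illustrated with a minimal sketch. It is not the authors' tree-structured partition: it assumes scalar variables and swaps in a simpler data-driven product partition with equal-frequency cells on each axis. The function names, the number of bins, and the threshold are hypothetical choices made for illustration only.

```python
import math
import random
from bisect import bisect_right
from collections import Counter


def quantile_edges(values, bins):
    """Interior cut points so each cell holds roughly len(values)/bins samples."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // bins] for k in range(1, bins)]


def plugin_mutual_information(xs, ys, bins):
    """Plug-in mutual information (in nats) over a data-driven product partition.

    Multiplying the result by len(xs) gives the empirical log-likelihood
    ratio of the quantized joint distribution against the product of its
    quantized marginals.
    """
    n = len(xs)
    x_edges = quantile_edges(xs, bins)
    y_edges = quantile_edges(ys, bins)
    # Map each sample to the index of the cell it falls in.
    cells = [
        (bisect_right(x_edges, x), bisect_right(y_edges, y))
        for x, y in zip(xs, ys)
    ]
    joint = Counter(cells)
    x_marg = Counter(i for i, _ in cells)
    y_marg = Counter(j for _, j in cells)
    return sum(
        (c / n) * math.log(c * n / (x_marg[i] * y_marg[j]))
        for (i, j), c in joint.items()
    )


def reject_independence(xs, ys, bins=8, threshold=0.05):
    """Reject independence when the statistic exceeds a (hypothetical) threshold."""
    return plugin_mutual_information(xs, ys, bins) > threshold


# Synthetic data (an assumption for illustration, not from the paper).
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(2000)]
ys_indep = [rng.gauss(0.0, 1.0) for _ in range(2000)]
ys_dep = [x + 0.5 * rng.gauss(0.0, 1.0) for x in xs]

mi_indep = plugin_mutual_information(xs, ys_indep, bins=8)
mi_dep = plugin_mutual_information(xs, ys_dep, bins=8)
print(f"independent pair: {mi_indep:.4f} nats")
print(f"dependent pair:   {mi_dep:.4f} nats")
```

Under independence the plug-in estimate is small but positive because of finite-sample bias, which is why the decision rule compares it to a threshold rather than to zero. How the partition and the threshold should scale with the sample size is the subject of the consistency conditions the abstract refers to; the fixed values used here carry no such guarantee.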
Related papers
- Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis [56.442307356162864]
We study the theoretical aspects of score-based discrete diffusion models under the Continuous Time Markov Chain (CTMC) framework.
We introduce a discrete-time sampling algorithm in the general state space $[S]^d$ that utilizes score estimators at predefined time points.
Our convergence analysis employs a Girsanov-based method and establishes key properties of the discrete score function.
arXiv Detail & Related papers (2024-10-03T09:07:13Z)
- Sequential Predictive Two-Sample and Independence Testing [114.4130718687858]
We study the problems of sequential nonparametric two-sample and independence testing.
We build upon the principle of (nonparametric) testing by betting.
arXiv Detail & Related papers (2023-04-29T01:30:33Z)
- Breaking the Spurious Causality of Conditional Generation via Fairness Intervention with Corrective Sampling [77.15766509677348]
Conditional generative models often inherit spurious correlations from the training dataset.
This can result in label-conditional distributions that are imbalanced with respect to another latent attribute.
We propose a general two-step strategy to mitigate this issue.
arXiv Detail & Related papers (2022-12-05T08:09:33Z)
- Conditional Independence Testing via Latent Representation Learning [2.566492438263125]
LCIT (Latent representation based Conditional Independence Test) is a novel non-parametric method for conditional independence testing based on representation learning.
Our main contribution involves proposing a generative framework in which to test for the independence between X and Y given Z.
arXiv Detail & Related papers (2022-09-04T07:16:03Z)
- Private independence testing across two parties [21.236868468146348]
$\pi$-test is a privacy-preserving algorithm for testing statistical independence between data distributed across multiple parties.
We establish both additive and multiplicative error bounds on the utility of our differentially private test.
arXiv Detail & Related papers (2022-07-08T02:13:05Z)
- Non-Parametric Inference of Relational Dependence [17.76905154531867]
This work examines the problem of estimating independence in data drawn from relational systems.
We propose a consistent, non-parametric, scalable kernel test to operationalize the relational independence test for non-i.i.d. observational data.
arXiv Detail & Related papers (2022-06-30T03:42:20Z)
- Nonparametric Conditional Local Independence Testing [69.31200003384122]
Conditional local independence is an independence relation among continuous time processes.
No nonparametric test of conditional local independence has been available.
We propose such a nonparametric test based on double machine learning.
arXiv Detail & Related papers (2022-03-25T10:31:02Z)
- Data-SUITE: Data-centric identification of in-distribution incongruous examples [81.21462458089142]
Data-SUITE is a data-centric framework to identify incongruous regions of in-distribution (ID) data.
We empirically validate Data-SUITE's performance and coverage guarantees.
arXiv Detail & Related papers (2022-02-17T18:58:31Z)
- The UU-test for Statistical Modeling of Unimodal Data [0.20305676256390928]
We propose a technique called UU-test (Unimodal Uniform test) to decide on the unimodality of a one-dimensional dataset.
A unique feature of this approach is that in the case of unimodality, it also provides a statistical model of the data in the form of a Uniform Mixture Model.
arXiv Detail & Related papers (2020-08-28T08:34:28Z)
- Distribution Approximation and Statistical Estimation Guarantees of Generative Adversarial Networks [82.61546580149427]
Generative Adversarial Networks (GANs) have achieved great success in unsupervised learning.
This paper provides approximation and statistical guarantees of GANs for the estimation of data distributions with densities in a Hölder space.
arXiv Detail & Related papers (2020-02-10T16:47:57Z)
- Independence Testing for Temporal Data [14.25244839642841]
A fundamental question is whether two time-series are related or not.
Existing approaches often have limitations, such as relying on parametric assumptions.
This paper introduces the temporal dependence statistic with block permutation to test independence between temporal data.
arXiv Detail & Related papers (2019-08-18T17:19:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.