DIET: Conditional independence testing with marginal dependence measures
of residual information
- URL: http://arxiv.org/abs/2208.08579v2
- Date: Tue, 11 Apr 2023 06:48:52 GMT
- Title: DIET: Conditional independence testing with marginal dependence measures
of residual information
- Authors: Mukund Sudarshan, Aahlad Manas Puli, Wesley Tansey, Rajesh Ranganath
- Abstract summary: Conditional randomization tests (CRTs) assess whether a variable $x$ is predictive of another variable $y$.
Existing solutions to reduce the cost of CRTs typically split the dataset into a train and a test portion.
We propose the decoupled independence test (DIET), an algorithm that avoids both of these issues.
- Score: 30.99595500331328
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Conditional randomization tests (CRTs) assess whether a variable $x$ is
predictive of another variable $y$, having observed covariates $z$. CRTs
require fitting a large number of predictive models, which is often
computationally intractable. Existing solutions to reduce the cost of CRTs
typically split the dataset into a train and test portion, or rely on
heuristics for interactions, both of which lead to a loss in power. We propose
the decoupled independence test (DIET), an algorithm that avoids both of these
issues by leveraging marginal independence statistics to test conditional
independence relationships. DIET tests the marginal independence of two random
variables: $F(x \mid z)$ and $F(y \mid z)$ where $F(\cdot \mid z)$ is a
conditional cumulative distribution function (CDF). These variables are termed
"information residuals." We give sufficient conditions for DIET to achieve
finite sample type-1 error control and power greater than the type-1 error
rate. We then prove that when using the mutual information between the
information residuals as a test statistic, DIET yields the most powerful
conditionally valid test. Finally, we show DIET achieves higher power than
other tractable CRTs on several synthetic and real benchmarks.
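The abstract's recipe can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: linear regression plus a Gaussian residual model stands in for the paper's flexible conditional-CDF estimators, and a Spearman rank test stands in for the mutual-information statistic on the information residuals.

```python
# Hedged sketch of the DIET recipe: estimate F(x|z) and F(y|z), form the
# "information residuals", then test their *marginal* dependence.
# Assumption (not from the paper): v = m(z) + Gaussian noise, so the
# conditional CDF is a normal CDF applied to standardized regression residuals.
import numpy as np
from scipy.stats import norm, spearmanr
from sklearn.linear_model import LinearRegression

def information_residual(v, z):
    """Approximate F(v | z) under a Gaussian working model."""
    model = LinearRegression().fit(z, v)
    r = v - model.predict(z)
    return norm.cdf(r / r.std(ddof=1))  # probability integral transform

def diet_test(x, y, z):
    u_x = information_residual(x, z)
    u_y = information_residual(y, z)
    # Spearman correlation as an illustrative marginal dependence measure
    stat, pval = spearmanr(u_x, u_y)
    return stat, pval

rng = np.random.default_rng(0)
z = rng.normal(size=(500, 3))
x = z @ np.ones(3) + rng.normal(size=500)
y = 2 * x + z @ np.ones(3) + rng.normal(size=500)  # y depends on x given z
stat, pval = diet_test(x, y, z)
```

Because `y` retains a dependence on `x` after conditioning on `z`, the residuals stay strongly correlated and the test rejects; under conditional independence the two residuals would be (approximately) independent uniforms.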
Related papers
- Doubly Robust Conditional Independence Testing with Generative Neural Networks [8.323172773256449]
This article addresses the problem of testing the conditional independence of two generic random vectors $X$ and $Y$ given a third random vector $Z$.
We propose a new non-parametric testing procedure that avoids explicitly estimating any conditional distributions.
arXiv Detail & Related papers (2024-07-25T01:28:59Z)
- Collaborative non-parametric two-sample testing [55.98760097296213]
The goal is to identify nodes where the null hypothesis $p_v = q_v$ should be rejected.
We propose the non-parametric collaborative two-sample testing (CTST) framework that efficiently leverages the graph structure.
Our methodology integrates elements from f-divergence estimation, Kernel Methods, and Multitask Learning.
arXiv Detail & Related papers (2024-02-08T14:43:56Z)
- Nearest-Neighbor Sampling Based Conditional Independence Testing [15.478671471695794]
The conditional randomization test (CRT) was recently proposed to test whether two random variables $X$ and $Y$ are conditionally independent given random variables $Z$.
The aim of this paper is to develop a novel alternative to the CRT by using nearest-neighbor sampling, without assuming the exact form of the distribution of $X$ given $Z$.
arXiv Detail & Related papers (2023-04-09T07:54:36Z)
- Sequential Kernelized Independence Testing [101.22966794822084]
We design sequential kernelized independence tests inspired by kernelized dependence measures.
We demonstrate the power of our approaches on both simulated and real data.
arXiv Detail & Related papers (2022-12-14T18:08:42Z)
- The Projected Covariance Measure for assumption-lean variable significance testing [3.8936058127056357]
A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero.
We study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$.
We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests.
arXiv Detail & Related papers (2022-11-03T17:55:50Z)
- Nonparametric Conditional Local Independence Testing [69.31200003384122]
Conditional local independence is an independence relation among continuous time processes.
No nonparametric test of conditional local independence has been available.
We propose such a nonparametric test based on double machine learning.
arXiv Detail & Related papers (2022-03-25T10:31:02Z)
- The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean Gaussian $\mathcal{N}(0, \Sigma)$ with unknown covariance $\Sigma$.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma$
arXiv Detail & Related papers (2020-12-31T18:24:41Z)
- The leave-one-covariate-out conditional randomization test [36.9351790405311]
Conditional independence testing is an important problem, yet provably hard without assumptions.
Knockoffs is a popular methodology associated with this framework, but it suffers from two main drawbacks.
The conditional randomization test (CRT) is thought to be the "right" solution under model-X, but usually viewed as computationally inefficient.
arXiv Detail & Related papers (2020-06-15T15:38:24Z)
- Stable Prediction via Leveraging Seed Variable [73.9770220107874]
Previous machine learning methods might exploit subtly spurious correlations in training data induced by non-causal variables for prediction.
We propose a conditional independence test based algorithm to separate causal variables with a seed variable as priori, and adopt them for stable prediction.
Our algorithm outperforms state-of-the-art methods for stable prediction.
arXiv Detail & Related papers (2020-06-09T06:56:31Z)
- A Random Matrix Analysis of Random Fourier Features: Beyond the Gaussian Kernel, a Precise Phase Transition, and the Corresponding Double Descent [85.77233010209368]
This article characterizes the exact asymptotics of random Fourier feature (RFF) regression, in the realistic setting where the number of data samples $n$, their dimension $p$, and the dimension of the feature space $N$ are all large and comparable.
This analysis also provides accurate estimates of training and test regression errors for large $n,p,N$.
arXiv Detail & Related papers (2020-06-09T02:05:40Z)
- Optimal rates for independence testing via $U$-statistic permutation tests [7.090165638014331]
We study the problem of independence testing given independent and identically distributed pairs taking values in a $\sigma$-finite, separable measure space.
We first show that there is no valid test of independence that is uniformly consistent against alternatives of the form $\{f : D(f) \geq \rho^2\}$.
arXiv Detail & Related papers (2020-01-15T19:04:23Z)
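Several of the entries above share the same permutation recipe that the last paper analyzes: compute a dependence statistic, recompute it under random permutations of one sample, and report the permutation p-value. A minimal sketch, with the absolute Pearson correlation standing in for the paper's $U$-statistic:

```python
# Generic permutation test of independence. The (1 + count) / (B + 1)
# p-value is exactly valid under the null for any choice of statistic;
# |Pearson correlation| is an illustrative stand-in for a U-statistic.
import numpy as np

def permutation_independence_test(x, y, n_perm=999, seed=0):
    rng = np.random.default_rng(seed)
    obs = abs(np.corrcoef(x, y)[0, 1])
    # Count permuted statistics at least as extreme as the observed one.
    count = sum(
        abs(np.corrcoef(x, rng.permutation(y))[0, 1]) >= obs
        for _ in range(n_perm)
    )
    return (1 + count) / (1 + n_perm)

rng = np.random.default_rng(1)
x = rng.normal(size=300)
y_dep = x + rng.normal(size=300)   # dependent pair
y_ind = rng.normal(size=300)       # independent pair
p_dep = permutation_independence_test(x, y_dep)
p_ind = permutation_independence_test(x, y_ind)
```

Permuting `y` breaks any dependence while preserving both marginals, which is why the resulting p-value is distribution-free; the rate results in the paper concern how the power of such tests scales with the separation $\rho$.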
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.