The leave-one-covariate-out conditional randomization test
- URL: http://arxiv.org/abs/2006.08482v2
- Date: Mon, 13 Jul 2020 14:34:28 GMT
- Title: The leave-one-covariate-out conditional randomization test
- Authors: Eugene Katsevich and Aaditya Ramdas
- Abstract summary: Conditional independence testing is an important problem, yet provably hard without assumptions.
Knockoffs is a popular methodology associated with this framework, but it suffers from two main drawbacks.
The conditional randomization test (CRT) is thought to be the "right" solution under model-X, but usually viewed as computationally inefficient.
- Score: 36.9351790405311
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conditional independence testing is an important problem, yet provably hard
without assumptions. One of the assumptions that has become popular of late is
called "model-X", where we assume we know the joint distribution of the
covariates, but assume nothing about the conditional distribution of the
outcome given the covariates. Knockoffs is a popular methodology associated
with this framework, but it suffers from two main drawbacks: only one-bit
$p$-values are available for inference on each variable, and the method is
randomized with significant variability across runs in practice. The
conditional randomization test (CRT) is thought to be the "right" solution
under model-X, but usually viewed as computationally inefficient. This paper
proposes a computationally efficient leave-one-covariate-out (LOCO) CRT that
addresses both drawbacks of knockoffs. LOCO CRT produces valid $p$-values that
can be used to control the familywise error rate, and has nearly zero
algorithmic variability. For L1 regularized M-estimators, we develop an even
faster variant called L1ME CRT, which reuses computation by leveraging a novel
observation about the stability of the cross-validated lasso to removing
inactive variables. Last, for multivariate Gaussian covariates, we present a
closed form expression for the LOCO CRT $p$-value, thus completely eliminating
resampling in this important special case.
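The resampling-based CRT that the paper takes as its starting point can be sketched in a few lines. This is a generic illustration, not the paper's LOCO or L1ME variants: the toy conditional law X | Z ~ N(Z·beta, 1) with known beta, and the correlation test statistic, are assumptions made for this sketch (under model-X, any known conditional distribution and any statistic may be substituted).

```python
import numpy as np

def crt_pvalue(x, y, z, sample_x_given_z, stat, n_resamples=500, rng=None):
    """Model-X conditional randomization test.

    Tests H0: X independent of Y given Z, assuming we can sample from
    the known conditional distribution of X given Z.
    """
    rng = np.random.default_rng(rng)
    t_obs = stat(x, y, z)
    # Draw fresh copies of X from its conditional law and recompute the
    # statistic; under H0 the observed value is exchangeable with these.
    exceed = sum(
        stat(sample_x_given_z(z, rng), y, z) >= t_obs
        for _ in range(n_resamples)
    )
    return (1 + exceed) / (1 + n_resamples)

# Toy data: X | Z ~ N(Z @ beta, 1) with beta known (an assumption of this
# sketch), and Y depending on Z only, so H0 holds.
rng = np.random.default_rng(0)
n, p = 200, 3
beta = np.array([1.0, -0.5, 0.25])
Z = rng.standard_normal((n, p))
X = Z @ beta + rng.standard_normal(n)
Y = Z[:, 0] + rng.standard_normal(n)

sampler = lambda z, g: z @ beta + g.standard_normal(len(z))
statistic = lambda x, y, z: abs(np.corrcoef(x, y)[0, 1])

pval = crt_pvalue(X, Y, Z, sampler, statistic, rng=1)
```

The loop over resamples is exactly the computational burden the paper targets: LOCO CRT reduces the per-variable cost, and the closed-form Gaussian case removes the resampling entirely.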
Related papers
- Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise [51.87307904567702]
Quantile regression is a leading approach for obtaining such intervals via the empirical estimation of quantiles in the distribution of outputs.
We propose Relaxed Quantile Regression (RQR), a direct alternative to quantile-regression-based interval construction that removes this arbitrary constraint.
We demonstrate that this added flexibility results in intervals with an improvement in desirable qualities.
arXiv Detail & Related papers (2024-06-05T13:36:38Z) - Wasserstein F-tests for Fréchet regression on Bures-Wasserstein manifolds [0.9514940899499753]
Fréchet regression on the Bures-Wasserstein manifold is developed.
A test for the null hypothesis of no association is proposed.
Results show that the proposed test asymptotically attains the desired significance level.
arXiv Detail & Related papers (2024-04-05T04:01:51Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - Nearest-Neighbor Sampling Based Conditional Independence Testing [15.478671471695794]
The conditional randomization test (CRT) was recently proposed to test whether two random variables X and Y are conditionally independent given random variables Z.
The aim of this paper is to develop a novel alternative of CRT by using nearest-neighbor sampling without assuming the exact form of the distribution of X given Z.
arXiv Detail & Related papers (2023-04-09T07:54:36Z) - The Projected Covariance Measure for assumption-lean variable significance testing [3.8936058127056357]
A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero.
We study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$.
We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests.
arXiv Detail & Related papers (2022-11-03T17:55:50Z) - DIET: Conditional independence testing with marginal dependence measures of residual information [30.99595500331328]
Conditional randomization tests (CRTs) assess whether a variable $x$ is predictive of another variable $y$.
Existing solutions to reduce the cost of CRTs typically split the dataset into a train and a test portion.
We propose the decoupled independence test (DIET), an algorithm that avoids both of these issues.
arXiv Detail & Related papers (2022-08-18T00:48:04Z) - A Conditional Randomization Test for Sparse Logistic Regression in High-Dimension [36.00360315353985]
CRT-logit is an algorithm that combines a variable-distillation step and a decorrelation step.
We provide a theoretical analysis of this procedure, and demonstrate its effectiveness on simulations, along with experiments on large-scale brain-imaging and genomics datasets.
arXiv Detail & Related papers (2022-05-29T09:37:16Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Multivariate Probabilistic Regression with Natural Gradient Boosting [63.58097881421937]
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution.
Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.
arXiv Detail & Related papers (2021-06-07T17:44:49Z) - FANOK: Knockoffs in Linear Time [73.5154025911318]
We describe a series of algorithms that efficiently implement Gaussian model-X knockoffs to control the false discovery rate on large scale feature selection problems.
We test our methods on problems with $p$ as large as $500,000$.
arXiv Detail & Related papers (2020-06-15T21:55:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.