The leave-one-covariate-out conditional randomization test
- URL: http://arxiv.org/abs/2006.08482v2
- Date: Mon, 13 Jul 2020 14:34:28 GMT
- Title: The leave-one-covariate-out conditional randomization test
- Authors: Eugene Katsevich and Aaditya Ramdas
- Abstract summary: Conditional independence testing is an important problem, yet provably hard without assumptions.
Knockoffs is a popular methodology associated with the model-X framework, but it suffers from two main drawbacks.
The conditional randomization test (CRT) is thought to be the "right" solution under model-X, but is usually viewed as computationally inefficient.
- Score: 36.9351790405311
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conditional independence testing is an important problem, yet provably hard
without assumptions. One of the assumptions that has become popular of late is
called "model-X", where we assume we know the joint distribution of the
covariates, but assume nothing about the conditional distribution of the
outcome given the covariates. Knockoffs is a popular methodology associated
with this framework, but it suffers from two main drawbacks: only one-bit
$p$-values are available for inference on each variable, and the method is
randomized with significant variability across runs in practice. The
conditional randomization test (CRT) is thought to be the "right" solution
under model-X, but is usually viewed as computationally inefficient. This paper
proposes a computationally efficient leave-one-covariate-out (LOCO) CRT that
addresses both drawbacks of knockoffs. LOCO CRT produces valid $p$-values that
can be used to control the familywise error rate, and has nearly zero
algorithmic variability. For L1 regularized M-estimators, we develop an even
faster variant called L1ME CRT, which reuses computation by leveraging a novel
observation about the stability of the cross-validated lasso to removing
inactive variables. Last, for multivariate Gaussian covariates, we present a
closed form expression for the LOCO CRT $p$-value, thus completely eliminating
resampling in this important special case.
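To fix ideas, here is a minimal Python sketch of a model-X CRT with a LOCO-style statistic. The helper `sample_xj`, the cross-validated lasso outcome model, and the residual-correlation statistic are illustrative assumptions in the spirit of the abstract, not the authors' exact LOCO CRT or its L1ME variant.

```python
# Minimal sketch of a model-X CRT with a LOCO-style statistic (illustrative,
# not the paper's exact algorithm). Assumes the conditional law of X_j given
# the remaining covariates is known, as in the model-X framework.
import numpy as np
from sklearn.linear_model import LassoCV

def loco_crt_pvalue(X, y, j, sample_xj, n_resamples=500, seed=0):
    """CRT p-value for H0: X_j independent of y given X_{-j}.

    sample_xj(X_minus_j, rng) must draw X_j from its known conditional
    distribution given the other covariates.
    """
    rng = np.random.default_rng(seed)
    X_mj = np.delete(X, j, axis=1)
    # Fit the outcome model once on the leave-one-covariate-out design;
    # every resample below reuses these residuals, which is the source of
    # the computational savings relative to a naive CRT.
    resid = y - LassoCV(cv=5).fit(X_mj, y).predict(X_mj)

    def stat(xj):
        # How strongly does x_j track the LOCO residuals?
        return abs((xj - xj.mean()) @ resid)

    t_obs = stat(X[:, j])
    t_null = np.array([stat(sample_xj(X_mj, rng)) for _ in range(n_resamples)])
    # Standard finite-sample-valid CRT p-value
    return (1 + np.sum(t_null >= t_obs)) / (1 + n_resamples)
```

For multivariate Gaussian covariates, $X_j \mid X_{-j}$ is itself Gaussian with closed-form mean and variance, so `sample_xj` requires no fitting; the paper goes further and eliminates the resampling loop entirely in that case.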
Related papers
- Towards Self-Supervised Covariance Estimation in Deep Heteroscedastic Regression [102.24287051757469]
We study self-supervised covariance estimation in deep heteroscedastic regression.
We derive an upper bound on the 2-Wasserstein distance between normal distributions; for reference, the exact closed-form distance is sketched after this entry.
Experiments over a wide range of synthetic and real datasets demonstrate that the proposed 2-Wasserstein bound, coupled with pseudo-label annotations, results in computationally cheaper yet accurate deep heteroscedastic regression.
arXiv Detail & Related papers (2025-02-14T22:37:11Z)
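For reference, the 2-Wasserstein distance between two multivariate normals has a classical closed form; the sketch below computes that exact distance, not the paper's upper bound on it.

```python
# Exact 2-Wasserstein distance between N(m1, S1) and N(m2, S2); this is the
# classical closed form, not the upper bound derived in the paper.
import numpy as np
from scipy.linalg import sqrtm

def w2_gaussians(m1, S1, m2, S2):
    rS2 = sqrtm(S2)
    cross = np.real(sqrtm(rS2 @ S1 @ rS2))  # (S2^{1/2} S1 S2^{1/2})^{1/2}
    w2_sq = np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2 * cross)
    return np.sqrt(max(w2_sq, 0.0))
```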
- Asymptotic FDR Control with Model-X Knockoffs: Is Moments Matching Sufficient? [6.6716279375012295]
We propose a unified theoretical framework for studying the robustness of the model-X knockoffs framework.
For the first time in the literature, our theoretical results formally justify the effectiveness of the Gaussian knockoffs generator and the validity of the resulting inference.
arXiv Detail & Related papers (2025-02-09T17:36:00Z)
- Conditional Diffusion Models Based Conditional Independence Testing [8.34871567507739]
The conditional randomization test (CRT) was recently introduced to test whether two random variables, $X$ and $Y$, are conditionally independent given a third random variable $Z$.
We propose using conditional diffusion models (CDMs) to learn the distribution of $X|Z$.
arXiv Detail & Related papers (2024-12-16T13:03:18Z)
- Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise [51.87307904567702]
Quantile regression is a leading approach for obtaining prediction intervals via the empirical estimation of quantiles of the output distribution; a sketch of this baseline construction follows the entry.
We propose Relaxed Quantile Regression (RQR), a direct alternative to quantile-regression-based interval construction that removes the arbitrary constraint tying interval endpoints to fixed quantile levels.
We demonstrate that this added flexibility yields intervals with improved desirable properties.
arXiv Detail & Related papers (2024-06-05T13:36:38Z)
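As context for the entry above, here is the conventional quantile-regression interval it builds on: fit the $\alpha/2$ and $1-\alpha/2$ conditional quantiles with a pinball (quantile) loss and use them as the endpoints. This is the standard baseline construction, not RQR itself.

```python
# Conventional quantile-regression prediction interval: estimate the lower
# and upper conditional quantiles separately and use them as endpoints.
# This is the baseline the entry contrasts with, not RQR.
from sklearn.ensemble import GradientBoostingRegressor

def quantile_interval(X_train, y_train, X_test, alpha=0.1):
    lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2)
    hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2)
    lo.fit(X_train, y_train)
    hi.fit(X_train, y_train)
    return lo.predict(X_test), hi.predict(X_test)  # (lower, upper) endpoints
```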
- Nearest-Neighbor Sampling Based Conditional Independence Testing [15.478671471695794]
The conditional randomization test (CRT) was recently proposed to test whether two random variables $X$ and $Y$ are conditionally independent given random variables $Z$.
The aim of this paper is to develop a novel alternative to the CRT using nearest-neighbor sampling, without assuming the exact form of the distribution of $X$ given $Z$; a sketch of the idea follows.
arXiv Detail & Related papers (2023-04-09T07:54:36Z)
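A hedged sketch of the nearest-neighbor idea: approximate a draw from $X \mid Z = z_i$ by taking the observed $X$ value of a randomly chosen near neighbor of $z_i$ in $Z$-space. The neighborhood size `k` and the uniform choice among neighbors are illustrative assumptions, not the paper's exact scheme.

```python
# Approximate resampling from X | Z without a parametric model: for each
# point, copy the X value of a random near neighbor in Z-space (excluding
# the point itself). k and the uniform choice are illustrative assumptions.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def nn_resample_x(X, Z, k=10, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(Z)  # k+1 so each point can skip itself
    _, idx = nbrs.kneighbors(Z)
    picks = np.array([rng.choice(row[1:]) for row in idx])  # row[0] is the point itself
    return X[picks]
```

The resampled `X` values can then be plugged into any CRT statistic in place of draws from the exact conditional distribution.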
- The Projected Covariance Measure for assumption-lean variable significance testing [3.8936058127056357]
We study the problem of testing the model-free null of conditional mean independence, i.e., that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$.
A simple but common approach is to specify a linear model and then test whether the regression coefficient for $X$ is non-zero (sketched below).
We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests.
arXiv Detail & Related papers (2022-11-03T17:55:50Z)
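The "simple but common approach" mentioned in the entry above is easy to make concrete; this is the parametric baseline whose linear-model assumption the assumption-lean framework avoids.

```python
# Baseline significance test from the entry: regress Y on (X, Z) with OLS
# and test whether the coefficient on X is zero. Valid only if the linear
# model is correctly specified, which is the assumption the paper drops.
import numpy as np
import statsmodels.api as sm

def linear_coef_pvalue(x, Z, y):
    design = sm.add_constant(np.column_stack([x, Z]))  # columns: [1, x, Z]
    fit = sm.OLS(y, design).fit()
    return fit.pvalues[1]  # p-value for the coefficient on x
```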
- DIET: Conditional independence testing with marginal dependence measures of residual information [30.99595500331328]
Conditional randomization tests (CRTs) assess whether a variable $x$ is predictive of another variable $y$.
Existing solutions to reduce the cost of CRTs typically split the dataset into a train and a test portion.
We propose the decoupled independence test (DIET), an algorithm that avoids both of these issues.
arXiv Detail & Related papers (2022-08-18T00:48:04Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process (MRP).
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- Multivariate Probabilistic Regression with Natural Gradient Boosting [63.58097881421937]
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution.
Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.
arXiv Detail & Related papers (2021-06-07T17:44:49Z)
- FANOK: Knockoffs in Linear Time [73.5154025911318]
We describe a series of algorithms that efficiently implement Gaussian model-X knockoffs to control the false discovery rate on large-scale feature selection problems; the dense textbook sampler they accelerate is sketched below.
We test our methods on problems with $p$ as large as $500,000$.
arXiv Detail & Related papers (2020-06-15T21:55:34Z)
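For orientation, here is the textbook Gaussian model-X knockoff sampler that FANOK accelerates: with $X \sim N(\mu, \Sigma)$ and a feasible diagonal vector $s$, knockoffs are drawn from a conditional Gaussian. This dense version costs roughly $O(p^3)$ and is shown only to fix ideas; FANOK's contribution is performing the equivalent sampling at much larger scale.

```python
# Textbook Gaussian model-X knockoff sampler (Candes et al., 2018), shown to
# illustrate what FANOK speeds up. Choosing the diagonal weights s (e.g.,
# equicorrelated) is left to the caller.
import numpy as np

def gaussian_knockoffs(X, mu, Sigma, s, rng=None):
    """Sample knockoffs given rows of X drawn from N(mu, Sigma)."""
    rng = rng if rng is not None else np.random.default_rng()
    D = np.diag(s)
    Sinv_D = np.linalg.solve(Sigma, D)  # Sigma^{-1} diag(s)
    cond_mean = X - (X - mu) @ Sinv_D   # X - (X - mu) Sigma^{-1} diag(s)
    cond_cov = 2 * D - D @ Sinv_D       # 2 diag(s) - diag(s) Sigma^{-1} diag(s)
    L = np.linalg.cholesky(cond_cov + 1e-10 * np.eye(len(s)))  # jitter for PSD edge cases
    return cond_mean + rng.standard_normal(X.shape) @ L.T
```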
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.