Asymptotic FDR Control with Model-X Knockoffs: Is Moments Matching Sufficient?
- URL: http://arxiv.org/abs/2502.05969v1
- Date: Sun, 09 Feb 2025 17:36:00 GMT
- Title: Asymptotic FDR Control with Model-X Knockoffs: Is Moments Matching Sufficient?
- Authors: Yingying Fan, Lan Gao, Jinchi Lv, Xiaocong Xu,
- Abstract summary: We propose a unified theoretical framework for studying the robustness of the model-X knockoffs framework.
For the first time in the literature, our theoretical results justify formally the effectiveness and inference of the Gaussian knockoffs generator.
- Score: 6.6716279375012295
- License:
- Abstract: We propose a unified theoretical framework for studying the robustness of the model-X knockoffs framework by investigating the asymptotic false discovery rate (FDR) control of the practically implemented approximate knockoffs procedure. This procedure deviates from the model-X knockoffs framework by substituting the true covariate distribution with a user-specified distribution that can be learned using in-sample observations. By replacing the distributional exchangeability condition of the model-X knockoff variables with three conditions on the approximate knockoff statistics, we establish that the approximate knockoffs procedure achieves the asymptotic FDR control. Using our unified framework, we further prove that an arguably most popularly used knockoff variable generation method--the Gaussian knockoffs generator based on the first two moments matching--achieves the asymptotic FDR control when the two-moment-based knockoff statistics are employed in the knockoffs inference procedure. For the first time in the literature, our theoretical results justify formally the effectiveness and robustness of the Gaussian knockoffs generator. Simulation and real data examples are conducted to validate the theoretical findings.
Related papers
- Rectified Diffusion Guidance for Conditional Generation [62.00207951161297]
We revisit the theory behind CFG and rigorously confirm that the improper configuration of the combination coefficients (i.e., the widely used summing-to-one version) brings about expectation shift of the generative distribution.
We propose ReCFG with a relaxation on the guidance coefficients such that denoising with ReCFG strictly aligns with the diffusion theory.
That way the rectified coefficients can be readily pre-computed via traversing the observed data, leaving the sampling speed barely affected.
arXiv Detail & Related papers (2024-10-24T13:41:32Z) - Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level
Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z) - ARK: Robust Knockoffs Inference with Coupling [7.288274235236948]
We study the robustness of the model-X knockoffs framework with respect to the misspecified or estimated feature distribution.
A key technique is to couple the approximate knockoffs procedure with the model-X knockoffs procedure so that random variables in these two procedures can be close in realizations.
We prove that if such a coupled model-X knockoffs procedure exists, the approximate knockoffs procedure can achieve the FDR or $k$-FWER control at the target level.
arXiv Detail & Related papers (2023-07-10T08:01:59Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative
Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - Near-optimal multiple testing in Bayesian linear models with
finite-sample FDR control [11.011242089340438]
In high dimensional variable selection problems, statisticians often seek to design multiple testing procedures that control the False Discovery Rate (FDR)
We introduce Model-X procedures that provably control the frequentist FDR from finite samples, even when the model is misspecified.
Our proposed procedure, PoEdCe, incorporates three key ingredients: Posterior Expectation, distilled randomization test (dCRT), and the Benjamini-Hochberg procedure with e-values.
arXiv Detail & Related papers (2022-11-04T22:56:41Z) - Error-based Knockoffs Inference for Controlled Feature Selection [49.99321384855201]
We propose an error-based knockoff inference method by integrating the knockoff features, the error-based feature importance statistics, and the stepdown procedure together.
The proposed inference procedure does not require specifying a regression model and can handle feature selection with theoretical guarantees.
arXiv Detail & Related papers (2022-03-09T01:55:59Z) - Learning generative models for valid knockoffs using novel
multivariate-rank based statistics [12.528602250193206]
Rank energy (RE) is derived using theoretical results characterizing the optimal maps in the Monge's Optimal Transport (OT) problem.
We propose a variant of the RE, dubbed as soft rank energy (sRE), and its kernel variant called as soft rank maximum mean discrepancy (sRMMD)
We then use sRMMD to generate deep knockoffs and show via extensive evaluation that it is a novel and effective method to produce valid knockoffs.
arXiv Detail & Related papers (2021-10-29T18:51:19Z) - Random quantum circuits anti-concentrate in log depth [118.18170052022323]
We study the number of gates needed for the distribution over measurement outcomes for typical circuit instances to be anti-concentrated.
Our definition of anti-concentration is that the expected collision probability is only a constant factor larger than if the distribution were uniform.
In both the case where the gates are nearest-neighbor on a 1D ring and the case where gates are long-range, we show $O(n log(n)) gates are also sufficient.
arXiv Detail & Related papers (2020-11-24T18:44:57Z) - The leave-one-covariate-out conditional randomization test [36.9351790405311]
Conditional independence testing is an important problem, yet provably hard without assumptions.
Knockoffs is a popular methodology associated with this framework, but it suffers from two main drawbacks.
The conditional randomization test (CRT) is thought to be the "right" solution under model-X, but usually viewed as computationally inefficient.
arXiv Detail & Related papers (2020-06-15T15:38:24Z) - Lower bounds in multiple testing: A framework based on derandomized
proxies [107.69746750639584]
This paper introduces an analysis strategy based on derandomization, illustrated by applications to various concrete models.
We provide numerical simulations of some of these lower bounds, and show a close relation to the actual performance of the Benjamini-Hochberg (BH) algorithm.
arXiv Detail & Related papers (2020-05-07T19:59:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.