Variable Selection with the Knockoffs: Composite Null Hypotheses
- URL: http://arxiv.org/abs/2203.02849v4
- Date: Mon, 27 Nov 2023 06:42:31 GMT
- Title: Variable Selection with the Knockoffs: Composite Null Hypotheses
- Authors: Mehrdad Pournaderi and Yu Xiang
- Abstract summary: We extend the theory of the knockoff procedure to tests with composite null hypotheses.
The main technical challenge lies in handling composite nulls in tandem with dependent features from arbitrary designs.
- Score: 2.725698729450241
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The fixed-X knockoff filter is a flexible framework for variable selection
with false discovery rate (FDR) control in linear models with arbitrary design
matrices (of full column rank) and it allows for finite-sample selective
inference via the Lasso estimates. In this paper, we extend the theory of the
knockoff procedure to tests with composite null hypotheses, which are usually
more relevant to real-world problems. The main technical challenge lies in
handling composite nulls in tandem with dependent features from arbitrary
designs. We develop two methods for composite inference with the knockoffs,
namely, shifted ordinary least-squares (S-OLS) and feature-response product
perturbation (FRPP), building on new structural properties of test statistics
under composite nulls. We also propose two heuristic variants of the S-OLS method
that outperform the celebrated Benjamini-Hochberg (BH) procedure for composite
nulls, which serves as a heuristic baseline under dependent test statistics.
Finally, we analyze the loss in FDR when the original knockoff procedure is
naively applied on composite tests.
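For readers unfamiliar with the machinery the paper builds on, the sketch below illustrates the standard fixed-X knockoff filter (equi-correlated knockoff construction plus the knockoff+ selection rule), not the S-OLS or FRPP methods proposed in the paper. It uses a simple marginal-correlation difference statistic in place of the Lasso-based statistics mentioned in the abstract; the function names, the target FDR level q, and the synthetic data are illustrative assumptions, so treat this as a minimal sketch rather than a reference implementation.

```python
import numpy as np


def equicorrelated_knockoffs(X):
    """Fixed-X equi-correlated knockoffs (Barber & Candes, 2015).

    Assumes X has full column rank, unit-norm columns, and n >= 2p.
    """
    n, p = X.shape
    Sigma = X.T @ X
    Sigma_inv = np.linalg.inv(Sigma)
    lam_min = np.linalg.eigvalsh(Sigma).min()
    s = np.full(p, min(2.0 * lam_min, 1.0))      # equi-correlated choice of s

    # U: n-by-p orthonormal block orthogonal to the column space of X.
    Q, _ = np.linalg.qr(X, mode="complete")
    U = Q[:, p:2 * p]

    # C with C.T @ C = 2*diag(s) - diag(s) Sigma^{-1} diag(s) (PSD square root).
    A = 2.0 * np.diag(s) - np.diag(s) @ Sigma_inv @ np.diag(s)
    w, V = np.linalg.eigh(A)
    C = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

    return X @ (np.eye(p) - Sigma_inv @ np.diag(s)) + U @ C


def knockoff_threshold(W, q=0.1, offset=1):
    """Data-dependent threshold of the knockoff+ filter at target FDR q."""
    candidates = np.sort(np.abs(W[W != 0]))
    for t in candidates:
        fdp_hat = (offset + np.sum(W <= -t)) / max(np.sum(W >= t), 1)
        if fdp_hat <= q:
            return t
    return np.inf                                # no feature can be selected


# Toy run on synthetic data (all numbers are illustrative).
rng = np.random.default_rng(0)
n, p, k = 200, 50, 10
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)                   # unit-norm columns
beta = np.zeros(p)
beta[:k] = 3.5                                   # first k features are non-null
y = X @ beta + rng.standard_normal(n)

X_knock = equicorrelated_knockoffs(X)
# Simple antisymmetric statistic: |X_j' y| - |Xknock_j' y|.
# The paper's setting would use Lasso-based statistics instead.
W = np.abs(X.T @ y) - np.abs(X_knock.T @ y)
T = knockoff_threshold(W, q=0.2)
print("selected features:", np.flatnonzero(W >= T))
```

Under the exchangeability properties guaranteed by this construction, the knockoff+ rule controls the FDR at level q for exact (point) nulls; the paper studies how this guarantee behaves, and how to recover it, when the nulls are composite.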
Related papers
- Sequential Predictive Two-Sample and Independence Testing [114.4130718687858]
We study the problems of sequential nonparametric two-sample and independence testing.
We build upon the principle of (nonparametric) testing by betting.
arXiv Detail & Related papers (2023-04-29T01:30:33Z) - Auto-Encoding Goodness of Fit [11.543670549371361]
We develop the Goodness of Fit Autoencoder (GoFAE), which incorporates hypothesis tests at two levels.
GoFAE achieves comparable FID scores and mean squared errors with competing deep generative models.
arXiv Detail & Related papers (2022-10-12T19:21:57Z) - Error-based Knockoffs Inference for Controlled Feature Selection [49.99321384855201]
We propose an error-based knockoff inference method by integrating the knockoff features, the error-based feature importance statistics, and the stepdown procedure together.
The proposed inference procedure does not require specifying a regression model and can handle feature selection with theoretical guarantees.
arXiv Detail & Related papers (2022-03-09T01:55:59Z) - AdaPT-GMM: Powerful and robust covariate-assisted multiple testing [0.7614628596146599]
We propose a new empirical Bayes method for covariate-assisted multiple testing with false discovery rate (FDR) control.
Our method refines the adaptive p-value thresholding (AdaPT) procedure by generalizing its masking scheme.
We show in extensive simulations and real data examples that our new method, which we call AdaPT-GMM, consistently delivers high power.
arXiv Detail & Related papers (2021-06-30T05:06:18Z) - LSDAT: Low-Rank and Sparse Decomposition for Decision-based Adversarial
Attack [74.5144793386864]
LSDAT crafts perturbations in the low-dimensional subspace formed by the sparse component of the input sample and that of an adversarial sample.
LSD works directly in the image pixel domain to guarantee that non-$\ell_2$ constraints, such as sparsity, are satisfied.
arXiv Detail & Related papers (2021-03-19T13:10:47Z) - Sparse Feature Selection Makes Batch Reinforcement Learning More Sample
Efficient [62.24615324523435]
This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation.
When there is a large number of candidate features, our result sheds light on the fact that sparsity-aware methods can make batch RL more sample efficient.
arXiv Detail & Related papers (2020-11-08T16:48:02Z) - Lower bounds in multiple testing: A framework based on derandomized
proxies [107.69746750639584]
This paper introduces an analysis strategy based on derandomization, illustrated by applications to various concrete models.
We provide numerical simulations of some of these lower bounds, and show a close relation to the actual performance of the Benjamini-Hochberg (BH) algorithm.
arXiv Detail & Related papers (2020-05-07T19:59:51Z) - Fundamental Limits of Testing the Independence of Irrelevant
Alternatives in Discrete Choice [9.13127392774573]
The Multinomial Logit (MNL) model and the Independence of Irrelevant Alternatives (IIA) are the most widely used tools of discrete choice.
We show that any general test for IIA with low worst-case error would require a number of samples exponential in the number of alternatives of the choice problem.
Our lower bounds are structure-dependent, and as a potential cause for optimism, we find that if one restricts the test of IIA to violations that can occur in a specific collection of choice sets, one obtains structure-dependent lower bounds that are much less pessimistic.
arXiv Detail & Related papers (2020-01-20T10:15:28Z) - Safe Testing [0.9634859579172255]
We develop the theory of hypothesis testing based on the e-value.
Tests based on e-values are safe, i.e. they preserve Type-I error guarantees.
arXiv Detail & Related papers (2019-06-18T20:39:27Z) - Naive Feature Selection: a Nearly Tight Convex Relaxation for Sparse Naive Bayes [51.55826927508311]
We propose a sparse version of naive Bayes, which can be used for feature selection.
We prove that our convex relaxation bounds become tight as the marginal contribution of additional features decreases.
Both binary and multinomial sparse models are solvable in time almost linear in problem size.
arXiv Detail & Related papers (2019-05-23T19:30:51Z)