Near-optimal multiple testing in Bayesian linear models with
finite-sample FDR control
- URL: http://arxiv.org/abs/2211.02778v3
- Date: Fri, 21 Jul 2023 22:16:10 GMT
- Title: Near-optimal multiple testing in Bayesian linear models with
finite-sample FDR control
- Authors: Taejoo Ahn, Licong Lin, Song Mei
- Abstract summary: In high dimensional variable selection problems, statisticians often seek to design multiple testing procedures that control the False Discovery Rate (FDR).
We introduce Model-X procedures that provably control the frequentist FDR from finite samples, even when the model is misspecified.
Our proposed procedure, PoEdCe, incorporates three key ingredients: Posterior Expectation, distilled conditional randomization test (dCRT), and the Benjamini-Hochberg procedure with e-values.
- Score: 11.011242089340438
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In high dimensional variable selection problems, statisticians often seek to
design multiple testing procedures that control the False Discovery Rate (FDR),
while concurrently identifying a greater number of relevant variables. Model-X
methods, such as Knockoffs and conditional randomization tests, achieve the
primary goal of finite-sample FDR control, assuming a known distribution of
covariates. However, whether these methods can also achieve the secondary goal
of maximizing discoveries remains uncertain. In fact, designing procedures to
discover more relevant variables with finite-sample FDR control is a largely
open question, even within the arguably simplest linear models.
In this paper, we develop near-optimal multiple testing procedures for high
dimensional Bayesian linear models with isotropic covariates. We introduce
Model-X procedures that provably control the frequentist FDR from finite
samples, even when the model is misspecified, and conjecturally achieve
near-optimal power when the data follow the Bayesian linear model. Our proposed
procedure, PoEdCe, incorporates three key ingredients: Posterior Expectation,
distilled Conditional randomization test (dCRT), and the Benjamini-Hochberg
procedure with e-values (eBH). The optimality conjecture of PoEdCe is based on
a heuristic calculation of its asymptotic true positive proportion (TPP) and
false discovery proportion (FDP), which is supported by methods from
statistical physics as well as extensive numerical simulations. Our result
establishes the Bayesian linear model as a benchmark for comparing the power of
various multiple testing procedures.
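As a rough illustration of the last ingredient named in the abstract, the sketch below applies the Benjamini-Hochberg procedure with e-values (eBH) to a vector of e-values that is assumed to be given, then computes the FDP and TPP of the resulting rejection set. It does not reproduce how PoEdCe builds its e-values from posterior expectations and dCRT. The function names `ebh` and `fdp_tpp` and the toy inputs are hypothetical and chosen only for this example.

```python
import numpy as np


def ebh(e_values, alpha):
    """Sketch of eBH: reject the k* largest e-values, where
    k* = max{k : e_(k) >= p / (alpha * k)} and e_(k) is the k-th largest."""
    e = np.asarray(e_values, dtype=float)
    p = e.size
    order = np.argsort(-e)                   # indices sorted by decreasing e-value
    ks = np.arange(1, p + 1)
    thresholds = p / (alpha * ks)            # threshold for the k-th largest e-value
    passing = np.nonzero(e[order] >= thresholds)[0]
    if passing.size == 0:
        return np.array([], dtype=int)       # no rejections
    k_star = passing[-1] + 1
    return np.sort(order[:k_star])           # indices of rejected hypotheses


def fdp_tpp(rejected, is_nonnull):
    """False discovery proportion and true positive proportion of a rejection set."""
    is_nonnull = np.asarray(is_nonnull, dtype=bool)
    rejected = np.asarray(rejected, dtype=int)
    true_pos = int(is_nonnull[rejected].sum())
    false_pos = rejected.size - true_pos
    fdp = false_pos / max(rejected.size, 1)          # 0 when nothing is rejected
    tpp = true_pos / max(int(is_nonnull.sum()), 1)   # 0 when there are no non-nulls
    return fdp, tpp


# Toy inputs (made up for illustration only).
toy_e = [60.0, 1.0, 30.0, 0.5, 2.0]
toy_truth = [True, False, False, False, True]
toy_rejected = ebh(toy_e, alpha=0.1)
toy_fdp, toy_tpp = fdp_tpp(toy_rejected, toy_truth)
```

With these toy inputs the thresholds are 50, 25, 16.7, 12.5 and 10, so only the two largest e-values pass and hypotheses 0 and 2 are rejected. The FDR is the expectation of the FDP over repeated data draws, so a single FDP value like this one says nothing on its own about FDR control.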
Related papers
- Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference (SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z) - Learning Multivariate CDFs and Copulas using Tensor Factorization [39.24470798045442]
Learning the multivariate distribution of data is a core challenge in statistics and machine learning.
In this work, we aim to learn multivariate cumulative distribution functions (CDFs), as they can handle mixed random variables.
We show that any grid sampled version of a joint CDF of mixed random variables admits a universal representation as a naive Bayes model.
We demonstrate the superior performance of the proposed model in several synthetic and real datasets and applications including regression, sampling and data imputation.
arXiv Detail & Related papers (2022-10-13T16:18:46Z) - Sparse high-dimensional linear regression with a partitioned empirical
Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - Two-stage Hypothesis Tests for Variable Interactions with FDR Control [10.750902543185802]
We propose a two-stage testing procedure with false discovery rate (FDR) control, which is known as a less conservative multiple-testing correction.
We demonstrate via comprehensive simulation studies that our two-stage procedure is more efficient than the classical BH procedure, with a comparable or improved statistical power.
arXiv Detail & Related papers (2022-08-31T19:17:00Z) - Error-based Knockoffs Inference for Controlled Feature Selection [49.99321384855201]
We propose an error-based knockoff inference method by integrating the knockoff features, the error-based feature importance statistics, and the stepdown procedure together.
The proposed inference procedure does not require specifying a regression model and can handle feature selection with theoretical guarantees.
arXiv Detail & Related papers (2022-03-09T01:55:59Z) - Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
arXiv Detail & Related papers (2021-10-20T12:25:22Z) - The Terminating-Random Experiments Selector: Fast High-Dimensional
Variable Selection with False Discovery Rate Control [10.86851797584794]
The T-Rex selector controls a user-defined target false discovery rate (FDR).
Experiments are conducted on a combination of the original predictors and multiple sets of randomly generated dummy predictors.
arXiv Detail & Related papers (2021-10-12T14:52:46Z) - AdaPT-GMM: Powerful and robust covariate-assisted multiple testing [0.7614628596146599]
We propose a new empirical Bayes method for covariate-assisted multiple testing with false discovery rate (FDR) control.
Our method refines the adaptive p-value thresholding (AdaPT) procedure by generalizing its masking scheme.
We show in extensive simulations and real data examples that our new method, which we call AdaPT-GMM, consistently delivers high power.
arXiv Detail & Related papers (2021-06-30T05:06:18Z) - Directional FDR Control for Sub-Gaussian Sparse GLMs [4.229179009157074]
False discovery rate (FDR) control aims to identify some small number of statistically significantly nonzero results.
We construct the debiased matrix-Lasso estimator and prove the normality by minimax-rate oracle inequalities for sparse GLMs.
arXiv Detail & Related papers (2021-05-02T05:34:32Z) - Probabilistic Circuits for Variational Inference in Discrete Graphical
Models [101.28528515775842]
Inference in discrete graphical models with variational methods is difficult.
Many sampling-based methods have been proposed for estimating the Evidence Lower Bound (ELBO).
We propose a new approach that leverages the tractability of probabilistic circuit models, such as Sum Product Networks (SPN).
We show that selective-SPNs are suitable as an expressive variational distribution, and prove that when the log-density of the target model is a polynomial the corresponding ELBO can be computed analytically.
arXiv Detail & Related papers (2020-10-22T05:04:38Z) - Lower bounds in multiple testing: A framework based on derandomized
proxies [107.69746750639584]
This paper introduces an analysis strategy based on derandomization, illustrated by applications to various concrete models.
We provide numerical simulations of some of these lower bounds, and show a close relation to the actual performance of the Benjamini-Hochberg (BH) algorithm.
arXiv Detail & Related papers (2020-05-07T19:59:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.