High-Dimensional False Discovery Rate Control for Dependent Variables
- URL: http://arxiv.org/abs/2401.15796v2
- Date: Tue, 30 Jan 2024 17:52:47 GMT
- Title: High-Dimensional False Discovery Rate Control for Dependent Variables
- Authors: Jasin Machkour, Michael Muma, Daniel P. Palomar
- Abstract summary: We propose a dependency-aware T-Rex selector that harnesses the dependency structure among variables.
We prove that our variable penalization mechanism ensures FDR control.
We formulate a fully integrated optimal calibration algorithm that concurrently determines the parameters of the graphical model and the T-Rex framework.
- Score: 10.86851797584794
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Algorithms that ensure reproducible findings from large-scale,
high-dimensional data are pivotal in numerous signal processing applications.
In recent years, multivariate false discovery rate (FDR) controlling methods
have emerged, providing guarantees even in high-dimensional settings where the
number of variables surpasses the number of samples. However, these methods
often fail to reliably control the FDR in the presence of highly dependent
variable groups, a common characteristic in fields such as genomics and
finance. To tackle this critical issue, we introduce a novel framework that
accounts for general dependency structures. Our proposed dependency-aware T-Rex
selector integrates hierarchical graphical models within the T-Rex framework to
effectively harness the dependency structure among variables. Leveraging
martingale theory, we prove that our variable penalization mechanism ensures
FDR control. We further generalize the FDR-controlling framework by stating and
proving a clear condition necessary for designing both graphical and
non-graphical models that capture dependencies. Additionally, we formulate a
fully integrated optimal calibration algorithm that concurrently determines the
parameters of the graphical model and the T-Rex framework, such that the FDR is
controlled while maximizing the number of selected variables. Numerical
experiments and a breast cancer survival analysis use-case demonstrate that the
proposed method is the only one among the state-of-the-art benchmark methods
that controls the FDR and reliably detects genes that have been previously
identified to be related to breast cancer. An open-source implementation is
available within the R package TRexSelector on CRAN.
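To make the controlled quantity concrete: the FDR is the expected false discovery proportion (FDP), i.e. the share of selected variables that are not truly relevant. The following minimal Python sketch computes the FDP and the true positive proportion (TPP) of a selected set against a known true support, and averages FDPs over repeated runs as a Monte Carlo estimate of the FDR. The function names and the toy index sets are illustrative assumptions; this is not the T-Rex selector and not part of the TRexSelector package.

```python
from statistics import mean
from typing import Iterable, Sequence, Tuple


def false_discovery_proportion(selected: Iterable[int], true_support: Iterable[int]) -> float:
    """FDP = (# selected variables outside the true support) / max(1, # selected)."""
    selected_set, true_set = set(selected), set(true_support)
    return len(selected_set - true_set) / max(1, len(selected_set))


def true_positive_proportion(selected: Iterable[int], true_support: Iterable[int]) -> float:
    """TPP = (# selected variables inside the true support) / max(1, # true variables)."""
    selected_set, true_set = set(selected), set(true_support)
    return len(selected_set & true_set) / max(1, len(true_set))


def empirical_fdr(runs: Sequence[Tuple[Iterable[int], Iterable[int]]]) -> float:
    """Average FDP over (selected, true_support) pairs; requires at least one run."""
    return mean(false_discovery_proportion(s, t) for s, t in runs)


# Hypothetical toy data: variables 0..4 are truly active, a selector picked 0, 1, 2 and 7.
true_support = [0, 1, 2, 3, 4]
selected = [0, 1, 2, 7]

fdp = false_discovery_proportion(selected, true_support)  # 1 false discovery out of 4 selections
tpp = true_positive_proportion(selected, true_support)    # 3 of 5 true variables found
fdr_estimate = empirical_fdr([(selected, true_support), ([0, 1], true_support)])
```

A method controls the FDR at a target level such as 0.1 if this average, taken over the data-generating distribution rather than a handful of toy runs, does not exceed the target. The `max(1, ...)` guard sets the FDP to zero when nothing is selected.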
Related papers
- FDR-Controlled Portfolio Optimization for Sparse Financial Index Tracking [10.86851797584794]
In high-dimensional data analysis, it is crucial to select the few relevant variables while maintaining control over the false discovery rate (FDR).
We have expanded the T-Rex framework to accommodate overlapping groups of highly correlated variables.
This is achieved by integrating a nearest neighbors penalization mechanism into the framework, which provably controls the FDR at the user-defined target level.
arXiv Detail & Related papers (2024-01-26T18:29:30Z)
- Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
- Near-optimal multiple testing in Bayesian linear models with finite-sample FDR control [11.011242089340438]
In high-dimensional variable selection problems, statisticians often seek to design multiple testing procedures that control the False Discovery Rate (FDR).
We introduce Model-X procedures that provably control the frequentist FDR from finite samples, even when the model is misspecified.
Our proposed procedure, PoEdCe, incorporates three key ingredients: Posterior Expectation, distilled randomization test (dCRT), and the Benjamini-Hochberg procedure with e-values.
arXiv Detail & Related papers (2022-11-04T22:56:41Z)
- Probabilistic Model Incorporating Auxiliary Covariates to Control FDR [6.270317798744481]
Controlling False Discovery Rate (FDR) while leveraging the side information of multiple hypothesis testing is an emerging research topic in modern data science.
We propose a deep black-box framework, named NeurT-FDR, which boosts statistical power and controls FDR for multiple-hypothesis testing.
We show that NeurT-FDR makes substantially more discoveries in three real datasets compared to competitive baselines.
arXiv Detail & Related papers (2022-10-06T19:35:53Z)
- Error-based Knockoffs Inference for Controlled Feature Selection [49.99321384855201]
We propose an error-based knockoff inference method by integrating the knockoff features, the error-based feature importance statistics, and the stepdown procedure together.
The proposed inference procedure does not require specifying a regression model and can handle feature selection with theoretical guarantees.
arXiv Detail & Related papers (2022-03-09T01:55:59Z)
- BCDAG: An R package for Bayesian structure and Causal learning of Gaussian DAGs [77.34726150561087]
We introduce BCDAG, an R package for causal discovery and causal effect estimation from observational data.
Our implementation scales efficiently with the number of observations and, whenever the DAGs are sufficiently sparse, the number of variables in the dataset.
We then illustrate the main functions and algorithms on both real and simulated datasets.
arXiv Detail & Related papers (2022-01-28T09:30:32Z)
- Learning Robust Output Control Barrier Functions from Safe Expert Demonstrations [50.37808220291108]
This paper addresses learning safe output feedback control laws from partial observations of expert demonstrations.
We first propose robust output control barrier functions (ROCBFs) as a means to guarantee safety.
We then formulate an optimization problem to learn ROCBFs from expert demonstrations that exhibit safe system behavior.
arXiv Detail & Related papers (2021-11-18T23:21:00Z)
- The Terminating-Random Experiments Selector: Fast High-Dimensional Variable Selection with False Discovery Rate Control [10.86851797584794]
The T-Rex selector controls a user-defined target false discovery rate (FDR).
Experiments are conducted on a combination of the original predictors and multiple sets of randomly generated dummy predictors.
arXiv Detail & Related papers (2021-10-12T14:52:46Z)
- NeurT-FDR: Controlling FDR by Incorporating Feature Hierarchy [7.496622386458525]
We propose NeurT-FDR which boosts statistical power and controls FDR for multiple hypothesis testing.
We show that NeurT-FDR has strong FDR guarantees and makes substantially more discoveries in synthetic and real datasets.
arXiv Detail & Related papers (2021-01-24T21:55:10Z)
- Lower bounds in multiple testing: A framework based on derandomized proxies [107.69746750639584]
This paper introduces an analysis strategy based on derandomization, illustrated by applications to various concrete models.
We provide numerical simulations of some of these lower bounds, and show a close relation to the actual performance of the Benjamini-Hochberg (BH) algorithm.
arXiv Detail & Related papers (2020-05-07T19:59:51Z)
- Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with $\textit{external uncertainty}$ in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.