The Terminating-Random Experiments Selector: Fast High-Dimensional
Variable Selection with False Discovery Rate Control
- URL: http://arxiv.org/abs/2110.06048v7
- Date: Tue, 12 Mar 2024 19:50:35 GMT
- Title: The Terminating-Random Experiments Selector: Fast High-Dimensional
Variable Selection with False Discovery Rate Control
- Authors: Jasin Machkour, Michael Muma, Daniel P. Palomar
- Abstract summary: The T-Rex selector controls a user-defined target false discovery rate (FDR).
Experiments are conducted on a combination of the original predictors and multiple sets of randomly generated dummy predictors.
- Score: 10.86851797584794
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose the Terminating-Random Experiments (T-Rex) selector, a fast
variable selection method for high-dimensional data. The T-Rex selector
controls a user-defined target false discovery rate (FDR) while maximizing the
number of selected variables. This is achieved by fusing the solutions of
multiple early terminated random experiments. The experiments are conducted on
a combination of the original predictors and multiple sets of randomly
generated dummy predictors. A finite sample proof based on martingale theory
for the FDR control property is provided. Numerical simulations confirm that
the FDR is controlled at the target level while allowing for high power. We
prove that the dummies can be sampled from any univariate probability
distribution with finite expectation and variance. The computational complexity
of the proposed method is linear in the number of variables. The T-Rex selector
outperforms state-of-the-art methods for FDR control in numerical experiments
and on a simulated genome-wide association study (GWAS), while its sequential
computation time is more than two orders of magnitude lower than that of the
strongest benchmark methods. The open source R package TRexSelector containing
the implementation of the T-Rex selector is available on CRAN.
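The abstract describes the mechanism only in words. The Python sketch below illustrates the idea under simplifying assumptions. Dummies are drawn i.i.d. from N(0, 1), which the abstract permits since any univariate distribution with finite expectation and variance may be used. A simple greedy forward selector in the style of orthogonal matching pursuit stands in for the LARS-type solver the T-Rex framework builds on. Each random experiment terminates once `t_stop` dummies have entered, and a variable is kept if its relative occurrence across experiments exceeds `vote_threshold`. All function names, parameters and defaults are hypothetical. The sketch also omits the paper's calibration of the termination and voting parameters, so on its own it does not guarantee FDR control at a target level. For the actual method, use the TRexSelector R package on CRAN.

```python
import random
from typing import List, Sequence, Set, Tuple


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _unit(v: Sequence[float]) -> List[float]:
    """Center a vector and scale it to unit Euclidean norm."""
    mean = sum(v) / len(v)
    c = [x - mean for x in v]
    norm = _dot(c, c) ** 0.5
    return [x / norm for x in c] if norm > 0 else c


def forward_select_until_t_dummies(columns, y, num_original, t_stop):
    """Greedy OMP-style forward selection on [originals, dummies] that
    terminates as soon as t_stop dummy columns have entered.
    Columns with index >= num_original are dummies; returns the
    indices of original variables selected before termination."""
    residual = list(y)
    basis: List[List[float]] = []  # orthonormal basis of the active set
    active: Set[int] = set()
    dummies_in = 0
    while dummies_in < t_stop and len(active) < len(columns):
        # Pick the inactive column most correlated with the residual.
        best = max(
            (j for j in range(len(columns)) if j not in active),
            key=lambda j: abs(_dot(columns[j], residual)),
        )
        active.add(best)
        if best >= num_original:
            dummies_in += 1
        # Gram-Schmidt step: orthogonalize the new column against the basis.
        q = list(columns[best])
        for b in basis:
            coef = _dot(q, b)
            q = [qi - coef * bi for qi, bi in zip(q, b)]
        norm = _dot(q, q) ** 0.5
        if norm > 1e-12:
            q = [qi / norm for qi in q]
            basis.append(q)
            # Remove the new direction from the residual (least-squares refit).
            coef = _dot(residual, q)
            residual = [ri - coef * qi for ri, qi in zip(residual, q)]
    return {j for j in active if j < num_original}


def trex_sketch(X, y, num_experiments=20, num_dummies=None, t_stop=1,
                vote_threshold=0.5, seed=0) -> Tuple[List[int], List[float]]:
    """X: n rows of p predictors; y: n responses. Returns the selected
    indices and each variable's relative occurrence across experiments."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    if num_dummies is None:
        num_dummies = p  # hypothetical default: one dummy per predictor
    originals = [_unit([row[j] for row in X]) for j in range(p)]
    y_mean = sum(y) / n
    y_centered = [yi - y_mean for yi in y]
    counts = [0] * p
    for _ in range(num_experiments):
        # Fresh dummies for every random experiment, here i.i.d. N(0, 1).
        dummies = [_unit([rng.gauss(0.0, 1.0) for _ in range(n)])
                   for _ in range(num_dummies)]
        for j in forward_select_until_t_dummies(
                originals + dummies, y_centered, p, t_stop):
            counts[j] += 1
    # Fuse the experiments: keep variables selected often enough.
    freq = [c / num_experiments for c in counts]
    selected = [j for j in range(p) if freq[j] > vote_threshold]
    return selected, freq


# Toy data (hypothetical): only predictors 0 and 1 affect the response.
data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(50)]
y = [5.0 * row[0] + 5.0 * row[1] + 0.1 * data_rng.gauss(0.0, 1.0) for row in X]
selected, freq = trex_sketch(X, y, num_experiments=20, t_stop=1)
print("selected:", selected)
```

In this toy setup the two active predictors should enter before any dummy in essentially every experiment, while each null predictor is only occasionally picked before the first dummy, so its relative occurrence stays low. Raising `t_stop` lets more variables enter per experiment; raising `vote_threshold` makes the fused selection more conservative.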
Related papers
- The Informed Elastic Net for Fast Grouped Variable Selection and FDR Control in Genomics Research [9.6703621796624]
We propose a new base selector that significantly reduces computation time while retaining the grouped variable selection property.
The proposed T-Rex+GVS (IEN) exhibits the desired grouping effect, reduces time, and achieves the same TPR as T-Rex+GVS (EN) but with lower FDR.
Date: 2024-10-07T17:18:25Z
- Solving FDR-Controlled Sparse Regression Problems with Five Million Variables on a Laptop [1.5948860527881505]
T-Rex selector is a new learning framework based on early terminated random experiments with computer-generated dummy variables.
We propose the Big T-Rex, a new implementation of T-Rex that drastically reduces its Random Access Memory (RAM) consumption.
We showcase that the Big T-Rex can efficiently solve FDR-controlled Lasso-type problems with five million variables on a laptop in thirty minutes.
Date: 2024-09-27T18:38:51Z
- High-Dimensional False Discovery Rate Control for Dependent Variables [10.86851797584794]
We propose a dependency-aware T-Rex selector that harnesses the dependency structure among variables.
We prove that our variable penalization mechanism ensures FDR control.
We formulate a fully integrated optimal calibration algorithm that concurrently determines the parameters of the graphical model and the T-Rex framework.
Date: 2024-01-28T22:56:16Z
- Simplex Random Features [53.97976744884616]
We present Simplex Random Features (SimRFs), a new random feature (RF) mechanism for unbiased approximation of the softmax and Gaussian kernels.
We prove that SimRFs provide the smallest possible mean square error (MSE) on unbiased estimates of these kernels.
We show consistent gains provided by SimRFs in settings including pointwise kernel estimation, nonparametric classification and scalable Transformers.
Date: 2023-01-31T18:53:39Z
- Near-optimal multiple testing in Bayesian linear models with finite-sample FDR control [11.011242089340438]
In high-dimensional variable selection problems, statisticians often seek to design multiple testing procedures that control the false discovery rate (FDR).
We introduce Model-X procedures that provably control the frequentist FDR in finite samples, even when the model is misspecified.
Our proposed procedure, PoEdCe, incorporates three key ingredients: Posterior Expectation, the distilled conditional randomization test (dCRT), and the Benjamini-Hochberg procedure with e-values.
Date: 2022-11-04T22:56:41Z
- Testing randomness of series generated in Bell's experiment [62.997667081978825]
We use a toy fiber-optic-based setup to generate binary series and evaluate their level of randomness according to the Ville principle.
Series are tested with a battery of standard statistical indicators: the Hurst exponent, Kolmogorov complexity, minimum entropy, Takens' embedding dimension, and the Augmented Dickey-Fuller and Kwiatkowski-Phillips-Schmidt-Shin tests to check stationarity.
The level of randomness of series obtained by applying Toeplitz extractor to rejected series is found to be indistinguishable from the level of non-rejected raw ones.
Date: 2022-08-31T17:39:29Z
- Sequential Permutation Testing of Random Forest Variable Importance Measures [68.8204255655161]
Sequential permutation tests and sequential p-value estimation are proposed to reduce the high computational cost of conventional permutation tests.
The results of simulation studies confirm that the theoretical properties of the sequential tests apply.
The numerical stability of the methods is investigated in two additional application studies.
Date: 2022-06-02T20:16:50Z
- Error-based Knockoffs Inference for Controlled Feature Selection [49.99321384855201]
We propose an error-based knockoff inference method by integrating the knockoff features, the error-based feature importance statistics, and the stepdown procedure together.
The proposed inference procedure does not require specifying a regression model and can handle feature selection with theoretical guarantees.
Date: 2022-03-09T01:55:59Z
- Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization [50.725191156128645]
Multi-armed bandit algorithms like Thompson Sampling can be used to conduct adaptive experiments.
We present simulations for 2-arm experiments that explore two algorithms combining the benefits of uniform randomization for statistical analysis with those of reward maximization.
Date: 2021-12-15T22:11:58Z
- Directional FDR Control for Sub-Gaussian Sparse GLMs [4.229179009157074]
False discovery rate (FDR) control aims to identify some small number of statistically significantly nonzero results.
We construct the debiased matrix-Lasso estimator and prove its normality using minimax-rate oracle inequalities for sparse GLMs.
Date: 2021-05-02T05:34:32Z
- Uncertainty Inspired RGB-D Saliency Detection [70.50583438784571]
We propose the first framework to employ uncertainty for RGB-D saliency detection by learning from the data labeling process.
Inspired by the saliency data labeling process, we propose a generative architecture to achieve probabilistic RGB-D saliency detection.
Results on six challenging RGB-D benchmark datasets show our approach's superior performance in learning the distribution of saliency maps.
Date: 2020-09-07T13:01:45Z