Directional FDR Control for Sub-Gaussian Sparse GLMs
- URL: http://arxiv.org/abs/2105.00393v1
- Date: Sun, 2 May 2021 05:34:32 GMT
- Title: Directional FDR Control for Sub-Gaussian Sparse GLMs
- Authors: Chang Cui, Jinzhu Jia, Yijun Xiao, Huiming Zhang
- Abstract summary: False discovery rate (FDR) control aims to identify a small number of statistically significant nonzero results.
We construct the debiased Lasso estimator and prove its asymptotic normality via minimax-rate oracle inequalities for sparse GLMs.
- Score: 4.229179009157074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-dimensional sparse generalized linear models (GLMs) arise in settings
where both the number of samples and the dimension of variables are large, and
the dimension may even grow faster than the sample size. False discovery rate
(FDR) control aims to identify a small number of statistically significant
nonzero coefficients after obtaining a sparse penalized estimate of the GLM.
Using the CLIME method for precision matrix estimation, we construct the
debiased Lasso estimator and prove its asymptotic normality via minimax-rate
oracle inequalities for sparse GLMs. In practice, one often needs to accurately
determine the sign of each regression coefficient, which indicates whether the
predictor variable is positively or negatively related to the response variable
conditionally on the remaining variables. Using the debiased estimator, we
establish multiple testing procedures. Under mild conditions, we show that the
proposed debiased statistics asymptotically control the directional (sign) FDR
and the number of directional false discovery variables (FDV) at a
pre-specified significance level. Moreover, our multiple testing procedure
approximately achieves a statistical power of one. We also extend our methods
to two-sample problems and propose the corresponding two-sample test
statistics, which under suitable conditions asymptotically achieve directional
FDR and directional FDV control at the specified significance level. Numerical
simulations verify the FDR control of our proposed testing procedures, which
sometimes outperform the classical knockoff method.
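To illustrate how a BH-style directional selection rule can operate on debiased statistics, the sketch below thresholds standardized debiased estimates z_j and reports the sign of each selected coefficient. This is a minimal illustrative sketch under a standard-normal null, not the paper's exact procedure; the function name and the plug-in Gaussian-tail FDR estimate are assumptions.

```python
import math

import numpy as np


def directional_fdr_select(z, alpha=0.1):
    """Select coefficients and their signs from debiased z-statistics.

    Scans the observed |z_j| as candidate cutoffs t, estimates the FDR at t by
    p * P(|N(0,1)| >= t) / #{j : |z_j| >= t}, and takes the smallest cutoff
    whose estimated FDR is at most alpha.
    """
    z = np.asarray(z, dtype=float)
    p = z.size
    for t in np.sort(np.abs(z)):
        n_rej = int(np.sum(np.abs(z) >= t))
        # Two-sided standard-normal tail: P(|N(0,1)| >= t) = erfc(t / sqrt(2)).
        fdr_hat = p * math.erfc(t / math.sqrt(2)) / max(n_rej, 1)
        if fdr_hat <= alpha:
            selected = np.nonzero(np.abs(z) >= t)[0]
            return selected, np.sign(z[selected])
    return np.array([], dtype=int), np.array([])
```

With two strong signals among noise, e.g. `directional_fdr_select([5.0, -6.0, 0.1, 0.2, -0.3])`, the rule keeps only the two large statistics and reports their signs.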
Related papers
- REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy [93.8400683020273]
Decoding methods for large language models (LLMs) usually struggle with the tradeoff between ensuring factuality and maintaining diversity.
We propose REAL sampling, a decoding method that improves factuality and diversity over nucleus sampling.
arXiv Detail & Related papers (2024-06-11T21:44:49Z) - On High dimensional Poisson models with measurement error: hypothesis
testing for nonlinear nonconvex optimization [13.369004892264146]
We study estimation and testing for high-dimensional Poisson regression models, which have wide applications in data analysis.
We propose to estimate the regression parameter by minimizing a penalized loss.
The proposed method is applied to data from the Alzheimer's Disease Initiative.
arXiv Detail & Related papers (2022-12-31T06:58:42Z) - Near-optimal multiple testing in Bayesian linear models with
finite-sample FDR control [11.011242089340438]
In high dimensional variable selection problems, statisticians often seek to design multiple testing procedures that control the False Discovery Rate (FDR).
We introduce Model-X procedures that provably control the frequentist FDR from finite samples, even when the model is misspecified.
Our proposed procedure, PoEdCe, incorporates three key ingredients: Posterior Expectation, distilled randomization test (dCRT), and the Benjamini-Hochberg procedure with e-values.
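The third ingredient named above, the Benjamini-Hochberg procedure with e-values (e-BH), admits a short sketch. This is the base e-BH rule only, not the PoEdCe procedure itself; the function name is an assumption.

```python
def e_bh(e_values, alpha=0.1):
    """Base e-BH procedure: reject the hypotheses with the k largest e-values,
    where k is the largest index with e_(k) >= n / (alpha * k) and e_(k)
    denotes the k-th largest e-value. Returns the rejected indices."""
    n = len(e_values)
    order = sorted(range(n), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for k, i in enumerate(order, start=1):
        if e_values[i] >= n / (alpha * k):
            k_star = k
    return sorted(order[:k_star])
```

For example, with `e_bh([10.0, 5.0, 1.0, 0.5], alpha=0.5)` the thresholds n/(alpha*k) are 8, 4, 8/3, 2, so the first two hypotheses are rejected.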
arXiv Detail & Related papers (2022-11-04T22:56:41Z) - Two-stage Hypothesis Tests for Variable Interactions with FDR Control [10.750902543185802]
We propose a two-stage testing procedure with false discovery rate (FDR) control, a multiple-testing correction that is less conservative than family-wise error control.
We demonstrate via comprehensive simulation studies that our two-stage procedure is more efficient than the classical BH procedure, with a comparable or improved statistical power.
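For reference, the classical BH step-up procedure that serves as the baseline above can be sketched as follows; this is an illustrative implementation of standard BH, not the paper's two-stage procedure.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Classical BH step-up: sort p-values ascending, find the largest k with
    p_(k) <= alpha * k / n, and reject those k hypotheses. Returns the
    rejected indices."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    k_star = 0
    for k, i in enumerate(order, start=1):
        if p_values[i] <= alpha * k / n:
            k_star = k
    return sorted(order[:k_star])
```

E.g. `benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6])` rejects only the first two hypotheses at level 0.05, since 0.039 exceeds its step-up threshold 0.03.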
arXiv Detail & Related papers (2022-08-31T19:17:00Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z) - AdaPT-GMM: Powerful and robust covariate-assisted multiple testing [0.7614628596146599]
We propose a new empirical Bayes method for covariate-assisted multiple testing with false discovery rate (FDR) control.
Our method refines the adaptive p-value thresholding (AdaPT) procedure by generalizing its masking scheme.
We show in extensive simulations and real data examples that our new method, which we call AdaPT-GMM, consistently delivers high power.
arXiv Detail & Related papers (2021-06-30T05:06:18Z) - SLOE: A Faster Method for Statistical Inference in High-Dimensional
Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z) - Sparse Feature Selection Makes Batch Reinforcement Learning More Sample
Efficient [62.24615324523435]
This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation.
When there is a large number of candidate features, our result sheds light on the fact that sparsity-aware methods can make batch RL more sample efficient.
arXiv Detail & Related papers (2020-11-08T16:48:02Z) - Tracking disease outbreaks from sparse data with Bayesian inference [55.82986443159948]
The COVID-19 pandemic provides new motivation for estimating the empirical rate of transmission during an outbreak.
Standard methods struggle to accommodate the partial observability and sparse data common at finer scales.
We propose a Bayesian framework which accommodates partial observability in a principled manner.
arXiv Detail & Related papers (2020-09-12T20:37:33Z) - Interpretable random forest models through forward variable selection [0.0]
We develop a forward variable selection method using the continuous ranked probability score (CRPS) as the loss function.
We demonstrate an application of our method to statistical post-processing of daily maximum temperature forecasts in the Netherlands.
arXiv Detail & Related papers (2020-05-11T13:56:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.