Bayesian Variable Selection in a Million Dimensions
- URL: http://arxiv.org/abs/2208.01180v1
- Date: Tue, 2 Aug 2022 00:11:15 GMT
- Title: Bayesian Variable Selection in a Million Dimensions
- Authors: Martin Jankowiak
- Abstract summary: We introduce an efficient MCMC scheme whose cost per iteration is sublinear in P.
We show how this scheme can be extended to generalized linear models for count data.
In experiments we demonstrate the effectiveness of our methods, including on cancer and maize genomic data.
- Score: 7.366246663367533
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bayesian variable selection is a powerful tool for data analysis, as it
offers a principled method for variable selection that accounts for prior
information and uncertainty. However, wider adoption of Bayesian variable
selection has been hampered by computational challenges, especially in
difficult regimes with a large number of covariates P or non-conjugate
likelihoods. To scale to the large P regime we introduce an efficient MCMC
scheme whose cost per iteration is sublinear in P. In addition we show how this
scheme can be extended to generalized linear models for count data, which are
prevalent in biology, ecology, economics, and beyond. In particular we design
efficient algorithms for variable selection in binomial and negative binomial
regression, which includes logistic regression as a special case. In
experiments we demonstrate the effectiveness of our methods, including on
cancer and maize genomic data.
Related papers
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- A Heavy-Tailed Algebra for Probabilistic Programming [53.32246823168763]
We propose a systematic approach for analyzing the tails of random variables.
We show how this approach can be used during the static analysis (before drawing samples) pass of a probabilistic programming language compiler.
Our empirical results confirm that inference algorithms that leverage our heavy-tailed algebra attain superior performance across a number of density modeling and variational inference tasks.
arXiv Detail & Related papers (2023-06-15T16:37:36Z)
- A model-free feature selection technique of feature screening and random forest based recursive feature elimination [0.0]
We propose a model-free feature selection method for ultra-high dimensional data with mass features.
We show that the proposed method is selection consistent and $L_2$ consistent under weak regularity conditions.
arXiv Detail & Related papers (2023-02-15T03:39:16Z)
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
The approach makes minimal prior assumptions on the parameters by using plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- Flexible variable selection in the presence of missing data [0.0]
We propose a non-parametric variable selection algorithm combined with multiple imputation to develop flexible panels in the presence of missing-at-random data.
We show that our proposal has good operating characteristics and results in panels with higher classification and variable selection performance.
arXiv Detail & Related papers (2022-02-25T21:41:03Z)
- Variational Bayes for high-dimensional proportional hazards models with applications to gene expression variable selection [3.8761064607384195]
We propose a variational Bayesian proportional hazards model for prediction and variable selection regarding high-dimensional survival data.
Our method, based on a mean-field variational approximation, overcomes the high computational cost of MCMC.
We demonstrate how the proposed method can be used for variable selection on two transcriptomic datasets with censored survival outcomes.
arXiv Detail & Related papers (2021-12-19T22:10:41Z)
- Fast Bayesian Variable Selection in Binomial and Negative Binomial Regression [9.774282306558465]
We introduce an efficient MCMC scheme for variable selection in binomial and negative binomial regression, which includes logistic regression as a special case.
In experiments we demonstrate the effectiveness of our approach, including on data with seventeen thousand covariates.
arXiv Detail & Related papers (2021-06-28T20:54:41Z)
- Variable selection with missing data in both covariates and outcomes: Imputation and machine learning [1.0333430439241666]
The missing data issue is ubiquitous in health studies.
Machine learning methods weaken parametric assumptions.
XGBoost and BART have the overall best performance across various settings.
arXiv Detail & Related papers (2021-04-06T20:18:29Z)
- Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
arXiv Detail & Related papers (2020-11-15T22:36:59Z)
- Probabilistic Circuits for Variational Inference in Discrete Graphical Models [101.28528515775842]
Inference in discrete graphical models with variational methods is difficult.
Many sampling-based methods have been proposed for estimating the Evidence Lower Bound (ELBO).
We propose a new approach that leverages the tractability of probabilistic circuit models, such as Sum Product Networks (SPN).
We show that selective-SPNs are suitable as an expressive variational distribution, and prove that when the log-density of the target model is a polynomial the corresponding ELBO can be computed analytically.
arXiv Detail & Related papers (2020-10-22T05:04:38Z)
- Optimal Feature Manipulation Attacks Against Linear Regression [64.54500628124511]
In this paper, we investigate how to manipulate the coefficients obtained via linear regression by adding carefully designed poisoning data points to the dataset or by modifying the original data points.
Given the energy budget, we first provide the closed-form solution for the optimal poisoning data point when the target is to modify one designated regression coefficient.
We then extend the analysis to the more challenging scenario where the attacker aims to change one particular regression coefficient while changing the others as little as possible.
arXiv Detail & Related papers (2020-02-29T04:26:59Z)