Fast Bayesian Variable Selection in Binomial and Negative Binomial
Regression
- URL: http://arxiv.org/abs/2106.14981v1
- Date: Mon, 28 Jun 2021 20:54:41 GMT
- Authors: Martin Jankowiak
- Abstract summary: We introduce an efficient MCMC scheme for variable selection in binomial and negative binomial regression that exploits Tempered Gibbs Sampling and includes logistic regression as a special case.
In experiments we demonstrate the effectiveness of our approach, including on cancer data with seventeen thousand covariates.
- Score: 9.774282306558465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bayesian variable selection is a powerful tool for data analysis, as it
offers a principled method for variable selection that accounts for prior
information and uncertainty. However, wider adoption of Bayesian variable
selection has been hampered by computational challenges, especially in
difficult regimes with a large number of covariates or non-conjugate
likelihoods. Generalized linear models for count data, which are prevalent in
biology, ecology, economics, and beyond, represent an important special case.
Here we introduce an efficient MCMC scheme for variable selection in binomial
and negative binomial regression that exploits Tempered Gibbs Sampling (Zanella
and Roberts, 2019) and that includes logistic regression as a special case. In
experiments we demonstrate the effectiveness of our approach, including on
cancer data with seventeen thousand covariates.
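The workhorse here is Tempered Gibbs Sampling over binary inclusion vectors. Its mechanics can be illustrated with a much-simplified, pure-Python sketch on a toy unnormalized log posterior over gamma in {0,1}^P (all names are illustrative, not the paper's code, and the toy target stands in for the paper's actual binomial/negative binomial marginal likelihoods):

```python
import math
import random

def tgs_inclusion_probs(log_post, P, n_iters=20000, seed=0):
    """Tempered Gibbs Sampling (Zanella & Roberts, 2019) over binary
    inclusion vectors gamma in {0,1}^P, with a uniform tempered
    conditional for each coordinate.

    At each step: compute zeta_i = 0.5 * (1 + pi(flip_i gamma)/pi(gamma))
    for every coordinate, pick coordinate i with probability
    proportional to zeta_i, resample gamma_i uniformly, and record the
    importance weight 1/sum(zeta).  Weighted averages of gamma then give
    consistent estimates of the posterior inclusion probabilities.
    """
    rng = random.Random(seed)
    gamma = [0] * P
    totals = [0.0] * P
    weight_sum = 0.0
    lp = log_post(gamma)
    for _ in range(n_iters):
        # ratio pi(flip_i gamma) / pi(gamma) for every coordinate
        ratios = []
        for i in range(P):
            gamma[i] ^= 1
            ratios.append(math.exp(log_post(gamma) - lp))
            gamma[i] ^= 1
        zetas = [0.5 * (1.0 + r) for r in ratios]
        z = sum(zetas)
        # importance weight corrects for the tempered stationary distribution
        w = 1.0 / z
        weight_sum += w
        for i in range(P):
            totals[i] += w * gamma[i]
        # choose a coordinate proportionally to zeta_i, resample it uniformly
        u = rng.random() * z
        acc = 0.0
        for i, zi in enumerate(zetas):
            acc += zi
            if u <= acc or i == P - 1:
                gamma[i] = rng.randint(0, 1)
                break
        lp = log_post(gamma)
    return [t / weight_sum for t in totals]
```

Because every coordinate's flip ratio is computed at every step, the sampler never gets stuck on low-probability coordinates; the price is the per-iteration sweep over all P coordinates, which the paper's setting makes efficient for conjugate updates.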
Related papers
- Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure based on testing a hypothesis about the value of the conditional variance at a given point.
Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z)
- Variable selection for nonlinear Cox regression model via deep learning [0.0]
We extend the recently developed deep learning-based variable selection model LassoNet to survival data.
We apply the proposed methodology to analyze a real data set on diffuse large B-cell lymphoma.
arXiv Detail & Related papers (2022-11-17T01:17:54Z)
- Bayesian Variable Selection in a Million Dimensions [7.366246663367533]
We introduce an efficient MCMC scheme whose cost per iteration is sublinear in P.
We show how this scheme can be extended to generalized linear models for count data.
In experiments we demonstrate the effectiveness of our methods, including on cancer and maize genomic data.
arXiv Detail & Related papers (2022-08-02T00:11:15Z)
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how combining recent results on equivariant representation learning over structured spaces with classical results on causal inference provides an effective practical solution.
We demonstrate how, under some assumptions, our model can handle more than one nuisance variable and can enable analysis of pooled scientific datasets in scenarios that would otherwise require removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
- Flexible variable selection in the presence of missing data [0.0]
We propose a non-parametric variable selection algorithm combined with multiple imputation to develop flexible panels in the presence of missing-at-random data.
We show that our proposal has good operating characteristics and results in panels with higher classification and variable selection performance.
arXiv Detail & Related papers (2022-02-25T21:41:03Z)
- Machine Learning for Multi-Output Regression: When should a holistic multivariate approach be preferred over separate univariate ones? [62.997667081978825]
Tree-based ensembles such as the Random Forest are modern classics among statistical learning methods.
We compare these methods in extensive simulations to help answer the primary question of when to use multivariate ensemble techniques.
arXiv Detail & Related papers (2022-01-14T08:44:25Z)
- Variational Bayes for high-dimensional proportional hazards models with applications to gene expression variable selection [3.8761064607384195]
We propose a variational Bayesian proportional hazards model for prediction and variable selection regarding high-dimensional survival data.
Our method, based on a mean-field variational approximation, overcomes the high computational cost of MCMC.
We demonstrate how the proposed method can be used for variable selection on two transcriptomic datasets with censored survival outcomes.
arXiv Detail & Related papers (2021-12-19T22:10:41Z)
- Variable selection with missing data in both covariates and outcomes: Imputation and machine learning [1.0333430439241666]
The missing data issue is ubiquitous in health studies.
Machine learning methods relax parametric assumptions.
XGBoost and BART show the best overall performance across various settings.
arXiv Detail & Related papers (2021-04-06T20:18:29Z)
- Tracking disease outbreaks from sparse data with Bayesian inference [55.82986443159948]
The COVID-19 pandemic provides new motivation for estimating the empirical rate of transmission during an outbreak.
Standard methods struggle to accommodate the partial observability and sparse data common at finer scales.
We propose a Bayesian framework which accommodates partial observability in a principled manner.
arXiv Detail & Related papers (2020-09-12T20:37:33Z)
- Two-step penalised logistic regression for multi-omic data with an application to cardiometabolic syndrome [62.997667081978825]
We implement a two-step approach to multi-omic logistic regression in which variable selection is performed on each layer separately.
Our approach should be preferred if the goal is to select as many relevant predictors as possible.
Our proposed approach allows us to identify features that characterise cardiometabolic syndrome at the molecular level.
arXiv Detail & Related papers (2020-08-01T10:36:27Z)
- Optimal Feature Manipulation Attacks Against Linear Regression [64.54500628124511]
In this paper, we investigate how to manipulate the coefficients obtained via linear regression by adding carefully designed poisoning data points to the dataset or by modifying the original data points.
Given an energy budget, we first provide the closed-form solution for the optimal poisoning data point when the target is modifying one designated regression coefficient.
We then extend the analysis to the more challenging scenario where the attacker aims to change one particular regression coefficient while keeping the changes to the others as small as possible.
arXiv Detail & Related papers (2020-02-29T04:26:59Z)
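The vulnerability this paper exploits can be seen in a toy sketch (this is an illustration of the general idea, not the paper's closed-form solution): ordinary least squares is sensitive to a single high-leverage poisoned point, which can swing a fitted coefficient from roughly its true value toward zero or beyond.

```python
def fit_slope(xs, ys):
    """Ordinary least squares slope for the model y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# clean data generated near the line y = x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 1.0, 2.1, 2.9, 4.0]
clean_slope = fit_slope(xs, ys)  # close to 1

# a single high-leverage poisoned point (x far from the data mean,
# y chosen adversarially) drags the fitted slope toward zero
poisoned_slope = fit_slope(xs + [10.0], ys + [0.0])
```

The attacker's optimization in the paper amounts to choosing such points optimally under an energy (magnitude) budget rather than by hand.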
This list is automatically generated from the titles and abstracts of the papers in this site.