Particle-Gibbs Sampling For Bayesian Feature Allocation Models
- URL: http://arxiv.org/abs/2001.09367v1
- Date: Sat, 25 Jan 2020 22:11:51 GMT
- Title: Particle-Gibbs Sampling For Bayesian Feature Allocation Models
- Authors: Alexandre Bouchard-Côté and Andrew Roth
- Abstract summary: The most widely used MCMC strategies rely on an element-wise Gibbs update of the feature allocation matrix.
We have developed a Gibbs sampler that can update an entire row of the feature allocation matrix in a single move.
This sampler is impractical for models with a large number of features because its computational complexity scales exponentially in the number of features.
We develop a Particle Gibbs sampler that targets the same distribution as the row-wise Gibbs updates, but whose computational complexity grows only linearly in the number of features.
- Score: 77.57285768500225
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bayesian feature allocation models are a popular tool for modelling data with
a combinatorial latent structure. Exact inference in these models is generally
intractable and so practitioners typically apply Markov Chain Monte Carlo
(MCMC) methods for posterior inference. The most widely used MCMC strategies
rely on an element-wise Gibbs update of the feature allocation matrix. These
element-wise updates can be inefficient as features are typically strongly
correlated. To overcome this problem, we have developed a Gibbs sampler that can
update an entire row of the feature allocation matrix in a single move.
However, this sampler is impractical for models with a large number of features
as the computational complexity scales exponentially in the number of features.
We develop a Particle Gibbs (PG) sampler that targets the same distribution as the
row-wise Gibbs updates, but whose computational complexity grows only
linearly in the number of features. We compare the performance of our proposed
methods to the standard Gibbs sampler using synthetic data from a range of
feature allocation models. Our results suggest that row-wise updates using the
PG methodology can significantly improve the performance of samplers for
feature allocation models.
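To make the complexity trade-off concrete, the sketch below contrasts an exact row-wise update, which enumerates all 2^K configurations of one row, with a Particle Gibbs (conditional SMC) update whose cost is O(N*K) for N particles. This is a minimal illustrative sketch, not the authors' implementation: the unnormalised target `log_joint`, the uniform proposal, and all parameter choices are hypothetical stand-ins for a concrete feature allocation model.
```python
# Minimal illustrative sketch (not the paper's implementation): an exact
# row-wise Gibbs update costs O(2^K), a Particle Gibbs (conditional SMC)
# update costs O(N * K). `log_joint` is a hypothetical stand-in target.
import itertools
import numpy as np

rng = np.random.default_rng(0)
K = 8                                   # number of features
feature_scores = rng.normal(size=K)     # toy parameters of the stand-in target

def log_joint(z):
    """Toy unnormalised log posterior of a single row z in {0,1}^K."""
    z = np.asarray(z, dtype=float)
    return z @ feature_scores - 0.5 * z.sum()

def partial_log_joint(z, k):
    """Partial target after assigning features 0..k (remaining entries zero)."""
    zp = np.zeros(K)
    zp[: k + 1] = z[: k + 1]
    return log_joint(zp)

def exact_row_gibbs():
    """Exact row-wise update: enumerates all 2^K rows, exponential cost."""
    configs = np.array(list(itertools.product([0, 1], repeat=K)))
    logp = np.array([log_joint(c) for c in configs])
    p = np.exp(logp - logp.max())
    p /= p.sum()
    return configs[rng.choice(len(configs), p=p)]

def particle_gibbs_row(z_cond, n_particles=16):
    """Conditional SMC sweep over features: cost linear in K.
    Particle 0 is clamped to the conditioning row z_cond (the retained path)."""
    particles = np.zeros((n_particles, K), dtype=int)
    log_w = np.zeros(n_particles)
    prev = np.zeros(n_particles)        # partial log joint of the empty row is 0 here
    for k in range(K):
        if k > 0:
            # Multinomial resampling; the conditioned particle keeps slot 0.
            w = np.exp(log_w - log_w.max())
            w /= w.sum()
            idx = rng.choice(n_particles, size=n_particles, p=w)
            idx[0] = 0
            particles, prev = particles[idx], prev[idx]
        # Propose feature k uniformly; slot 0 follows the conditioning row.
        particles[:, k] = rng.integers(0, 2, size=n_particles)
        particles[0, k] = z_cond[k]
        cur = np.array([partial_log_joint(p, k) for p in particles])
        log_w = cur - prev              # incremental weight (uniform proposal)
        prev = cur
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return particles[rng.choice(n_particles, p=w)].copy()

# Usage: alternate a few row updates on the toy target.
z = exact_row_gibbs()
for _ in range(5):
    z = particle_gibbs_row(z)
print(z)
```
In an actual feature allocation model, the stand-in `log_joint` would combine the data likelihood for the row with the prior on feature assignments; keeping the current row as the retained particle in slot 0 is what makes this a Particle Gibbs move targeting the same conditional as the exponential-cost row-wise update.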
Related papers
- Gradients of Functions of Large Matrices [18.361820028457718]
We show how to differentiate workhorses of numerical linear algebra efficiently.
We derive previously unknown adjoint systems for Lanczos and Arnoldi iterations, implement them in JAX, and show that the resulting code can compete with Diffrax.
All this is achieved without any problem-specific code optimisation.
arXiv Detail & Related papers (2024-05-27T15:39:45Z)
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- Dimension-free mixing times of Gibbs samplers for Bayesian hierarchical models [0.0]
We analyse the behaviour of total variation mixing times of Gibbs samplers targeting hierarchical models.
We obtain convergence results under random data-generating assumptions for a broad class of two-level models.
arXiv Detail & Related papers (2023-04-14T08:30:40Z)
- Low-variance estimation in the Plackett-Luce model via quasi-Monte Carlo sampling [58.14878401145309]
We develop a novel approach to producing more sample-efficient estimators of expectations in the PL model.
We illustrate our findings both theoretically and empirically using real-world recommendation data from Amazon Music and the Yahoo learning-to-rank challenge.
arXiv Detail & Related papers (2022-05-12T11:15:47Z)
- $p$-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses.
We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data.
arXiv Detail & Related papers (2022-03-25T10:54:41Z)
- Scalable mixed-domain Gaussian process modeling and model reduction for longitudinal data [5.00301731167245]
We derive a basis function approximation scheme for mixed-domain covariance functions.
We show that we can approximate the exact GP model accurately in a fraction of the runtime.
We also demonstrate a scalable model reduction workflow for obtaining smaller and more interpretable models.
arXiv Detail & Related papers (2021-11-03T04:47:37Z)
- Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
arXiv Detail & Related papers (2021-10-20T12:25:22Z)
- Oops I Took A Gradient: Scalable Sampling for Discrete Distributions [53.3142984019796]
We show that this gradient-based approach outperforms generic samplers in a number of difficult settings.
We also demonstrate the use of our improved sampler for training deep energy-based models on high dimensional discrete data.
arXiv Detail & Related papers (2021-02-08T20:08:50Z)
- Multilevel Gibbs Sampling for Bayesian Regression [6.2997667081978825]
A hierarchy of data matrices is created by clustering the features and/or samples of the data matrix.
The use of correlated samples is investigated for variance reduction to improve the convergence of the Markov chain.
Speed-up is achieved in almost all cases without significant loss in predictive performance.
arXiv Detail & Related papers (2020-09-25T11:18:17Z)
- Quadruply Stochastic Gaussian Processes [10.152838128195466]
We introduce a variational inference procedure for training scalable Gaussian process (GP) models whose per-iteration complexity is independent of both the number of training points, $n$, and the number of basis functions used in the kernel approximation, $m$.
We demonstrate accurate inference on large classification and regression datasets using GPs and relevance vector machines with up to $m = 10^7$ basis functions.
arXiv Detail & Related papers (2020-06-04T17:06:25Z)