Tuning-free coreset Markov chain Monte Carlo
- URL: http://arxiv.org/abs/2410.18973v1
- Date: Thu, 24 Oct 2024 17:59:23 GMT
- Title: Tuning-free coreset Markov chain Monte Carlo
- Authors: Naitong Chen, Jonathan H. Huggins, Trevor Campbell
- Abstract summary: A Bayesian coreset is a small, weighted subset of a data set that replaces the full data during inference to reduce computational cost.
Coreset Markov chain Monte Carlo (Coreset MCMC) uses draws from an adaptive Markov chain targeting the coreset to train the coreset weights.
We propose a learning-rate-free stochastic gradient optimization procedure, Hot-start Distance over Gradient (Hot DoG).
Empirical results demonstrate that Hot DoG provides higher-quality posterior approximations than other learning-rate-free stochastic gradient methods.
- Score: 14.360996967498
- Abstract: A Bayesian coreset is a small, weighted subset of a data set that replaces the full data during inference to reduce computational cost. The state-of-the-art coreset construction algorithm, Coreset Markov chain Monte Carlo (Coreset MCMC), uses draws from an adaptive Markov chain targeting the coreset posterior to train the coreset weights via stochastic gradient optimization. However, the quality of the constructed coreset, and thus the quality of its posterior approximation, is sensitive to the stochastic optimization learning rate. In this work, we propose a learning-rate-free stochastic gradient optimization procedure, Hot-start Distance over Gradient (Hot DoG), for training coreset weights in Coreset MCMC without user tuning effort. Empirical results demonstrate that Hot DoG provides higher quality posterior approximations than other learning-rate-free stochastic gradient methods, and performs competitively to optimally-tuned ADAM.
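Hot DoG adapts the distance-over-gradient idea to coreset weight training. As a rough sketch of the base idea only (this is not the paper's algorithm: the hot-start modification and the Coreset MCMC integration are omitted, and the function name `dog_optimize` and the toy objective are invented here), the step size is built entirely from quantities the optimizer already observes: the maximum distance travelled from the initial point, divided by the root of the accumulated squared gradient norms, so no learning rate is tuned by the user.

```python
import numpy as np

def dog_optimize(grad, x0, steps=500, r_eps=1e-4):
    """Minimal distance-over-gradients sketch: step size = (max distance
    travelled from the start) / sqrt(sum of squared gradient norms)."""
    x = np.asarray(x0, dtype=float).copy()
    x_init = x.copy()
    r_bar = r_eps                      # tiny initial "distance" to get moving
    g_sq_sum = 0.0
    for _ in range(steps):
        g = grad(x)
        g_sq_sum += float(np.dot(g, g))
        eta = r_bar / np.sqrt(g_sq_sum + 1e-12)
        x = x - eta * g
        r_bar = max(r_bar, float(np.linalg.norm(x - x_init)))
    return x

# Toy check: minimize f(x) = ||x - 3||^2, whose gradient is 2 * (x - 3).
x_star = dog_optimize(lambda x: 2.0 * (x - 3.0), x0=np.zeros(2))
```

The early steps are tiny (of order `r_eps`), then the travelled distance, and with it the step size, grows automatically until the iterates settle near the minimizer.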
Related papers
- Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right (by which we mean using specific insights from the optimisation and kernel communities) gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner, and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
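The dual view can be made concrete: GP regression reduces to solving (K + sigma^2 I) alpha = y, which is the stationarity condition of a quadratic dual objective that stochastic descent can attack one random coordinate block at a time. The toy below (kernel lengthscale, data, step size, and block size are all invented; the paper's actual algorithm additionally uses momentum and iterate averaging) shows only the bare mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny GP regression problem: the posterior mean is K @ alpha, where alpha
# solves (K + sigma^2 I) alpha = y. That linear system is the stationarity
# condition of the dual objective 0.5 * a.A.a - y.a with A = K + sigma^2 I.
n = 50
X = rng.uniform(-1.0, 1.0, size=n)
K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2 / 0.3 ** 2)  # RBF kernel
A = K + 0.1 * np.eye(n)
y = np.sin(3.0 * X) + 0.1 * rng.standard_normal(n)

alpha = np.zeros(n)
lr = 0.5 / np.linalg.norm(A, 2)            # conservative step, below 1/L

for _ in range(20000):
    i = rng.choice(n, size=10, replace=False)   # random coordinate block
    alpha[i] -= lr * (A[i] @ alpha - y[i])      # block gradient of the dual

rel_residual = np.linalg.norm(A @ alpha - y) / np.linalg.norm(y)
```

Because the coordinate gradient vanishes exactly at the solution, this iteration has no noise floor and converges linearly despite the random sampling.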
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - Coreset Markov Chain Monte Carlo [15.310842498680483]
State of the art methods for tuning coreset weights are expensive, require nontrivial user input, and impose constraints on the model.
We propose a new method -- Coreset MCMC -- that simulates a Markov chain targeting the coreset posterior, while simultaneously updating the coreset weights.
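The alternating structure can be sketched on a conjugate Gaussian toy. The snippet below is not the paper's estimator in detail (all constants, the single-chain setup, and the fixed learning rate are invented for illustration); it only shows the two interleaved pieces: a Markov chain targeting the current coreset posterior, and a stochastic weight update using the generic identity grad_i KL(pi_w || pi) = Cov_{pi_w}(l_i(theta), w.l(theta) - L(theta)), estimated from the chain's recent draws.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: theta ~ N(0, 10^2), y_i ~ N(theta, 1); full data vs a 10-point coreset.
N, m = 100, 10
y = 2.0 + rng.standard_normal(N)
idx = rng.choice(N, size=m, replace=False)
w = np.full(m, N / m)                       # uniform initial weights

def loglik_pts(theta):                      # per-point log-likelihoods (constants dropped)
    return -0.5 * (y[idx] - theta) ** 2

def loglik_full(theta):
    return -0.5 * np.sum((y - theta) ** 2)

def log_coreset_post(theta):
    return -0.5 * theta ** 2 / 100.0 + w @ loglik_pts(theta)

theta, lr = 0.0, 0.02
for _ in range(1000):
    # (a) a few Metropolis-Hastings steps targeting the current coreset posterior
    draws = []
    for _ in range(30):
        prop = theta + 0.3 * rng.standard_normal()
        if np.log(rng.random()) < log_coreset_post(prop) - log_coreset_post(theta):
            theta = prop
        draws.append(theta)
    draws = np.array(draws)
    # (b) stochastic gradient step on the weights via the covariance identity
    ell = np.array([loglik_pts(t) for t in draws])           # shape (30, m)
    resid = ell @ w - np.array([loglik_full(t) for t in draws])
    grad = ((ell - ell.mean(0)) * (resid - resid.mean())[:, None]).mean(0)
    w = np.maximum(w - lr * grad, 0.0)                       # keep weights nonnegative

coreset_mean = (w @ y[idx]) / (w.sum() + 1e-2)   # coreset posterior mean (1e-2 = prior precision)
full_mean = y.sum() / (N + 1e-2)                 # full posterior mean
```

Note the chain is never restarted between weight updates, which is exactly what makes the target adaptive.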
arXiv Detail & Related papers (2023-10-25T23:53:27Z) - CEM-GD: Cross-Entropy Method with Gradient Descent Planner for Model-Based Reinforcement Learning [41.233656743112185]
We propose a novel planner that combines first-order methods with Cross-Entropy Method (CEM)
We show that as the dimensionality of the planning problem increases, CEM-GD maintains desirable performance with a constant small number of samples.
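The combination can be sketched on a toy planning cost (the function `cem_gd`, the quadratic objective, and every constant below are invented; a real planner would roll out a learned dynamics model): each round, candidates are sampled from a Gaussian as in CEM, the elites are refined with a few first-order steps, and the sampling distribution is refit from the refined elites.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):              # toy planning cost standing in for a trajectory cost
    return float(np.sum((x - 2.0) ** 2))

def grad_f(x):
    return 2.0 * (x - 2.0)

def cem_gd(dim=4, iters=20, pop=64, elite=8, gd_steps=5, gd_lr=0.1):
    mu, sigma = np.zeros(dim), np.full(dim, 2.0)
    best = mu.copy()
    for _ in range(iters):
        samples = mu + sigma * rng.standard_normal((pop, dim))
        order = np.argsort([f(s) for s in samples])
        elites = samples[order[:elite]].copy()
        # Gradient refinement of the elites (the "GD" in CEM-GD):
        for k in range(elite):
            for _ in range(gd_steps):
                elites[k] -= gd_lr * grad_f(elites[k])
        # CEM refit of the sampling distribution from the refined elites:
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-3
        cand = elites[np.argmin([f(e) for e in elites])]
        if f(cand) < f(best):
            best = cand.copy()
    return best

best = cem_gd()
```

Because the gradient steps pull every elite toward a local minimum, the population size can stay small as the dimension grows, which is the behavior the abstract highlights.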
arXiv Detail & Related papers (2021-12-14T21:11:27Z) - Joint inference and input optimization in equilibrium networks [68.63726855991052]
The deep equilibrium model is a class of models that foregoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between these two settings: inference over network activations and optimization over network inputs.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
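The fixed-point computation itself can be sketched directly (the layer, the sizes, and the 0.5 spectral rescaling that guarantees a contraction are invented for this toy; practical deep equilibrium models use Broyden-type root finders rather than naive iteration):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.standard_normal((d, d))
W *= 0.5 / np.linalg.norm(W, 2)     # rescale so ||W||_2 = 0.5 < 1 (a contraction)
U = rng.standard_normal((d, d))
x = rng.standard_normal(d)

def layer(z, x):
    return np.tanh(W @ z + U @ x)

# "Infinite depth" output: the fixed point z* = layer(z*, x),
# found by repeated application instead of stacking distinct layers.
z = np.zeros(d)
for _ in range(100):
    z = layer(z, x)

residual = np.linalg.norm(z - layer(z, x))
```

Since tanh is 1-Lipschitz and the spectral norm of W is 0.5, each iteration halves the distance to the fixed point, so the residual is numerically zero after 100 steps.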
arXiv Detail & Related papers (2021-11-25T19:59:33Z) - Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning based on Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem.
CoGD is introduced to solve bilinear problems when one variable carries a sparsity constraint.
It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs).
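The bilinear structure can be made concrete with a rank-1 factorization toy. The snippet below is plain simultaneous gradient descent on both factors, not CoGD itself (which additionally couples the two updates through a projection to handle the sparsity constraint); all data and constants are invented:

```python
import numpy as np

rng = np.random.default_rng(2)

# Bilinear problem: find u, v minimizing 0.5 * ||A - u v^T||_F^2 for a rank-1 A.
u_true, v_true = rng.standard_normal(6), rng.standard_normal(4)
A = np.outer(u_true, v_true)

u, v = rng.standard_normal(6), rng.standard_normal(4)
lr = 0.05
for _ in range(2000):
    R = np.outer(u, v) - A          # residual matrix
    gu = R @ v                      # gradient w.r.t. u
    gv = R.T @ u                    # gradient w.r.t. v
    u -= lr * gu
    v -= lr * gv

err = np.linalg.norm(np.outer(u, v) - A)
```

Because the objective is only jointly nonconvex (it is convex in each factor separately), the two gradients depend on each other's variable, which is precisely the coupling that motivates treating the updates together rather than independently.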
arXiv Detail & Related papers (2021-06-20T04:28:20Z) - Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering [53.523517926927894]
We explore the use of exact per-sample Hessian-vector products and gradients to construct self-tuning quadratics.
We prove that our model-based procedure converges in the noisy gradient setting.
This is an interesting step toward constructing self-tuning quadratics.
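Hessian-vector products are the key primitive here: on a quadratic model they let the optimizer choose its own step size in closed form. A minimal sketch (a finite-difference product stands in for the paper's exact per-sample products, and the ill-conditioned objective is invented): the 1-D optimal step along the gradient direction g is (g.g) / (g.Hg), so no learning rate is hand-tuned.

```python
import numpy as np

def grad(x):
    # Gradient of the ill-conditioned quadratic f(x) = 0.5 * x.D.x, D = diag(1, 10)
    return np.array([1.0, 10.0]) * x

def f(x):
    return 0.5 * float(x @ grad(x))

def hvp(grad_fn, x, v, eps=1e-5):
    """Hessian-vector product via central finite differences of the gradient."""
    return (grad_fn(x + eps * v) - grad_fn(x - eps * v)) / (2 * eps)

x = np.array([5.0, 5.0])
for _ in range(50):
    g = grad(x)
    Hg = hvp(grad, x, g)
    eta = float(g @ g) / max(float(g @ Hg), 1e-12)   # curvature-aware step
    x = x - eta * g

final = f(x)
```

This is exact line search for a quadratic, so the iterates converge at the classic ((kappa - 1)/(kappa + 1))^2 rate with no tuning; the guard in the denominator only protects against a vanishing curvature term.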
arXiv Detail & Related papers (2020-11-09T22:07:30Z) - An adaptive Hessian approximated stochastic gradient MCMC method [12.93317525451798]
We present an adaptive Hessian approximated gradient MCMC method to incorporate local geometric information while sampling from the posterior.
We adopt a magnitude-based weight pruning method to enforce the sparsity of the network.
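The non-adaptive baseline such methods build on is stochastic gradient Langevin dynamics (SGLD). The snippet below samples a toy Gaussian posterior from minibatch gradients; it deliberately omits the Hessian preconditioning that is the paper's contribution, and every constant is invented:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy posterior: theta ~ N(0, 10^2) prior, y_i ~ N(theta, 1), N = 1000 points.
N, batch = 1000, 100
y = 2.0 + rng.standard_normal(N)

def stoch_grad_log_post(theta):
    """Minibatch estimate of the gradient of the log posterior."""
    yb = y[rng.integers(0, N, size=batch)]
    return (N / batch) * np.sum(yb - theta) - theta / 100.0

# Plain SGLD: theta <- theta + (eps/2) * grad + sqrt(eps) * xi, xi ~ N(0, 1)
eps, steps, burn = 1e-4, 6000, 1000
theta, samples = 0.0, []
for t in range(steps):
    theta += 0.5 * eps * stoch_grad_log_post(theta) \
             + np.sqrt(eps) * rng.standard_normal()
    if t >= burn:
        samples.append(theta)

post_mean = float(np.mean(samples))
```

The injected noise scale sqrt(eps) is matched to the step eps/2 so the chain's stationary distribution approximates the posterior; a geometry-aware preconditioner replaces the identity scaling in both terms.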
arXiv Detail & Related papers (2020-10-03T16:22:15Z) - GTAdam: Gradient Tracking with Adaptive Momentum for Distributed Online Optimization [4.103281325880475]
This paper deals with a network of computing agents aiming to solve an online optimization problem in a distributed fashion, by means of local computation and communication, without any central coordinator.
We propose the gradient tracking with adaptive momentum estimation (GTAdam) distributed algorithm, which combines a gradient tracking mechanism with first and second order momentum estimates of the gradient.
In numerical experiments from multi-agent learning, GTAdam outperforms state-of-the-art distributed optimization methods.
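The gradient-tracking half of the method can be sketched on a toy consensus problem (this omits GTAdam's momentum estimates, and the mixing matrix, costs, and step size are invented): each agent mixes its iterate with its neighbors' and maintains a tracker y_i that converges to the network-wide average gradient, so local steps follow the global objective.

```python
import numpy as np

n = 4
c = np.array([1.0, 3.0, -2.0, 6.0])   # local targets; global optimum is c.mean() = 2
alpha = 0.1
W = 0.5 * np.eye(n) + 0.5 / n         # doubly stochastic mixing matrix

def local_grads(x):
    return x - c                      # gradient of f_i(x) = 0.5 * (x - c_i)^2

x = np.zeros(n)                       # one scalar decision variable per agent
g = local_grads(x)
y = g.copy()                          # trackers, initialised to the local gradients
for _ in range(300):
    x_new = W @ x - alpha * y         # mix with neighbours, step along tracker
    g_new = local_grads(x_new)
    y = W @ y + g_new - g             # tracker absorbs the change in local gradients
    x, g = x_new, g_new

consensus_err = float(np.max(np.abs(x - c.mean())))
```

Because W is doubly stochastic, the average of the trackers always equals the average of the local gradients, so every agent converges to the minimizer of the sum of costs without a central coordinator.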
arXiv Detail & Related papers (2020-09-03T15:20:21Z) - Solving Stochastic Compositional Optimization is Nearly as Easy as Solving Stochastic Optimization [47.93365664380274]
This paper presents a new Stochastically Corrected Stochastic Compositional gradient method (SCSC).
SCSC runs in a single time scale with a single loop, uses a fixed batch size, and is guaranteed to converge at the same rate as the stochastic gradient descent (SGD) method for non-compositional optimization.
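The correction can be sketched on a one-dimensional compositional toy (the objective and all constants are invented, and this only mirrors the spirit of SCSC's update rather than reproducing it exactly): the running estimate of the inner function is first shifted by the observed change g(x_new; xi) - g(x; xi) before a fresh sample is averaged in, which removes the drift bias a plain running average would incur as x moves.

```python
import numpy as np

rng = np.random.default_rng(4)

# Compositional toy F(x) = f(E[g(x; xi)]): g(x; xi) = x + xi, f(y) = (y - 3)^2,
# so F(x) = (x - 3)^2 and the minimizer is x = 3.
def g(x, xi):
    return x + xi

def f_grad(y):
    return 2.0 * (y - 3.0)

alpha, beta = 0.02, 0.1
x = 0.0
y = g(x, rng.standard_normal())        # running estimate of the inner value
xs = []
for _ in range(5000):
    xi = rng.standard_normal()
    x_new = x - alpha * f_grad(y)      # dg/dx = 1 for this toy inner function
    # Corrected tracking update: shift the old estimate by the change of g
    # (same sample xi on both sides), then average in the fresh observation.
    y = (1.0 - beta) * (y + g(x_new, xi) - g(x, xi)) + beta * g(x_new, xi)
    x = x_new
    xs.append(x)

x_avg = float(np.mean(xs[-1000:]))     # average the tail iterates
```

A single loop and a single step-size scale suffice here, which is the sense in which the compositional problem becomes "nearly as easy" as ordinary stochastic optimization.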
arXiv Detail & Related papers (2020-08-25T06:54:00Z) - Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the Optimistic Adagrad algorithm for nonconcave min-max problems.
Our experiments show that the advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
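The "optimistic" idea underlying such algorithms can be sketched without the Adagrad scaling the paper analyzes (this plain optimistic gradient descent-ascent with a constant, invented step size is only an illustration): each player steps with twice the current gradient minus the previous one. On the bilinear game f(x, y) = x * y, where simultaneous gradient descent-ascent spirals outward, this extrapolation converges to the saddle point.

```python
import numpy as np

# Bilinear min-max game f(x, y) = x * y: min over x, max over y; saddle at (0, 0).
eta = 0.2
x, y = 1.0, 1.0
gx_prev, gy_prev = y, x              # df/dx = y, df/dy = x
for _ in range(500):
    gx, gy = y, x
    # Optimistic updates: extrapolate with 2 * current gradient - previous gradient.
    x = x - eta * (2 * gx - gx_prev)
    y = y + eta * (2 * gy - gy_prev)
    gx_prev, gy_prev = gx, gy

dist = float(np.hypot(x, y))         # distance to the saddle point (0, 0)
```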
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.