Bayesian Sparse learning with preconditioned stochastic gradient MCMC
and its applications
- URL: http://arxiv.org/abs/2006.16376v1
- Date: Mon, 29 Jun 2020 20:57:20 GMT
- Title: Bayesian Sparse learning with preconditioned stochastic gradient MCMC
and its applications
- Authors: Yating Wang, Wei Deng, Guang Lin
- Abstract summary: We show that the proposed algorithm can asymptotically converge to the correct distribution with a controllable bias under mild conditions.
- Score: 5.660384137948734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we propose a Bayesian type sparse deep learning algorithm. The
algorithm utilizes a set of spike-and-slab priors for the parameters in the
deep neural network. The hierarchical Bayesian mixture will be trained using an
adaptive empirical method. That is, one alternately samples from the
posterior using preconditioned stochastic gradient Langevin dynamics (PSGLD)
and optimizes the latent variables via stochastic approximation. The sparsity of
the network is achieved while optimizing the hyperparameters with adaptive
searching and penalizing. A popular SG-MCMC approach is stochastic gradient
Langevin dynamics (SGLD). However, given the complex geometry of the model
parameter space in non-convex learning, updating every component with a
universal step size as in SGLD may cause slow mixing. To
address this issue, we apply a computationally manageable preconditioner in the
updating rule, which provides a step-size parameter to adapt to local geometric
properties. Moreover, by smoothly optimizing the hyperparameter in the
preconditioning matrix, our proposed algorithm ensures a decreasing bias, which
is introduced by ignoring the correction term in preconditioned SGLD. According
to the existing theoretical framework, we show that the proposed algorithm can
asymptotically converge to the correct distribution with a controllable bias
under mild conditions. Numerical tests are performed on both synthetic
regression problems and learning the solutions of an elliptic PDE, which
demonstrate the accuracy and efficiency of the present work.
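The core computational step is the preconditioned SGLD update. The sketch below is a minimal illustration, assuming the diagonal RMSprop-style preconditioner of Li et al. (2016); the function and variable names (psgld_step, grad_log_post, V) are illustrative rather than taken from the paper's code, and the correction term Gamma(theta) is dropped, which is precisely the source of the controllable bias discussed above.

```python
import numpy as np

def psgld_step(theta, grad_log_post, V, lr=1e-3, alpha=0.99, eps=1e-5, rng=None):
    """One preconditioned SGLD update with an RMSprop-style diagonal preconditioner.

    theta         : current parameter vector
    grad_log_post : stochastic gradient of the log-posterior at theta
    V             : running average of squared gradients (state, same shape as theta)
    Dropping the correction term Gamma(theta) introduces the controllable bias
    mentioned in the abstract.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Exponential moving average of squared stochastic gradients.
    V = alpha * V + (1.0 - alpha) * grad_log_post ** 2
    # Diagonal preconditioner: larger steps in flat directions, smaller in sharp ones.
    G = 1.0 / (eps + np.sqrt(V))
    # Preconditioned Langevin proposal: drift plus preconditioned Gaussian noise.
    theta_new = (theta
                 + 0.5 * lr * G * grad_log_post
                 + np.sqrt(lr * G) * rng.normal(size=theta.shape))
    return theta_new, V
```

In the adaptive empirical scheme described in the abstract, a step of this form alternates with a stochastic-approximation update of the spike-and-slab latent variables, which is what drives weights toward zero and yields the sparse network.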
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO)
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - PROMISE: Preconditioned Stochastic Optimization Methods by Incorporating Scalable Curvature Estimates [17.777466668123886]
We introduce PROMISE ($\textbf{Pr}$econditioned $\textbf{O}$ptimization $\textbf{M}$ethods by $\textbf{I}$ncorporating $\textbf{S}$calable Curvature $\textbf{E}$stimates), a suite of sketching-based preconditioned gradient algorithms.
PROMISE includes preconditioned versions of SVRG, SAGA, and Katyusha.
arXiv Detail & Related papers (2023-09-05T07:49:10Z) - Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and allow one to trade off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z) - Sparse high-dimensional linear regression with a partitioned empirical
Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models [0.2741266294612776]
We analyze a class of gradient algorithms with momentum on a high-dimensional random least squares problem.
We show that (small-batch) momentum with a fixed momentum parameter provides no actual performance improvement over SGD when step sizes are adjusted correctly.
In the non-strongly convex setting, it is possible to get a large improvement over SGD using momentum.
arXiv Detail & Related papers (2021-06-07T15:08:24Z) - AI-SARAH: Adaptive and Implicit Stochastic Recursive Gradient Methods [7.486132958737807]
We present an adaptive variance reduced method with an implicit approach for adaptivity.
We provide convergence guarantees for finite-sum minimization problems and show that faster convergence than SARAH can be achieved when the local geometry permits.
This algorithm implicitly computes step-size and efficiently estimates local Lipschitz smoothness of functions.
arXiv Detail & Related papers (2021-02-19T01:17:15Z) - An adaptive Hessian approximated stochastic gradient MCMC method [12.93317525451798]
We present an adaptive Hessian approximated gradient MCMC method to incorporate local geometric information while sampling from the posterior.
We adopt a magnitude-based weight pruning method to enforce the sparsity of the network.
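The magnitude-based pruning step mentioned here is simple to state; the following is a generic sketch, zeroing out the smallest-magnitude weights until a target sparsity is reached (the threshold schedule and per-layer treatment in the cited paper may differ).

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude entries so that roughly `sparsity`
    of the weights are pruned. Generic illustration only; not the cited
    paper's exact schedule."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # The k-th smallest magnitude serves as the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask
```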
arXiv Detail & Related papers (2020-10-03T16:22:15Z) - Stochastic Gradient Langevin Dynamics Algorithms with Adaptive Drifts [8.36840154574354]
We propose a class of adaptive stochastic gradient Markov chain Monte Carlo (SGMCMC) algorithms, where the drift function is biased to enhance escape from saddle points and the bias is adaptively adjusted according to the gradients of past samples.
We demonstrate via numerical examples that the proposed algorithms can significantly outperform the existing SGMCMC algorithms.
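As a rough illustration of the adaptive-drift idea, the sketch below adds an Adam-like bias, built from moment estimates of past stochastic gradients, to the SGLD drift; the bias form and names (sgld_adaptive_drift_step, bias_scale) are assumptions for illustration, not the cited paper's exact algorithm.

```python
import numpy as np

def sgld_adaptive_drift_step(theta, grad_log_post, m, v, lr=1e-3,
                             beta1=0.9, beta2=0.999, eps=1e-8,
                             bias_scale=1.0, rng=None):
    """SGLD step whose drift is biased by moment estimates of past gradients.

    m, v accumulate first and second moments of past stochastic gradients;
    the resulting bias term helps escape saddle points. Generic illustration
    of the adaptive-drift idea only.
    """
    rng = np.random.default_rng() if rng is None else rng
    m = beta1 * m + (1.0 - beta1) * grad_log_post
    v = beta2 * v + (1.0 - beta2) * grad_log_post ** 2
    # Adaptive bias added to the usual SGLD drift.
    bias = bias_scale * m / (np.sqrt(v) + eps)
    drift = grad_log_post + bias
    theta_new = theta + 0.5 * lr * drift + np.sqrt(lr) * rng.normal(size=theta.shape)
    return theta_new, m, v
```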
arXiv Detail & Related papers (2020-09-20T22:03:39Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that attenuating step-size is required for exact convergence with the fact that constant step-size learns faster in time up to an error.
Rather than fixing the minibatch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)