Handling the Positive-Definite Constraint in the Bayesian Learning Rule
- URL: http://arxiv.org/abs/2002.10060v13
- Date: Tue, 21 Jul 2020 16:01:35 GMT
- Title: Handling the Positive-Definite Constraint in the Bayesian Learning Rule
- Authors: Wu Lin, Mark Schmidt, Mohammad Emtiyaz Khan
- Abstract summary: The Bayesian learning rule is a natural-gradient variational inference method.
When variational parameters lie in an open constraint set, the rule may not satisfy the constraint and requires line-searches which could slow down the algorithm.
Our work makes it easier to apply the rule in the presence of positive-definite constraints in parameter spaces.
- Score: 33.87717973872535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Bayesian learning rule is a natural-gradient variational inference
method, which not only contains many existing learning algorithms as special
cases but also enables the design of new algorithms. Unfortunately, when
variational parameters lie in an open constraint set, the rule may not satisfy
the constraint and requires line-searches which could slow down the algorithm.
In this work, we address this issue for positive-definite constraints by
proposing an improved rule that naturally handles the constraints. Our
modification is obtained by using Riemannian gradient methods, and is valid
when the approximation attains a \emph{block-coordinate natural
parameterization} (e.g., Gaussian distributions and their mixtures). We propose
a principled way to derive Riemannian gradients and retractions from scratch.
Our method outperforms existing methods without any significant increase in
computation. Our work makes it easier to apply the rule in the presence of
positive-definite constraints in parameter spaces.
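The effect of the modified rule is easy to see numerically on a precision matrix: a plain natural-gradient step can leave the positive-definite cone, while a retraction-style step with a second-order correction stays inside it. A minimal sketch, assuming a Gaussian precision matrix `S`, symmetric direction `G`, and step size `beta` invented for illustration (the paper's exact update may differ):

```python
import numpy as np

def retraction_update(S, G, beta):
    """Retraction-style step on a precision matrix S along direction G.

    The second-order term (beta^2 / 2) * G @ S^{-1} @ G keeps the iterate
    positive-definite, unlike the plain step S - beta * G.
    """
    return S - beta * G + 0.5 * beta**2 * (G @ np.linalg.solve(S, G))

S = np.eye(2)                          # current precision matrix (PD)
G = 2.0 * np.eye(2)                    # a symmetric gradient direction
beta = 0.6

plain = S - beta * G                   # eigenvalues 1 - 1.2 < 0: leaves the PD cone
retr = retraction_update(S, G, beta)   # stays PD

print(np.linalg.eigvalsh(plain).min() > 0)   # False
print(np.linalg.eigvalsh(retr).min() > 0)    # True
```

Why the corrected step stays PD: writing it as S^{1/2}(I - beta*A + (beta^2/2)A^2)S^{1/2} with A = S^{-1/2} G S^{-1/2}, each eigenvalue of the middle factor is (beta*lambda)^2/2 - beta*lambda + 1, a quadratic whose minimum value is 1/2 > 0.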
Related papers
- Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust-Region Sequential Quadratic Programming method to find both first- and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration, defined by minimizing a quadratic approximation of the objective subject to a trust-region constraint.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature of the reduced Hessian matrix.
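An eigen step of this kind can be sketched as follows: compute the smallest eigenpair of the (reduced) Hessian and, when the eigenvalue is negative, move along its eigenvector, scaled to the trust region. The function name, scaling, and example matrix below are assumptions for illustration, not taken from the paper:

```python
import numpy as np

def eigen_step(H, radius):
    """Direction of most negative curvature of H, scaled to the trust
    region; returns None when H has no negative curvature."""
    eigvals, eigvecs = np.linalg.eigh(H)   # eigenvalues in ascending order
    if eigvals[0] >= 0.0:
        return None                        # no negative curvature to exploit
    return radius * eigvecs[:, 0]          # step along the bottom eigenvector

H = np.array([[1.0,  0.0],
              [0.0, -2.0]])               # indefinite reduced Hessian
d = eigen_step(H, radius=0.5)
print(d @ H @ d)                          # -0.5: the quadratic model decreases
```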
arXiv Detail & Related papers (2024-09-24T04:39:47Z)
- On the convergence of adaptive first order methods: proximal gradient and alternating minimization algorithms [4.307128674848627]
AdaPG$^{q,r}$ is a framework that unifies and extends existing results by providing larger stepsize policies and improved lower bounds.
Different choices of the parameters $q$ and $r$ are discussed and the efficacy of the resulting methods is demonstrated through numerical simulations.
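The general shape of such adaptive policies can be sketched with a proximal-gradient loop whose stepsize grows or shrinks based on a local Lipschitz estimate formed from consecutive gradients; the concrete rule below is a generic heuristic, not the paper's AdaPG$^{q,r}$ policy:

```python
import numpy as np

def prox_l1(x, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def adaptive_prox_grad(grad, x0, lam, steps=300):
    """Proximal gradient for f(x) + lam*||x||_1 with a stepsize adapted
    from a local Lipschitz estimate (a generic heuristic)."""
    x, g, t = x0, grad(x0), 1e-3
    for _ in range(steps):
        x_new = prox_l1(x - t * g, lam * t)
        g_new = grad(x_new)
        dx, dg = x_new - x, g_new - g
        if np.linalg.norm(dx) > 0 and np.linalg.norm(dg) > 0:
            L = np.linalg.norm(dg) / np.linalg.norm(dx)  # local Lipschitz estimate
            t = min(2.0 * t, 0.9 / L)      # grow cautiously, capped below 1/L
        x, g = x_new, g_new
    return x

# Minimize 0.5*||x - b||^2 + lam*||x||_1; the solution soft-thresholds b.
b = np.array([3.0, -0.2, 1.5])
x = adaptive_prox_grad(lambda x: x - b, np.zeros_like(b), lam=0.5)
print(np.allclose(x, prox_l1(b, 0.5)))     # True
```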
arXiv Detail & Related papers (2023-11-30T10:29:43Z)
- Riemannian stochastic optimization methods avoid strict saddle points [68.80251170757647]
We show that policies under study avoid strict saddle points / submanifolds with probability 1.
This result provides an important sanity check as it shows that, almost always, the limit state of an algorithm can only be a local minimizer.
arXiv Detail & Related papers (2023-11-04T11:12:24Z)
- Infeasible Deterministic, Stochastic, and Variance-Reduction Algorithms for Optimization under Orthogonality Constraints [9.301728976515255]
This article provides new practical and theoretical developments for the landing algorithm.
First, the method is extended to the Stiefel manifold.
We also consider variance reduction algorithms when the cost function is an average of many functions.
arXiv Detail & Related papers (2023-03-29T07:36:54Z)
- Efficient Natural Gradient Descent Methods for Large-Scale Optimization Problems [1.2891210250935146]
We propose an efficient method for computing natural gradient descent directions with respect to a generic metric in the state space.
Our technique relies on representing the natural gradient direction as a solution to a standard least-squares problem.
We can reliably compute several natural gradient descent directions, including the Wasserstein natural gradient, for large-scale parameter spaces.
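The least-squares representation is easiest to see when the metric is G = J^T J for some Jacobian J and the Euclidean gradient factors as g = J^T r; the sketch below assumes that special case (the paper's construction is more general):

```python
import numpy as np

def natural_gradient_lstsq(J, r):
    """Natural-gradient direction for the metric G = J^T J when the
    Euclidean gradient is g = J^T r: instead of forming and inverting G,
    solve the standard least-squares problem min_d ||J d - r||^2, whose
    normal equations are exactly (J^T J) d = J^T r."""
    d, *_ = np.linalg.lstsq(J, r, rcond=None)
    return d

rng = np.random.default_rng(0)
J = rng.standard_normal((50, 5))   # Jacobian of the model map
r = rng.standard_normal(50)        # residual defining g = J^T r
d = natural_gradient_lstsq(J, r)

print(np.allclose(J.T @ J @ d, J.T @ r))   # True: d solves the metric system
```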
arXiv Detail & Related papers (2022-02-13T07:32:17Z)
- A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes, with complexity independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z)
- l1-Norm Minimization with Regula Falsi Type Root Finding Methods [81.55197998207593]
We develop an efficient approach using Regula Falsi root-finding techniques.
Practical performance is illustrated on l1-regularized problems.
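For reference, the Regula Falsi (false position) iteration named in the title keeps a sign-changing bracket like bisection but replaces the midpoint with the secant's axis crossing. The test equation below is purely illustrative, not the paper's l1 formulation:

```python
def regula_falsi(f, a, b, tol=1e-12, max_iter=200):
    """Regula Falsi root finding on a bracket [a, b] with f(a)*f(b) < 0."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "root must be bracketed"
    c = a
    for _ in range(max_iter):
        c = (a * fb - b * fa) / (fb - fa)   # secant crossing of the axis
        fc = f(c)
        if abs(fc) < tol:
            break
        if fa * fc < 0:                      # root lies in [a, c]
            b, fb = c, fc
        else:                                # root lies in [c, b]
            a, fa = c, fc
    return c

root = regula_falsi(lambda x: x**3 - 2.0, 0.0, 2.0)
print(abs(root - 2.0 ** (1.0 / 3.0)) < 1e-10)   # True
```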
arXiv Detail & Related papers (2021-05-01T13:24:38Z)
- AI-SARAH: Adaptive and Implicit Stochastic Recursive Gradient Methods [7.486132958737807]
We present an adaptive variance reduced method with an implicit approach for adaptivity.
We provide convergence guarantees for finite-sum minimization problems and show that faster convergence than SARAH can be achieved if the local geometry permits.
This algorithm implicitly computes step-size and efficiently estimates local Lipschitz smoothness of functions.
arXiv Detail & Related papers (2021-02-19T01:17:15Z)
- Conditional gradient methods for stochastically constrained convex minimization [54.53786593679331]
We propose two novel conditional gradient-based methods for solving structured convex optimization problems.
The most important feature of our framework is that only a subset of the constraints is processed at each iteration.
Our algorithms rely on variance reduction and smoothing used in conjunction with conditional gradient steps, and are accompanied by rigorous convergence guarantees.
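The conditional-gradient template these variants build on is the classical Frank-Wolfe method: each iteration calls a linear minimization oracle and takes a convex combination, so feasibility never requires a projection. A plain sketch on an l1-ball constraint (the paper's smoothing, variance reduction, and per-iteration constraint sampling are omitted):

```python
import numpy as np

def frank_wolfe(grad, lmo, x0, steps=100):
    """Classical conditional gradient (Frank-Wolfe) iteration."""
    x = x0
    for k in range(steps):
        s = lmo(grad(x))                   # argmin of <grad, s> over the feasible set
        gamma = 2.0 / (k + 2.0)            # standard open-loop stepsize
        x = (1.0 - gamma) * x + gamma * s  # convex combination stays feasible
    return x

def lmo_l1_ball(g, radius=1.0):
    """Linear minimization oracle for the l1 ball: a signed vertex."""
    i = np.argmax(np.abs(g))
    s = np.zeros_like(g)
    s[i] = -radius * np.sign(g[i])
    return s

# Minimize 0.5*||x - b||^2 over the unit l1 ball; b lies outside the ball,
# so the solution is the boundary vertex (1, 0).
b = np.array([2.0, 0.5])
x = frank_wolfe(lambda x: x - b, lmo_l1_ball, np.zeros(2))
print(np.allclose(x, [1.0, 0.0]))          # True
```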
arXiv Detail & Related papers (2020-07-07T21:26:35Z)
- Gradient-Free Methods for Saddle-Point Problem [125.99533416395765]
We generalize the approach of Gasnikov et al. (2017), which allows solving (stochastic) convex optimization problems with an inexact gradient-free oracle.
Our approach reduces the required number of oracle calls by a factor of $\frac{n}{\log n}$.
In the second part of the paper, we analyze the case when such an assumption cannot be made and propose a general approach for modifying the method to solve this problem.
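A standard inexact gradient-free oracle in this line of work is the two-point estimator: query the function at x ± τe along a random direction e and form a finite difference. A textbook sketch, not the paper's exact smoothing scheme:

```python
import numpy as np

def two_point_estimate(f, x, tau, rng):
    """Two-point gradient-free estimate d * (f(x+tau*e) - f(x-tau*e)) / (2*tau) * e
    along a random unit direction e; its expectation is the smoothed gradient."""
    e = rng.standard_normal(x.shape)
    e /= np.linalg.norm(e)
    return x.size * (f(x + tau * e) - f(x - tau * e)) / (2.0 * tau) * e

rng = np.random.default_rng(0)
f = lambda z: 0.5 * np.sum(z ** 2)     # gradient of f at z is z itself
x = np.array([1.0, -2.0, 0.5])

# Averaging many single-direction estimates recovers the gradient.
est = np.mean([two_point_estimate(f, x, 1e-4, rng) for _ in range(20000)], axis=0)
print(np.allclose(est, x, atol=0.1))   # True
```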
arXiv Detail & Related papers (2020-05-12T16:44:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.