Optimising Distributions with Natural Gradient Surrogates
- URL: http://arxiv.org/abs/2310.11837v2
- Date: Mon, 4 Mar 2024 11:52:38 GMT
- Title: Optimising Distributions with Natural Gradient Surrogates
- Authors: Jonathan So, Richard E. Turner
- Abstract summary: We propose a novel technique for tackling such issues, which involves reframing the optimisation as one with respect to the parameters of a surrogate distribution.
We give several examples of existing methods that can be interpreted as applying this technique, and propose a new method for applying it to a wide variety of problems.
We demonstrate our method on maximum likelihood estimation and variational inference tasks.
- Score: 24.53732048133567
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural gradient methods have been used to optimise the parameters of
probability distributions in a variety of settings, often resulting in
fast-converging procedures. Unfortunately, for many distributions of interest,
computing the natural gradient has a number of challenges. In this work we
propose a novel technique for tackling such issues, which involves reframing
the optimisation as one with respect to the parameters of a surrogate
distribution, for which computing the natural gradient is easy. We give several
examples of existing methods that can be interpreted as applying this
technique, and propose a new method for applying it to a wide variety of
problems. Our method expands the set of distributions that can be efficiently
targeted with natural gradients. Furthermore, it is fast, easy to understand,
simple to implement using standard autodiff software, and does not require
lengthy model-specific derivations. We demonstrate our method on maximum
likelihood estimation and variational inference tasks.
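As a much-simplified illustration of the ingredient the surrogate trick relies on: for some distributions, such as the Gaussian, the Fisher matrix is known in closed form, so natural-gradient updates are cheap. The sketch below is not the paper's method (which routes a harder target distribution through such an easy surrogate); it only shows the easy inner step, natural-gradient ascent on the average log-likelihood of a 1D Gaussian. The function name, learning rate, and step count are our own choices.

```python
import numpy as np

def fit_gaussian_natural_gradient(x, steps=200, lr=0.5):
    """Fit N(mu, v) to data by natural-gradient ascent on the average
    log-likelihood.  For the Gaussian the Fisher matrix is
    diag(1/v, 1/(2 v^2)), so the natural gradient is cheap to form."""
    mu, v = 0.0, 1.0
    for _ in range(steps):
        # Euclidean gradient of the average log-likelihood
        g_mu = np.mean(x - mu) / v
        g_v = -0.5 / v + 0.5 * np.mean((x - mu) ** 2) / v**2
        # Preconditioning by the inverse Fisher gives the natural gradient
        mu += lr * v * g_mu            # F_mu     = 1/v
        v += lr * (2.0 * v**2) * g_v   # F_v      = 1/(2 v^2)
        v = max(v, 1e-8)               # keep the variance positive
    return mu, v
```

Note how the inverse-Fisher preconditioning turns both updates into simple fixed-point iterations (e.g. the variance update becomes `v += lr * (m2 - v)`), which is the fast convergence the abstract refers to.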
Related papers

- Guided Sketch-Based Program Induction by Search Gradients [0.0]
We propose a framework for learning parameterized programs via search gradients using evolution strategies.
This formulation departs from traditional program induction in that it allows the programmer to impart task-specific code to the program 'sketch'.
arXiv Detail & Related papers (2024-02-10T16:47:53Z)
- Riemannian stochastic optimization methods avoid strict saddle points [68.80251170757647]
We show that policies under study avoid strict saddle points / submanifolds with probability 1.
This result provides an important sanity check as it shows that, almost always, the limit state of an algorithm can only be a local minimizer.
arXiv Detail & Related papers (2023-11-04T11:12:24Z)
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-CSP, Maximum-Weight-Bipartite-Matching, and the Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
- Fast and Correct Gradient-Based Optimisation for Probabilistic Programming via Smoothing [0.0]
We study the foundations of variational inference, which frames posterior inference as an optimisation problem.
We endow our language with both a measurable and a smoothed (approximate) value semantics.
We show that our approach has similar convergence to a key competitor, but is simpler, faster, and attains an orders-of-magnitude reduction in work-normalised variance.
arXiv Detail & Related papers (2023-01-09T15:12:45Z)
- Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
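To make the one-sample idea concrete, here is a hedged sketch combining a SAG-style gradient table with the classic Frank-Wolfe (conditional gradient) step over the probability simplex. This is a generic illustration of the two ingredients named in the summary, not the paper's algorithm; the function name, step-size schedule, and warm-started table are our own choices.

```python
import numpy as np

def sag_frank_wolfe(grads, x0, steps=5000, seed=0):
    """Frank-Wolfe over the probability simplex with a SAG-style
    estimator: one fresh per-sample gradient per iteration, the rest
    reused from a table.  `grads` is a list of per-sample gradient fns."""
    rng = np.random.default_rng(seed)
    n, x = len(grads), x0.copy()
    table = np.array([g(x) for g in grads])  # warm-start the gradient table
    for t in range(steps):
        i = rng.integers(n)
        table[i] = grads[i](x)               # refresh exactly one entry
        g_avg = table.mean(axis=0)           # SAG estimate of the full gradient
        v = np.zeros_like(x)
        v[np.argmin(g_avg)] = 1.0            # linear minimisation oracle on the simplex
        x += (2.0 / (t + 3)) * (v - x)       # standard decaying FW step size
    return x
```

Because every iterate is a convex combination of simplex vertices, `x` stays feasible without any projection step, which is the usual appeal of conditional gradient methods.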
arXiv Detail & Related papers (2022-02-26T19:10:48Z)
- Efficient Natural Gradient Descent Methods for Large-Scale Optimization Problems [1.2891210250935146]
We propose an efficient method for computing natural gradient descent directions with respect to a generic metric in the state space.
Our technique relies on representing the natural gradient direction as a solution to a standard least-squares problem.
We can reliably compute several natural gradient descent directions, including the Wasserstein natural gradient, for large-scale parameter spaces.
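A minimal sketch of the least-squares viewpoint, under an assumption of ours: when the Fisher matrix has the form $J^\top J$ (as for a Gaussian observation model, where the natural-gradient direction coincides with the Gauss-Newton direction), the system $F d = J^\top r$ is exactly the normal equation of $\min_d \|J d - r\|$, so the direction can be obtained from a standard least-squares solver without forming or inverting $F$. The function name is ours, not from the paper.

```python
import numpy as np

def natural_gradient_step_lstsq(jac, resid):
    """Natural-gradient (Gauss-Newton) direction for a model with a
    Gaussian likelihood: the Fisher matrix is J^T J, so F d = J^T r is
    the normal equation of min_d ||J d - r||.  Solving the least-squares
    problem directly avoids forming or inverting F, and lstsq also
    copes gracefully with a rank-deficient Jacobian."""
    d, *_ = np.linalg.lstsq(jac, resid, rcond=None)
    return d
```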
arXiv Detail & Related papers (2022-02-13T07:32:17Z)
- TaylorGAN: Neighbor-Augmented Policy Update for Sample-Efficient Natural Language Generation [79.4205462326301]
TaylorGAN is a novel approach to score function-based natural language generation.
It augments the gradient estimation with an off-policy update and a first-order Taylor expansion.
This enables training NLG models from scratch with a smaller batch size.
arXiv Detail & Related papers (2020-11-27T02:26:15Z)
- Conditional gradient methods for stochastically constrained convex minimization [54.53786593679331]
We propose two novel conditional gradient-based methods for solving structured convex optimization problems.
The most important feature of our framework is that only a subset of the constraints is processed at each iteration.
Our algorithms rely on variance reduction and smoothing used in conjunction with conditional gradient steps, and are accompanied by rigorous convergence guarantees.
arXiv Detail & Related papers (2020-07-07T21:26:35Z)
- Conditional Gradient Methods for Convex Optimization with General Affine and Nonlinear Constraints [8.643249539674612]
This paper presents new conditional gradient methods for solving convex optimization problems with general affine and nonlinear constraints.
We first present a new constraint-extrapolated conditional gradient (CoexCG) method that achieves an $\mathcal{O}(1/\epsilon^2)$ iteration complexity for both smooth and structured nonsmooth function-constrained convex optimization.
We further develop novel variants of CoexCG, namely constraint-extrapolated and dual-regularized conditional gradient (CoexDurCG) methods, which achieve similar iteration complexity to CoexCG but allow adaptive selection of algorithmic parameters.
arXiv Detail & Related papers (2020-06-30T23:49:38Z)
- Handling the Positive-Definite Constraint in the Bayesian Learning Rule [33.87717973872535]
The Bayesian learning rule is a natural-gradient variational inference method.
When variational parameters lie in an open constraint set, the rule may not satisfy the constraint and requires line-searches which could slow down the algorithm.
Our work makes it easier to apply the rule in the presence of positive-definite constraints in parameter spaces.
arXiv Detail & Related papers (2020-02-24T03:29:39Z)
- Implicit differentiation of Lasso-type models for hyperparameter optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z)
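To illustrate the implicit-differentiation idea in its simplest smooth form (the Lasso case handled by the paper is nonsmooth and exploits the sparsity of the solution, which this sketch does not), here is a hypergradient of a validation loss with respect to the ridge penalty, obtained by implicitly differentiating the stationarity condition $(X^\top X + \lambda I)\,w = X^\top y$. The function name and data split are our own.

```python
import numpy as np

def ridge_hypergradient(X, y, Xv, yv, lam):
    """Hypergradient of the validation loss 0.5*||Xv w - yv||^2 w.r.t.
    lam for ridge regression, via implicit differentiation of
    (X^T X + lam I) w = X^T y.  Linear solves are used in place of
    explicit matrix inversion."""
    p = X.shape[1]
    A = X.T @ X + lam * np.eye(p)
    w = np.linalg.solve(A, X.T @ y)     # inner solution w(lam)
    dw = np.linalg.solve(A, -w)         # implicit: dw/dlam = -A^{-1} w
    grad_val = Xv.T @ (Xv @ w - yv)     # dL_val / dw
    return dw @ grad_val, w
```

The same chain-rule structure underlies the Lasso case; the paper's contribution is making it efficient and inversion-free there by restricting the linear system to the active support of the sparse solution.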
This list is automatically generated from the titles and abstracts of the papers in this site.