Optimising Distributions with Natural Gradient Surrogates
- URL: http://arxiv.org/abs/2310.11837v2
- Date: Mon, 4 Mar 2024 11:52:38 GMT
- Title: Optimising Distributions with Natural Gradient Surrogates
- Authors: Jonathan So, Richard E. Turner
- Abstract summary: We propose a novel technique for tackling such issues, which involves reframing the optimisation as one with respect to the parameters of a surrogate distribution.
We give several examples of existing methods that can be interpreted as applying this technique, and propose a new method for applying it to a wide variety of problems.
We demonstrate our method on maximum likelihood estimation and variational inference tasks.
- Score: 24.53732048133567
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural gradient methods have been used to optimise the parameters of
probability distributions in a variety of settings, often resulting in
fast-converging procedures. Unfortunately, for many distributions of interest,
computing the natural gradient has a number of challenges. In this work we
propose a novel technique for tackling such issues, which involves reframing
the optimisation as one with respect to the parameters of a surrogate
distribution, for which computing the natural gradient is easy. We give several
examples of existing methods that can be interpreted as applying this
technique, and propose a new method for applying it to a wide variety of
problems. Our method expands the set of distributions that can be efficiently
targeted with natural gradients. Furthermore, it is fast, easy to understand,
simple to implement using standard autodiff software, and does not require
lengthy model-specific derivations. We demonstrate our method on maximum
likelihood estimation and variational inference tasks.
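As a much-simplified illustration of the ingredient the surrogate trick relies on: for some distributions, such as the Gaussian, the Fisher matrix is known in closed form, so natural-gradient updates are cheap. The sketch below is not the paper's method (which routes a harder target distribution through such an easy surrogate); it only shows the easy inner step, natural-gradient ascent on the average log-likelihood of a 1D Gaussian. The function name, learning rate, and step count are our own choices.

```python
import numpy as np

def fit_gaussian_natural_gradient(x, steps=200, lr=0.5):
    """Fit N(mu, v) to data by natural-gradient ascent on the average
    log-likelihood.  For the Gaussian the Fisher matrix is
    diag(1/v, 1/(2 v^2)), so the natural gradient is cheap to form."""
    mu, v = 0.0, 1.0
    for _ in range(steps):
        # Euclidean gradient of the average log-likelihood
        g_mu = np.mean(x - mu) / v
        g_v = -0.5 / v + 0.5 * np.mean((x - mu) ** 2) / v**2
        # Preconditioning by the inverse Fisher gives the natural gradient
        mu += lr * v * g_mu            # F_mu     = 1/v
        v += lr * (2.0 * v**2) * g_v   # F_v      = 1/(2 v^2)
        v = max(v, 1e-8)               # keep the variance positive
    return mu, v
```

Note how the inverse-Fisher preconditioning turns both updates into simple fixed-point iterations (e.g. the variance update becomes `v += lr * (m2 - v)`), which is the fast convergence the abstract refers to.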
Related papers

- Guided Sketch-Based Program Induction by Search Gradients [0.0]
We propose a framework for learning parameterized programs via search gradients using evolution strategies.
This formulation departs from traditional program induction in that it allows the programmer to impart task-specific code to the program 'sketch'.
arXiv Detail & Related papers (2024-02-10T16:47:53Z)
- Riemannian stochastic optimization methods avoid strict saddle points [68.80251170757647]
We show that policies under study avoid strict saddle points / submanifolds with probability 1.
This result provides an important sanity check as it shows that, almost always, the limit state of an algorithm can only be a local minimizer.
arXiv Detail & Related papers (2023-11-04T11:12:24Z)
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-CSP, Maximum-Weight-Bipartite-Matching, and the Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
- Fast and Correct Gradient-Based Optimisation for Probabilistic Programming via Smoothing [0.0]
We study the foundations of variational inference, which frames posterior inference as an optimisation problem.
We endow our language with both a measurable and a smoothed (approximate) value semantics.
We show that our approach has similar convergence to a key competitor, but is simpler, faster, and attains an orders-of-magnitude reduction in work-normalised variance.
arXiv Detail & Related papers (2023-01-09T15:12:45Z)
- Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
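To make the one-sample idea concrete, here is a hedged sketch combining a SAG-style gradient table with the classic Frank-Wolfe (conditional gradient) step over the probability simplex. This is a generic illustration of the two ingredients named in the summary, not the paper's algorithm; the function name, step-size schedule, and warm-started table are our own choices.

```python
import numpy as np

def sag_frank_wolfe(grads, x0, steps=5000, seed=0):
    """Frank-Wolfe over the probability simplex with a SAG-style
    estimator: one fresh per-sample gradient per iteration, the rest
    reused from a table.  `grads` is a list of per-sample gradient fns."""
    rng = np.random.default_rng(seed)
    n, x = len(grads), x0.copy()
    table = np.array([g(x) for g in grads])  # warm-start the gradient table
    for t in range(steps):
        i = rng.integers(n)
        table[i] = grads[i](x)               # refresh exactly one entry
        g_avg = table.mean(axis=0)           # SAG estimate of the full gradient
        v = np.zeros_like(x)
        v[np.argmin(g_avg)] = 1.0            # linear minimisation oracle on the simplex
        x += (2.0 / (t + 3)) * (v - x)       # standard decaying FW step size
    return x
```

Because every iterate is a convex combination of simplex vertices, `x` stays feasible without any projection step, which is the usual appeal of conditional gradient methods.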
arXiv Detail & Related papers (2022-02-26T19:10:48Z)
- Efficient Natural Gradient Descent Methods for Large-Scale Optimization Problems [1.2891210250935146]
We propose an efficient method for computing natural gradient descent directions with respect to a generic metric in the state space.
Our technique relies on representing the natural gradient direction as a solution to a standard least-squares problem.
We can reliably compute several natural gradient descent directions, including the Wasserstein natural gradient, for large-scale parameter spaces.
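A minimal sketch of the least-squares viewpoint, under an assumption of ours: when the Fisher matrix has the form $J^\top J$ (as for a Gaussian observation model, where the natural-gradient direction coincides with the Gauss-Newton direction), the system $F d = J^\top r$ is exactly the normal equation of $\min_d \|J d - r\|$, so the direction can be obtained from a standard least-squares solver without forming or inverting $F$. The function name is ours, not from the paper.

```python
import numpy as np

def natural_gradient_step_lstsq(jac, resid):
    """Natural-gradient (Gauss-Newton) direction for a model with a
    Gaussian likelihood: the Fisher matrix is J^T J, so F d = J^T r is
    the normal equation of min_d ||J d - r||.  Solving the least-squares
    problem directly avoids forming or inverting F, and lstsq also
    copes gracefully with a rank-deficient Jacobian."""
    d, *_ = np.linalg.lstsq(jac, resid, rcond=None)
    return d
```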
arXiv Detail & Related papers (2022-02-13T07:32:17Z)
- TaylorGAN: Neighbor-Augmented Policy Update for Sample-Efficient Natural Language Generation [79.4205462326301]
TaylorGAN is a novel approach to score function-based natural language generation.
It augments the gradient estimation with an off-policy update and a first-order Taylor expansion.
This enables training NLG models from scratch with a smaller batch size.
arXiv Detail & Related papers (2020-11-27T02:26:15Z)
- Conditional gradient methods for stochastically constrained convex minimization [54.53786593679331]
We propose two novel conditional gradient-based methods for solving structured convex optimization problems.
The most important feature of our framework is that only a subset of the constraints is processed at each iteration.
Our algorithms rely on variance reduction and smoothing used in conjunction with conditional gradient steps, and are accompanied by rigorous convergence guarantees.
arXiv Detail & Related papers (2020-07-07T21:26:35Z)
- Conditional Gradient Methods for Convex Optimization with General Affine and Nonlinear Constraints [8.643249539674612]
This paper presents new conditional gradient methods for solving convex optimization problems with general affine and nonlinear constraints.
We first present a new constraint-extrapolated conditional gradient (CoexCG) method that achieves an $\mathcal{O}(1/\epsilon^2)$ iteration complexity for both smooth and structured nonsmooth function-constrained convex optimization.
We further develop novel variants of CoexCG, namely constraint-extrapolated and dual-regularized conditional gradient (CoexDurCG) methods, which achieve similar iteration complexity to CoexCG but allow adaptive selection of algorithmic parameters.
arXiv Detail & Related papers (2020-06-30T23:49:38Z)
- Handling the Positive-Definite Constraint in the Bayesian Learning Rule [33.87717973872535]
The Bayesian learning rule is a natural-gradient variational inference method.
When variational parameters lie in an open constraint set, the rule may not satisfy the constraint and requires line-searches which could slow down the algorithm.
Our work makes it easier to apply the rule in the presence of positive-definite constraints in parameter spaces.
arXiv Detail & Related papers (2020-02-24T03:29:39Z)
- Implicit differentiation of Lasso-type models for hyperparameter optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z)
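To illustrate the implicit-differentiation idea in its simplest smooth form (the Lasso case handled by the paper is nonsmooth and exploits the sparsity of the solution, which this sketch does not), here is a hypergradient of a validation loss with respect to the ridge penalty, obtained by implicitly differentiating the stationarity condition $(X^\top X + \lambda I)\,w = X^\top y$. The function name and data split are our own.

```python
import numpy as np

def ridge_hypergradient(X, y, Xv, yv, lam):
    """Hypergradient of the validation loss 0.5*||Xv w - yv||^2 w.r.t.
    lam for ridge regression, via implicit differentiation of
    (X^T X + lam I) w = X^T y.  Linear solves are used in place of
    explicit matrix inversion."""
    p = X.shape[1]
    A = X.T @ X + lam * np.eye(p)
    w = np.linalg.solve(A, X.T @ y)     # inner solution w(lam)
    dw = np.linalg.solve(A, -w)         # implicit: dw/dlam = -A^{-1} w
    grad_val = Xv.T @ (Xv @ w - yv)     # dL_val / dw
    return dw @ grad_val, w
```

The same chain-rule structure underlies the Lasso case; the paper's contribution is making it efficient and inversion-free there by restricting the linear system to the active support of the sparse solution.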
This list is automatically generated from the titles and abstracts of the papers in this site.