SGB: Stochastic Gradient Bound Method for Optimizing Partition Functions
- URL: http://arxiv.org/abs/2011.01474v1
- Date: Tue, 3 Nov 2020 04:42:51 GMT
- Title: SGB: Stochastic Gradient Bound Method for Optimizing Partition Functions
- Authors: Jing Wang, Anna Choromanska
- Abstract summary: This paper addresses the problem of optimizing partition functions in a stochastic learning setting.
We propose a stochastic variant of the bound majorization algorithm that relies on upper-bounding the partition function with a quadratic surrogate.
- Score: 15.33098084159285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the problem of optimizing partition functions in a
stochastic learning setting. We propose a stochastic variant of the bound
majorization algorithm that relies on upper-bounding the partition function
with a quadratic surrogate. The update of the proposed method, which we refer to
as Stochastic Partition Function Bound (SPFB), resembles scaled stochastic
gradient descent, where the scaling factor relies on a second-order term that is,
however, different from the Hessian. Similarly to quasi-Newton schemes, this
term is constructed using the stochastic approximation of the value of the
function and its gradient. We prove a sublinear convergence rate of the proposed
method and show the construction of its low-rank variant (LSPFB). Experiments
on logistic regression demonstrate that the proposed schemes significantly
outperform SGD. We also discuss how to use the quadratic partition function bound
for efficient training of deep learning models and in non-convex optimization.
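To make the update concrete, here is a minimal sketch for binary logistic regression that majorizes each per-example log partition function log(1 + e^a) with the classical Jaakkola-Jordan quadratic bound and then takes the resulting scaled-gradient step. The bound choice, step size, and ridge term are illustrative assumptions; this shows the general bound-majorization pattern, not the paper's exact SPFB construction.

```python
import numpy as np

def bound_curvature(a, eps=1e-8):
    # Jaakkola-Jordan: log(1 + e^a) is upper-bounded by a quadratic in a
    # whose curvature at the expansion point a~ is tanh(a~/2) / (4 a~)
    # (with limit 1/8 as a~ -> 0).
    a = np.abs(a) + eps
    return np.tanh(a / 2.0) / (4.0 * a)

def spfb_like_step(w, X, y, lr=1.0, ridge=1e-3):
    """One minibatch update: scaled SGD where the scaling matrix comes from
    the quadratic bound rather than the Hessian. X: (m, d); y in {-1, +1}."""
    margins = (X @ w) * y
    # gradient of sum_i log(1 + exp(-y_i w^T x_i))
    g = -(X * (y / (1.0 + np.exp(margins)))[:, None]).sum(axis=0)
    # bound-derived second-order term (small ridge added for invertibility)
    lam = bound_curvature(margins)
    Sigma = (X * lam[:, None]).T @ X + ridge * np.eye(X.shape[1])
    return w - lr * np.linalg.solve(Sigma, g)
```

A low-rank variant in the spirit of LSPFB would replace the full d-by-d solve with a rank-k factorization of Sigma.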
Related papers
- A Functional Model Method for Nonconvex Nonsmooth Conditional Stochastic Optimization [0.0]
We consider optimization problems involving an expected value of a nonlinear function of a base random vector and a conditional expectation of another function depending on the base random vector.
We propose a specialized single-scale method for unconstrained learning problems with a smooth outer function and a different conditional inner function.
arXiv Detail & Related papers (2024-05-17T14:35:50Z)
- Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization [0.6906005491572401]
We show that the noise in stochastic gradient descent (SGD) has the effect of smoothing the objective function.
We analyze a new graduated optimization algorithm that varies the degree of smoothing via the learning rate and batch size.
arXiv Detail & Related papers (2023-11-15T07:27:40Z)
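The entry above ties the implicit smoothing level of SGD to the learning rate and batch size, which suggests a graduated schedule: heavy smoothing first, then progressively less. A toy sketch under that reading follows; the objective, noise injection (a stand-in for minibatch sampling), and schedule are all illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad(x):
    # gradient of the toy nonconvex objective f(x) = x**2 + 2*sin(5*x)
    return 2.0 * x + 10.0 * np.cos(5.0 * x)

x = rng.uniform(-3.0, 3.0)
# Anneal the implicit smoothing level (roughly lr / sqrt(batch)) downward:
# large step + small batch first, then small step + large batch.
for lr, batch in [(0.5, 1), (0.1, 8), (0.02, 64)]:
    for _ in range(300):
        noise = rng.standard_normal(batch).mean()  # stands in for minibatch noise
        x -= lr * (grad(x) + 5.0 * noise)
print("approximate minimizer:", x)
```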
- Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- stochastic gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z)
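For intuition on the dual viewpoint mentioned above: the GP-regression representer weights solve (K + s^2 I) a = y, which is the stationarity condition of the dual objective 0.5 a^T (K + s^2 I) a - y^T a, so randomized coordinate descent on that objective gives a simple stochastic dual solver. The kernel, step rule, and iteration count below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def rbf_kernel(A, B, ell=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def stochastic_dual_descent(X, y, noise=0.1, steps=20000, seed=0):
    """Randomized coordinate descent on the GP-regression dual objective."""
    rng = np.random.default_rng(seed)
    K = rbf_kernel(X, X)
    a = np.zeros(len(y))
    for _ in range(steps):
        i = rng.integers(len(y))
        g_i = K[i] @ a + noise**2 * a[i] - y[i]  # i-th entry of the dual gradient
        a[i] -= g_i / (K[i, i] + noise**2)       # exact coordinate minimization
    return a  # posterior mean at x*: rbf_kernel(x_star, X) @ a
```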
- Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation (NCE) has been proposed by formulating the objective as the logistic loss of the real data and the artificial noise.
In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models.
arXiv Detail & Related papers (2023-06-13T01:18:16Z)
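For reference, the NCE baseline named in the entry above reduces unnormalized estimation to logistic discrimination between data and noise samples. A minimal sketch with a one-dimensional unnormalized Gaussian model and a learned log-normalizer; the model, noise distribution, and step size are illustrative assumptions.

```python
import numpy as np

def nce_step(theta, x_data, x_noise, lr=0.05):
    """One NCE gradient step: logistic discrimination between real data and
    artificial noise for the unnormalized model
    log p(x) = -0.5 * (x - mu)**2 + c, where c acts as a learned
    log-normalizer and the noise is N(0, 4) (illustrative choices)."""
    mu, c = theta
    log_pn = lambda x: -0.5 * x**2 / 4.0 - 0.5 * np.log(8.0 * np.pi)
    logit = lambda x: (-0.5 * (x - mu) ** 2 + c) - log_pn(x)
    sig = lambda t: 1.0 / (1.0 + np.exp(-t))
    w_d = 1.0 - sig(logit(x_data))   # misclassification weight on data
    w_n = sig(logit(x_noise))        # misclassification weight on noise
    # gradient of the logistic loss; d(logit)/d(mu) = x - mu, d(logit)/d(c) = 1
    g_mu = -(w_d * (x_data - mu)).mean() + (w_n * (x_noise - mu)).mean()
    g_c = -w_d.mean() + w_n.mean()
    return np.array([mu - lr * g_mu, c - lr * g_c])
```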
- One-step corrected projected stochastic gradient descent for statistical estimation [49.1574468325115]
It is based on the projected stochastic gradient descent on the log-likelihood function, corrected by a single step of the Fisher scoring algorithm.
We show theoretically and by simulations that it is an interesting alternative to the usual stochastic gradient descent with averaging or adaptive gradient descent.
arXiv Detail & Related papers (2023-06-09T13:43:07Z)
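A sketch of the one-step idea from the entry above, instantiated for logistic regression: run projected SGD on the log-likelihood, then apply a single Fisher-scoring correction on the full sample. The projection radius, step size, and choice of model are illustrative assumptions.

```python
import numpy as np

def one_step_corrected_sgd(X, y, steps=2000, lr=0.05, radius=10.0, seed=0):
    """Projected SGD on the logistic log-likelihood (y in {0, 1}),
    followed by one Fisher-scoring correction step."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        i = rng.integers(n)
        p = 1.0 / (1.0 + np.exp(-X[i] @ w))
        w += lr * (y[i] - p) * X[i]            # stochastic score ascent
        norm = np.linalg.norm(w)
        if norm > radius:                      # projection onto a ball
            w *= radius / norm
    p = 1.0 / (1.0 + np.exp(-X @ w))
    score = X.T @ (y - p)                      # full-sample score
    fisher = (X * (p * (1 - p))[:, None]).T @ X
    return w + np.linalg.solve(fisher, score)  # one Fisher-scoring step
```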
- Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z)
- Riemannian Stochastic Gradient Method for Nested Composition Optimization [0.0]
This work considers optimization of a composition of functions in a nested form over Riemannian manifolds, where each function contains an expectation.
This type of problem is gaining popularity in applications such as policy evaluation in reinforcement learning or model customization in meta-learning.
arXiv Detail & Related papers (2022-07-19T15:58:27Z)
- A Closed Loop Gradient Descent Algorithm applied to Rosenbrock's function [0.0]
We introduce a novel adaptive damping technique for an inertial gradient system, which finds application as a gradient descent algorithm for unconstrained optimization.
Using Lyapunov stability analysis, we demonstrate the performance of the continuous-time version of the algorithm.
arXiv Detail & Related papers (2021-08-29T17:25:24Z)
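The entry above pairs an inertial gradient system with adaptive damping; a toy discretization on Rosenbrock's function is sketched below. The damping law used here (a gradient-norm heuristic with a constant floor) and the step size are assumptions for illustration, not the paper's closed-loop design.

```python
import numpy as np

def rosenbrock_grad(z):
    # gradient of f(x, y) = (1 - x)**2 + 100 * (y - x**2)**2
    x, y = z
    return np.array([-2.0 * (1.0 - x) - 400.0 * x * (y - x**2),
                     200.0 * (y - x**2)])

# Semi-implicit Euler discretization of the inertial system
#   z'' + gamma(z) z' + grad f(z) = 0
z, v, h = np.array([-1.5, 2.0]), np.zeros(2), 1e-3
for _ in range(300000):
    g = rosenbrock_grad(z)
    gamma = 1.0 + np.sqrt(np.linalg.norm(g))  # heuristic adaptive damping
    v += h * (-gamma * v - g)
    z += h * v
print(z)  # drifts toward the global minimizer (1, 1)
```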
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work is on the iteration complexity of zeroth-order (ZO) optimization, which does not require first-order information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
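A minimal sketch of the ZO ingredient described above: estimate one partial derivative by a central finite difference along a coordinate drawn from an importance distribution, and reweight so the step is unbiased for a scaled gradient. The sampling distribution, smoothing parameter, and step size are illustrative assumptions.

```python
import numpy as np

def zo_step(f, x, probs, mu=1e-4, lr=0.1, rng=None):
    """One zeroth-order step using two function queries: a central finite
    difference along one sampled coordinate, importance-weighted so the
    expected step equals -(lr/d) * grad f(x), up to finite-difference error."""
    rng = rng or np.random.default_rng()
    d = len(x)
    j = rng.choice(d, p=probs)               # coordinate importance sampling
    e = np.zeros(d)
    e[j] = 1.0
    g_j = (f(x + mu * e) - f(x - mu * e)) / (2.0 * mu)
    x = x.copy()
    x[j] -= lr * g_j / (d * probs[j])        # unbiasedness correction
    return x
```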
- SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds that apply to both the approximated kernel and the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
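A one-dimensional sketch of the quadrature-feature idea from the entry above: Gauss-Hermite nodes give deterministic frequencies whose weighted cosine/sine features reproduce the RBF kernel. SLEIPNIR's derivative handling and multivariate machinery are omitted; the lengthscale and feature count are illustrative assumptions.

```python
import numpy as np

def quadrature_fourier_features(x, n_feat=16, ell=1.0):
    """Deterministic features phi with phi(x) @ phi(x').T approximating
    the 1-D RBF kernel exp(-(x - x')**2 / (2 * ell**2))."""
    t, w = np.polynomial.hermite.hermgauss(n_feat)  # Gauss-Hermite nodes/weights
    omega = np.sqrt(2.0) * t / ell                  # quadrature frequencies
    scale = np.sqrt(w / np.sqrt(np.pi))             # per-frequency amplitudes
    xs = np.outer(x, omega)
    return np.hstack([scale * np.cos(xs), scale * np.sin(xs)])
```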