SGEM: stochastic gradient with energy and momentum
- URL: http://arxiv.org/abs/2208.02208v1
- Date: Wed, 3 Aug 2022 16:45:22 GMT
- Title: SGEM: stochastic gradient with energy and momentum
- Authors: Hailiang Liu and Xuping Tian
- Abstract summary: We propose SGEM, Stochastic Gradient with Energy and Momentum, to solve a class of general non-convex stochastic optimization problems.
SGEM incorporates both energy and momentum so as to derive energy-dependent convergence rates.
Our results show that SGEM converges faster than AEGD and generalizes as well as or better than SGDM in neural network training.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose SGEM, Stochastic Gradient with Energy and Momentum,
to solve a large class of general non-convex stochastic optimization problems,
based on the AEGD method that originated in the work [AEGD: Adaptive Gradient
Descent with Energy. arXiv: 2010.05109]. SGEM incorporates both energy and
momentum at the same time so as to inherit their dual advantages. We show that
SGEM features an unconditional energy stability property, and derive
energy-dependent convergence rates in the general nonconvex stochastic setting,
as well as a regret bound in the online convex setting. A lower threshold for
the energy variable is also provided. Our experimental results show that SGEM
converges faster than AEGD and generalizes better than, or at least as well as, SGDM
in training some deep neural networks.
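The abstract does not reproduce the update rule, so what follows is a minimal sketch of an energy-and-momentum step in the spirit of SGEM, pieced together from the AEGD paper it builds on (arXiv:2010.05109): a transformed objective F(theta) = sqrt(f(theta) + c), a per-coordinate energy r that shrinks multiplicatively, and a momentum average m standing in for the raw stochastic gradient. The names sgem_step, beta, and c, and this particular arrangement, are assumptions for illustration, not the authors' reference implementation; bias corrections and other details in the paper are omitted.

```python
import numpy as np

def sgem_step(theta, grad, f_val, state, eta=0.1, beta=0.9, c=1.0):
    """One SGEM-style step (sketch): AEGD's energy update applied to a
    momentum average of stochastic gradients instead of the raw gradient.

    theta: parameter vector; grad: stochastic gradient at theta;
    f_val: stochastic loss value at theta; state: dict holding the
    momentum buffer m (init 0) and energy r (init sqrt(f_val0 + c)).
    """
    m, r = state["m"], state["r"]

    # Momentum: exponential moving average of stochastic gradients.
    m = beta * m + (1.0 - beta) * grad

    # Gradient of the transformed objective F = sqrt(f + c), with the
    # momentum buffer standing in for the raw gradient.
    v = m / (2.0 * np.sqrt(f_val + c))

    # Energy update: r shrinks monotonically for any eta > 0, which is
    # the "unconditional energy stability" the abstract refers to.
    r = r / (1.0 + 2.0 * eta * v * v)

    # Energy-scaled descent step.
    theta = theta - 2.0 * eta * r * v

    state["m"], state["r"] = m, r
    return theta, state
```

Because the energy enters the step multiplicatively and can only decrease, the effective step size is self-limiting regardless of the base learning rate eta; the lower threshold on the energy variable mentioned in the abstract addresses the complementary risk that r decays too far.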
Related papers
- On the Convergence of (Stochastic) Gradient Descent for Kolmogorov--Arnold Networks [56.78271181959529]
Kolmogorov--Arnold Networks (KANs) have gained significant attention in the deep learning community.
Empirical investigations demonstrate that KANs optimized via stochastic gradient descent (SGD) are capable of achieving near-zero training loss.
arXiv Detail & Related papers (2024-10-10T15:34:10Z)
- Enhancing the Energy Gap of Random Graph Problems via XX-catalysts in Quantum Annealing [0.0]
We show that employing multiple XX-catalysts on the edges of a graph significantly enhances the minimum energy gap.
Remarkably, our analysis shows that the more severe the first-order phase transition, the more effective the catalyst is in opening the gap.
arXiv Detail & Related papers (2024-09-24T18:00:01Z)
- Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention [0.7770029179741429]
Conditional diffusion models have shown remarkable success in visual content generation.
Recent attempts to extend unconditional guidance have relied on heuristic techniques, resulting in suboptimal generation quality.
We propose Smoothed Energy Guidance (SEG), a novel training- and condition-free approach to enhance image generation.
arXiv Detail & Related papers (2024-08-01T17:59:09Z)
- Machine learning in and out of equilibrium [58.88325379746631]
Our study uses a Fokker-Planck approach, adapted from statistical physics, to explore these parallels.
We focus in particular on the stationary state of the system in the long-time limit, which in conventional SGD is out of equilibrium.
We propose a new variation of stochastic gradient Langevin dynamics (SGLD) that harnesses without-replacement minibatching.
arXiv Detail & Related papers (2023-06-06T09:12:49Z)
- Sampling with Mollified Interaction Energy Descent [57.00583139477843]
We present a new optimization-based method for sampling called mollified interaction energy descent (MIED).
MIED minimizes a new class of energies on probability measures called mollified interaction energies (MIEs).
We show experimentally that for unconstrained sampling problems our algorithm performs on par with existing particle-based algorithms like SVGD.
arXiv Detail & Related papers (2022-10-24T16:54:18Z)
- NAG-GS: Semi-Implicit, Accelerated and Robust Stochastic Optimizer [45.47667026025716]
We propose a novel, robust and accelerated iteration that relies on two key elements.
The convergence and stability of the obtained method, referred to as NAG-GS, are first studied extensively.
We show that NAG-GS is competitive with state-of-the-art methods such as momentum SGD with weight decay and AdamW for the training of machine learning models.
arXiv Detail & Related papers (2022-09-29T16:54:53Z)
- An Adaptive Gradient Method with Energy and Momentum [0.0]
We introduce a novel algorithm for gradient-based optimization of objective functions.
The method is simple to implement, computationally efficient, and well suited for large-scale machine learning problems.
arXiv Detail & Related papers (2022-03-23T04:48:38Z)
- On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the Stochastic ExtraGradient (SEG) method with constant step size, together with variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z)
- Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization [89.7882166459412]
Stochastic gradient noise (SGN) acts as implicit regularization for deep learning.
Some works have attempted to artificially simulate SGN by injecting random noise to improve generalization in deep learning.
For simulating SGN at low computational costs and without changing the learning rate or batch size, we propose the Positive-Negative Momentum (PNM) approach.
arXiv Detail & Related papers (2021-03-31T16:08:06Z)
- AEGD: Adaptive Gradient Descent with Energy [0.0]
We propose AEGD, a new algorithm for first-order gradient-based optimization of non-convex objective functions, based on a dynamically updated energy variable.
We show energy-dependent convergence rates of AEGD for both non-convex and convex objectives, recovering the desired rates for a suitably small step size; a minimal sketch of the update follows.
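For reference, here is the momentum-free special case of the SGEM sketch earlier on this page, an AEGD-style energy update written under the same assumptions (F = sqrt(f + c), per-coordinate energy r); variable names are mine, not the paper's:

```python
import numpy as np

def aegd_step(theta, grad, f_val, r, eta=0.1, c=1.0):
    """One AEGD-style step (sketch): energy-scaled descent without momentum.
    r is the energy variable, initialized to sqrt(f_val0 + c)."""
    v = grad / (2.0 * np.sqrt(f_val + c))  # gradient of F = sqrt(f + c)
    r = r / (1.0 + 2.0 * eta * v * v)      # energy can only decrease
    theta = theta - 2.0 * eta * r * v      # descent step scaled by energy
    return theta, r
```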
arXiv Detail & Related papers (2020-10-10T22:17:27Z)
- Targeted free energy estimation via learned mappings [66.20146549150475]
Free energy perturbation (FEP) was proposed by Zwanzig more than six decades ago as a method to estimate free energy differences.
FEP suffers from a severe limitation: the requirement of sufficient overlap between distributions.
One strategy to mitigate this problem, called Targeted Free Energy Perturbation, uses a high-dimensional mapping in configuration space to increase overlap.
arXiv Detail & Related papers (2020-02-12T11:10:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.