Related papers: Modeling AdaGrad, RMSProp, and Adam with Integro-Differential Equations

URL: http://arxiv.org/abs/2411.09734v1
Date: Thu, 14 Nov 2024 19:00:01 GMT
Title: Modeling AdaGrad, RMSProp, and Adam with Integro-Differential Equations
Authors: Carlos Heredia
Abstract summary: We propose a continuous-time formulation for the AdaGrad, RMSProp, and Adam optimization algorithms. We perform numerical simulations of these equations to demonstrate their validity as accurate approximations of the original algorithms.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we propose a continuous-time formulation for the AdaGrad, RMSProp, and Adam optimization algorithms by modeling them as first-order integro-differential equations. We perform numerical simulations of these equations to demonstrate their validity as accurate approximations of the original algorithms. Our results indicate a strong agreement between the behavior of the continuous-time models and the discrete implementations, thus providing a new perspective on the theoretical understanding of adaptive optimization methods.
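For orientation, the discrete algorithms named in the abstract can be sketched in their standard textbook form. This is not the paper's continuous-time model. The function names, hyperparameter defaults, and the one-dimensional quadratic test objective are illustrative assumptions.

```python
import math

# Textbook scalar versions of the three discrete optimizers.
# All names and default values here are illustrative assumptions.

def adagrad(grad, x0, lr=0.1, eps=1e-8, steps=500):
    """AdaGrad: scale each step by the root of the running sum of squared gradients."""
    x, g2_sum = x0, 0.0
    for _ in range(steps):
        g = grad(x)
        g2_sum += g * g
        x -= lr * g / (math.sqrt(g2_sum) + eps)
    return x

def rmsprop(grad, x0, lr=0.01, beta=0.9, eps=1e-8, steps=500):
    """RMSProp: replace the running sum with an exponential moving average."""
    x, v = x0, 0.0
    for _ in range(steps):
        g = grad(x)
        v = beta * v + (1.0 - beta) * g * g
        x -= lr * g / (math.sqrt(v) + eps)
    return x

def adam(grad, x0, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Adam: moving averages of the gradient and its square, with bias correction."""
    x, m, v = x0, 0.0, 0.0
    for k in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** k)  # bias-corrected first moment
        v_hat = v / (1.0 - beta2 ** k)  # bias-corrected second moment
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

def quadratic_grad(x):
    """Gradient of the assumed test objective f(x) = 0.5 * x**2."""
    return x

if __name__ == "__main__":
    for name, opt in [("AdaGrad", adagrad), ("RMSProp", rmsprop), ("Adam", adam)]:
        print(name, opt(quadratic_grad, 1.0))
```

In all three, the accumulated squared-gradient term is a sum over past iterates. A continuous-time description of such a term would plausibly involve an integral over the trajectory's history, which is consistent with the integro-differential framing in the abstract.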

Related papers

RL-finetuning LLMs from on- and off-policy data with a single algorithm [53.70731390624718]
We introduce a novel reinforcement learning algorithm (AGRO) for fine-tuning large language models. AGRO leverages the concept of generation consistency, which states that the optimal policy satisfies the notion of consistency across any possible generation of the model. We derive algorithms that find optimal solutions via the sample-based policy gradient and provide theoretical guarantees on their convergence.
arXiv Detail & Related papers (2025-03-25T12:52:38Z)
Efficient Fairness-Performance Pareto Front Computation [51.558848491038916]
We show that optimal fair representations possess several useful structural properties. We then show that these approximating problems can be solved efficiently via concave programming methods.
arXiv Detail & Related papers (2024-09-26T08:46:48Z)
A Markovian Model for Learning-to-Optimize [4.112909937203119]
We present a probabilistic model for iterative algorithms with the use case of optimization algorithms in mind. Based on this model, we present PAC-Bayesian generalization bounds for functions that are defined on the trajectory of the learned algorithm.
arXiv Detail & Related papers (2024-08-21T14:00:22Z)
Comparison of Single- and Multi- Objective Optimization Quality for Evolutionary Equation Discovery [77.34726150561087]
Evolutionary differential equation discovery has proved to be a tool for obtaining equations with fewer a priori assumptions. The proposed comparison approach is shown on classical model examples: the Burgers equation, the wave equation, and the Korteweg-de Vries equation.
arXiv Detail & Related papers (2023-06-29T15:37:19Z)
An Optimization-based Deep Equilibrium Model for Hyperspectral Image Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem. A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network. The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
arXiv Detail & Related papers (2023-06-10T08:25:16Z)
On the connections between optimization algorithms, Lyapunov functions, and differential equations: theory and insights [0.0]
We revisit the framework introduced by Fazlyab et al. to construct Lyapunov functions for optimization algorithms in discrete and continuous time. For smooth, strongly convex objective functions, we relax the requirements necessary for such a construction. We prove convergence rates that improve on those available in the literature.
arXiv Detail & Related papers (2023-05-15T14:03:16Z)
Distributed Bayesian Learning of Dynamic States [65.7870637855531]
The proposed algorithm is a distributed Bayesian filtering task for finite-state hidden Markov models. It can be used for sequential state estimation, as well as for modeling opinion formation over social networks under dynamic environments.
arXiv Detail & Related papers (2022-12-05T19:40:17Z)
On the implementation of a global optimization method for mixed-variable problems [0.30458514384586394]
The algorithm is based on the radial basis function method of Gutmann and the metric response surface method of Regis and Shoemaker. We propose several modifications aimed at generalizing and improving these two algorithms.
arXiv Detail & Related papers (2020-09-04T13:36:56Z)
Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference. We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
Adaptive Discretization for Model-Based Reinforcement Learning [10.21634042036049]
We introduce the technique of adaptive discretization to design an efficient model-based episodic reinforcement learning algorithm. Our algorithm is based on optimistic one-step value iteration extended to maintain an adaptive discretization of the space.
arXiv Detail & Related papers (2020-07-01T19:36:46Z)
Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [71.03797261151605]
Adaptivity is an important yet under-studied property in modern optimization theory. Our algorithm is proved to achieve the best-available convergence for non-PL objectives simultaneously while outperforming existing algorithms for PL objectives.
arXiv Detail & Related papers (2020-02-13T05:42:27Z)
Analysis of Bayesian Inference Algorithms by the Dynamical Functional Approach [2.8021833233819486]
We analyze an algorithm for approximate inference with large Gaussian latent variable models in a student-teacher scenario. For the case of perfect data-model matching, the knowledge of static order parameters derived from the replica method allows us to obtain efficient algorithmic updates.
arXiv Detail & Related papers (2020-01-14T17:22:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.