Generalized Exponentiated Gradient Algorithms Using the Euler Two-Parameter Logarithm
- URL: http://arxiv.org/abs/2502.17500v1
- Date: Fri, 21 Feb 2025 11:05:04 GMT
- Title: Generalized Exponentiated Gradient Algorithms Using the Euler Two-Parameter Logarithm
- Authors: Andrzej Cichocki
- Abstract summary: We propose and investigate a new class of Generalized Exponentiated Gradient (GEG) algorithms using Mirror Descent (MD) approaches. The concept of generalized entropies and associated deformed logarithms provides deeper insight into novel gradient descent updates.
- Score: 14.572732893433825
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper we propose and investigate a new class of Generalized Exponentiated Gradient (GEG) algorithms using Mirror Descent (MD) approaches, applying as a regularization function the Bregman divergence with a two-parameter deformation of the logarithm as the link function. This link function (referred to as the Euler logarithm) is associated with a wide class of generalized entropies. In order to derive the novel GEG/MD updates, we estimate a generalized exponential function that closely approximates the inverse of the Euler two-parameter logarithm. The shape and properties of the Euler logarithm and its inverse, the deformed exponential function, are tuned by two or more hyperparameters. By learning these hyperparameters, we can adapt to the distribution of the training data and adjust them to achieve desired properties of the gradient descent algorithms. The concept of generalized entropies and associated deformed logarithms provides deeper insight into novel gradient descent updates. The literature now contains over fifty mathematically well-defined entropic functionals and associated deformed logarithms, so it is impossible to investigate all of them in one research paper. We therefore focus on a wide class of trace-form entropies and the associated generalized logarithms. We applied the developed algorithms to Online Portfolio Selection (OPLS) in order to improve its performance and robustness.
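Since the abstract describes the update scheme only in words, the sketch below illustrates what a GEG/MD step of this kind can look like. It is a minimal illustration under stated assumptions, not the paper's algorithm: the abstract does not give the Euler two-parameter logarithm explicitly, so the Kaniadakis-Scarfone two-parameter deformed logarithm stands in for it, its inverse (the deformed exponential) is approximated numerically, echoing the abstract's point that this inverse must be estimated, and the function names, hyperparameter values, and the final normalization step are illustrative assumptions.

```python
import numpy as np

def deformed_log(x, kappa=0.3, r=0.1):
    """Two-parameter deformed logarithm (Kaniadakis-Scarfone form), used here
    as an illustrative stand-in for the Euler two-parameter logarithm."""
    return x**r * (x**kappa - x**(-kappa)) / (2.0 * kappa)

def deformed_exp(u, kappa=0.3, r=0.1, lo=1e-12, hi=1e12, iters=80):
    """Approximate inverse of deformed_log. For these hyperparameter values the
    link is strictly increasing on x > 0, so geometric bisection converges;
    this mirrors estimating the deformed exponential rather than inverting
    the link in closed form."""
    u = np.asarray(u, dtype=float)
    lo_arr, hi_arr = np.full_like(u, lo), np.full_like(u, hi)
    for _ in range(iters):
        mid = np.sqrt(lo_arr * hi_arr)          # geometric midpoint keeps x > 0
        below = deformed_log(mid, kappa, r) < u
        lo_arr = np.where(below, mid, lo_arr)
        hi_arr = np.where(below, hi_arr, mid)
    return np.sqrt(lo_arr * hi_arr)

def geg_step(w, grad, lr=0.1, kappa=0.3, r=0.1):
    """One GEG/MD step: map the weights through the deformed logarithm, take a
    gradient step in the dual space, map back, and renormalize (a simple
    simplex projection, e.g. for portfolio weights)."""
    w_new = deformed_exp(deformed_log(w, kappa, r) - lr * grad, kappa, r)
    return w_new / w_new.sum()

# Toy usage: three portfolio weights and an arbitrary gradient.
w = np.array([0.5, 0.3, 0.2])
g = np.array([0.4, -0.1, -0.3])
print(geg_step(w, g))
```

Tuning or learning the deformation hyperparameters (here kappa and r), for example by validating on portfolio returns, would correspond to the hyperparameter adaptation the abstract describes.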
Related papers
- Gradient Descent as a Perceptron Algorithm: Understanding Dynamics and Implicit Acceleration [67.12978375116599]
We show that the steps of gradient descent (GD) reduce to those of generalized perceptron algorithms. This helps explain the optimization dynamics and the implicit acceleration phenomenon observed in neural networks.
arXiv Detail & Related papers (2025-12-12T14:16:35Z) - Improved Stochastic Optimization of LogSumExp [2.8547553943343797]
We propose a novel approximation to LogSumExp that can be efficiently optimized using gradient methods. The approximation error is controlled by a tunable parameter and can be made arbitrarily small. Experiments in DRO and continuous optimal transport demonstrate the advantages of our approach.
arXiv Detail & Related papers (2025-09-29T15:03:55Z) - Mirror Descent Using the Tempesta Generalized Multi-parametric Logarithms [14.572732893433825]
We develop a wide class of Mirror Descent (MD) algorithms, which play a key role in machine learning. We exploit the Bregman divergence with the Tempesta multi-parametric deformed logarithm as a link function. We generate a new, wide, and flexible family of MD and mirror-less MD updates.
arXiv Detail & Related papers (2025-06-08T17:48:44Z) - Mirror Descent and Novel Exponentiated Gradient Algorithms Using Trace-Form Entropies and Deformed Logarithms [14.283977131819285]
We propose and investigate a class of Mirror Descent (MD) updates and associated novel Generalized Exponentiated Gradient (GEG) algorithms.
The proposed algorithms can be considered an extension of entropic MD and a generalization of multiplicative updates.
arXiv Detail & Related papers (2025-03-11T10:50:07Z) - Regularized dynamical parametric approximation of stiff evolution problems [0.0]
We study a class of nonlinear parametrizations $u(t) = \Phi(\theta(t))$, where the evolving parameters $\theta(t)$ are to be computed. The primary focus is on tackling the challenges posed by the combination of stiff evolution problems and irregular parametrizations.
arXiv Detail & Related papers (2025-01-21T13:29:36Z) - Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - On the connections between optimization algorithms, Lyapunov functions, and differential equations: theory and insights [0.0]
We revisit the framework introduced by Fazlyab et al. to construct Lyapunov functions for optimization algorithms in discrete and continuous time.
For smooth, strongly convex objective functions, we relax the requirements necessary for such a construction.
We prove convergence rates that improve on those available in the literature.
arXiv Detail & Related papers (2023-05-15T14:03:16Z) - A Constrained BA Algorithm for Rate-Distortion and Distortion-Rate Functions [13.570794979535934]
A modification of the Blahut-Arimoto (BA) algorithm for rate-distortion functions is proposed.
The modified algorithm directly computes the RD function for a given target distortion.
arXiv Detail & Related papers (2023-05-04T08:41:03Z) - A Recursively Recurrent Neural Network (R2N2) Architecture for Learning Iterative Algorithms [64.3064050603721]
We generalize the Runge-Kutta neural network to a recursively recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms.
We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields similar iterations to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta solvers for ordinary differential equations.
arXiv Detail & Related papers (2022-11-22T16:30:33Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic Gradient Langevin Dynamics is one of the most fundamental algorithms to solve non-convex optimization problems.
In this paper, we show two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - Non-perturbative analytical diagonalization of Hamiltonians with application to coupling suppression and enhancement in cQED [0.0]
Deriving effective Hamiltonian models plays an essential role in quantum theory.
We present two symbolic methods for computing effective Hamiltonian models.
We study the ZZ and cross-resonance interactions of superconducting qubit systems.
arXiv Detail & Related papers (2021-11-30T19:01:44Z) - SigGPDE: Scaling Sparse Gaussian Processes on Sequential Data [16.463077353773603]
We develop SigGPDE, a new scalable sparse variational inference framework for Gaussian Processes (GPs) on sequential data.
We show that the gradients of the GP signature kernel are solutions of a hyperbolic partial differential equation (PDE).
This theoretical insight allows us to build an efficient back-propagation algorithm to optimize the ELBO.
arXiv Detail & Related papers (2021-05-10T09:10:17Z) - Multipole Graph Neural Operator for Parametric Partial Differential
Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data in the desired structure.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z) - The data-driven physical-based equations discovery using evolutionary approach [77.34726150561087]
We describe an algorithm for discovering mathematical equations from given observational data.
The algorithm combines genetic programming with sparse regression.
It can be used for discovering governing analytical equations as well as partial differential equations (PDEs).
arXiv Detail & Related papers (2020-04-03T17:21:57Z) - Efficient improper learning for online logistic regression [68.8204255655161]
It is known that any proper algorithm which has logarithmic regret in the number of samples (denoted n) necessarily suffers an exponential multiplicative constant in B.
In this work, we design an efficient improper algorithm that avoids this exponential constant while preserving a logarithmic regret.
Our new algorithm based on regularized empirical risk minimization with surrogate losses satisfies a regret scaling as O(B log(Bn)) with a per-round time-complexity of order O(d^2).
arXiv Detail & Related papers (2020-03-18T09:16:14Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply to both the approximated kernel and the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.