Mirror Descent and Novel Exponentiated Gradient Algorithms Using Trace-Form Entropies and Deformed Logarithms
- URL: http://arxiv.org/abs/2503.08748v4
- Date: Tue, 28 Oct 2025 15:01:16 GMT
- Title: Mirror Descent and Novel Exponentiated Gradient Algorithms Using Trace-Form Entropies and Deformed Logarithms
- Authors: Andrzej Cichocki, Toshihisa Tanaka, Frank Nielsen, Sergio Cruces
- Abstract summary: This paper introduces a broad class of Mirror Descent (MD) and Generalized Exponentiated Gradient (GEG) algorithms. Leveraging these generalized entropies yields MD and GEG algorithms with improved convergence behavior, robustness to vanishing and exploding gradients, and inherent adaptability to non-Euclidean geometries.
- Score: 17.422938130292827
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper introduces a broad class of Mirror Descent (MD) and Generalized Exponentiated Gradient (GEG) algorithms derived from trace-form entropies defined via deformed logarithms. Leveraging these generalized entropies yields MD and GEG algorithms with improved convergence behavior, robustness to vanishing and exploding gradients, and inherent adaptability to non-Euclidean geometries through mirror maps. We establish deep connections between these methods and Amari's natural gradient, revealing a unified geometric foundation for additive, multiplicative, and natural gradient updates. Focusing on the Tsallis, Kaniadakis, Sharma-Taneja-Mittal, and Kaniadakis-Lissia-Scarfone entropy families, we show that each entropy induces a distinct Riemannian metric on the parameter space, leading to GEG algorithms that preserve the natural statistical geometry. The tunable parameters of deformed logarithms enable adaptive geometric selection, providing enhanced robustness and convergence over classical Euclidean optimization. Overall, our framework unifies key first-order MD optimization methods under a single information-geometric perspective based on generalized Bregman divergences, where the choice of entropy determines the underlying metric and dual geometric structure.
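To make the abstract's construction concrete, the following minimal sketch (our own illustration, not the authors' released code) instantiates a GEG update with the Tsallis q-logarithm and q-exponential as the deformed link pair: the gradient step is taken in the dual (q-log) coordinates and mapped back with the q-exponential, so q -> 1 recovers the classical exponentiated gradient. The unnormalized update form, the least-squares toy objective, and the parameter choices (q = 0.7, the learning rate) are illustrative assumptions only.

```python
import numpy as np

def q_log(x, q):
    """Tsallis q-deformed logarithm; reduces to ln(x) as q -> 1."""
    if np.isclose(q, 1.0):
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def q_exp(x, q):
    """Tsallis q-deformed exponential, inverse of q_log on its range (sketch assumes q < 1)."""
    if np.isclose(q, 1.0):
        return np.exp(x)
    return np.maximum(1.0 + (1.0 - q) * x, 0.0) ** (1.0 / (1.0 - q))

def geg_step(w, grad, lr, q):
    """One Generalized Exponentiated Gradient step: descend in the
    deformed-log (mirror) coordinates, then map back to the primal space."""
    return q_exp(q_log(w, q) - lr * grad, q)

# Toy usage: minimize ||A w - b||^2 over nonnegative weights.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(20, 5)), rng.normal(size=20)
w = np.ones(5)  # positive initialization, as required by the q-log link
for _ in range(200):
    grad = 2.0 * A.T @ (A @ w - b)
    w = geg_step(w, grad, lr=1e-3, q=0.7)  # q selects one Tsallis geometry
```

Other entropy families named in the abstract (Kaniadakis, Sharma-Taneja-Mittal, Kaniadakis-Lissia-Scarfone) would slot in by swapping the deformed log/exp pair used as the mirror map.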
Related papers
- Mirror Descent Using the Tempesta Generalized Multi-parametric Logarithms [14.572732893433825]
We develop a wide class of Mirror Descent (MD) algorithms, which play a key role in machine learning. We exploit the Bregman divergence with the Tempesta multi-parametric deformed logarithm as a link function. We generate a new, wide, and flexible family of MD and mirror-less MD updates.
arXiv Detail & Related papers (2025-06-08T17:48:44Z) - NeuralGrok: Accelerate Grokking by Neural Gradient Transformation [54.65707216563953]
We propose NeuralGrok, a gradient-based approach that learns an optimal gradient transformation to accelerate generalization of transformers in arithmetic tasks.
Our experiments demonstrate that NeuralGrok significantly accelerates generalization, particularly in challenging arithmetic tasks.
We also show that NeuralGrok promotes a more stable training paradigm, constantly reducing the model's complexity.
arXiv Detail & Related papers (2025-04-24T04:41:35Z) - Covariant Gradient Descent [0.0]
We present a manifestly covariant formulation of the gradient descent method, ensuring consistency across arbitrary coordinate systems and general curved trainable spaces. We show that commonly used optimization methods such as RMSProp, Adam and AdaBelief correspond to special limits of the covariant gradient descent (a minimal code sketch of this preconditioned update appears after the related-papers list below).
arXiv Detail & Related papers (2025-04-07T17:25:50Z) - Generalized Exponentiated Gradient Algorithms Using the Euler Two-Parameter Logarithm [14.572732893433825]
We propose and investigate a new class of Generalized Exponentiated Gradient (GEG) algorithms using Mirror Descent (MD) approaches. The concepts of generalized entropies and associated deformed logarithms provide deeper insight into novel gradient descent updates.
arXiv Detail & Related papers (2025-02-21T11:05:04Z) - Variational Combinatorial Sequential Monte Carlo for Bayesian Phylogenetics in Hyperbolic Space [2.596075116490744]
We develop novel hyperbolic extensions of two sequential search algorithms. Our approach introduces consistent and unbiased estimators, along with variational inference methods. Empirical results demonstrate improved speed, scalability and performance in high-dimensional phylogenetic inference tasks.
arXiv Detail & Related papers (2025-01-29T20:02:16Z) - Regularized dynamical parametric approximation of stiff evolution problems [0.0]
We study a class of nonlinear parametrizations $u(t) = \Phi(\theta(t))$, where the evolving parameters $\theta(t)$ are to be computed. The primary focus is on tackling the challenges posed by the combination of stiff evolution problems and irregular parametrizations.
arXiv Detail & Related papers (2025-01-21T13:29:36Z) - Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - Differentially Private Optimization with Sparse Gradients [60.853074897282625]
We study differentially private (DP) optimization problems under sparsity of individual gradients.
Building on this, we obtain pure- and approximate-DP algorithms with almost optimal rates for convex optimization with sparse gradients.
arXiv Detail & Related papers (2024-04-16T20:01:10Z) - Advective Diffusion Transformers for Topological Generalization in Graph Learning [69.2894350228753]
We show how graph diffusion equations extrapolate and generalize in the presence of varying graph topologies.
We propose a novel graph encoder backbone, Advective Diffusion Transformer (ADiT), inspired by advective graph diffusion equations.
arXiv Detail & Related papers (2023-10-10T08:40:47Z) - Implicit Balancing and Regularization: Generalization and Convergence Guarantees for Overparameterized Asymmetric Matrix Sensing [28.77440901439686]
A series of recent papers have begun to generalize this role for non-random Positive Semi-Definite (PSD) matrix sensing problems.
In this paper, we show that the trajectory of the gradient descent from small random measurements moves towards solutions that are both globally well.
arXiv Detail & Related papers (2023-03-24T19:05:52Z) - Deep neural networks on diffeomorphism groups for optimal shape reparameterization [44.99833362998488]
We propose an algorithm for constructing approximations of orientation-preserving diffeomorphisms by composition of elementary diffeomorphisms.
The algorithm is implemented using PyTorch, and is applicable for both unparametrized curves and surfaces.
arXiv Detail & Related papers (2022-07-22T15:25:59Z) - The Dynamics of Riemannian Robbins-Monro Algorithms [101.29301565229265]
We propose a family of Riemannian algorithms generalizing and extending the seminal stochastic approximation framework of Robbins and Monro.
Compared to their Euclidean counterparts, Riemannian algorithms are much less understood due to lack of a global linear structure on the manifold.
We provide a general template of almost sure convergence results that mirrors and extends the existing theory for Euclidean Robbins-Monro schemes.
arXiv Detail & Related papers (2022-06-14T12:30:11Z) - Characterization of variational quantum algorithms using free fermions [0.0]
We numerically study the interplay between these symmetries and the locality of the target state.
We find that the number of iterations to converge to the solution scales linearly with system size.
arXiv Detail & Related papers (2022-06-13T18:11:16Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic Gradient Langevin Dynamics is one of the most fundamental algorithms for solving non-convex optimization problems.
In this paper, we show two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - Generalized Optimization: A First Step Towards Category Theoretic Learning Theory [1.52292571922932]
We generalize several optimization algorithms, including a straightforward generalization of gradient descent and a novel generalization of Newton's method.
We show that the transformation invariances of these algorithms are preserved.
We also show that we can express the change in loss of generalized descent with an inner product-like expression.
arXiv Detail & Related papers (2021-09-20T15:19:06Z) - Improving Metric Dimensionality Reduction with Distributed Topology [68.8204255655161]
DIPOLE is a dimensionality-reduction post-processing step that corrects an initial embedding by minimizing a loss functional with both a local, metric term and a global, topological term.
We observe that DIPOLE outperforms popular methods like UMAP, t-SNE, and Isomap on a number of popular datasets.
arXiv Detail & Related papers (2021-06-14T17:19:44Z) - Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded by the 'complexity' of the fractal structure that underlies its generalization measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient adaptive algorithms for neural networks [0.0]
We present a new class of Langevin-based algorithms, which overcomes many of the known shortcomings of popular adaptive optimization algorithms.
In particular, we provide a nonasymptotic analysis and full theoretical guarantees for the convergence properties of an algorithm of this novel class, which we named TH$\varepsilon$O POULA (or, simply, TheoPouLa).
arXiv Detail & Related papers (2021-05-28T15:58:48Z) - ResNet-LDDMM: Advancing the LDDMM Framework Using Deep Residual Networks [86.37110868126548]
In this work, we make use of deep residual neural networks to solve the non-stationary ODE (flow equation) based on an Euler discretization scheme.
We illustrate these ideas on diverse registration problems of 3D shapes under complex topology-preserving transformations.
arXiv Detail & Related papers (2021-02-16T04:07:13Z) - Efficient Variational Bayesian Structure Learning of Dynamic Graphical Models [19.591265962713837]
Estimating time-varying graphical models is of paramount importance in various social, financial, biological, and engineering systems.
Existing methods require extensive tuning of parameters that control the graph sparsity and temporal smoothness.
We propose a low-complexity tuning-free Bayesian approach, named BADGE.
arXiv Detail & Related papers (2020-09-16T14:19:23Z) - Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z) - The data-driven physical-based equations discovery using evolutionary approach [77.34726150561087]
We describe an algorithm for discovering mathematical equations from given observational data.
The algorithm combines genetic programming with the sparse regression.
It could be used for governing analytical equation discovery as well as for partial differential equations (PDE) discovery.
arXiv Detail & Related papers (2020-04-03T17:21:57Z) - Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad algorithm for non-concave min-max problems.
Our experiments compare adaptive and non-adaptive gradient algorithms empirically in GAN training.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
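As a companion to the Covariant Gradient Descent entry above, here is a minimal, self-contained sketch (our own illustration under stated assumptions, not that paper's code): the update preconditions the gradient with an inverse metric, and an assumed diagonal metric built from a running second-moment estimate yields an RMSProp-like special case. All function names, constants, and the toy quadratic objective are hypothetical.

```python
import numpy as np

def covariant_gd_step(theta, grad, metric_inv, lr):
    """Covariant gradient descent: precondition the gradient with the inverse
    metric so the update transforms consistently under reparameterizations."""
    return theta - lr * metric_inv @ grad

def rmsprop_like_metric_inv(v, grad, beta=0.9, eps=1e-8):
    """Assumed special case: a diagonal metric built from an exponential moving
    average of squared gradients gives an RMSProp-like preconditioner."""
    v = beta * v + (1.0 - beta) * grad ** 2   # running second-moment estimate
    return np.diag(1.0 / (np.sqrt(v) + eps)), v

# Toy usage on an ill-conditioned quadratic loss L(theta) = 0.5 * theta^T H theta.
H = np.diag([100.0, 1.0])
theta, v = np.array([1.0, 1.0]), np.zeros(2)
for _ in range(100):
    grad = H @ theta
    metric_inv, v = rmsprop_like_metric_inv(v, grad)
    theta = covariant_gd_step(theta, grad, metric_inv, lr=0.05)
```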
This list is automatically generated from the titles and abstracts of the papers on this site.