A Universal Banach--Bregman Framework for Stochastic Iterations: Unifying Stochastic Mirror Descent, Learning and LLM Training
- URL: http://arxiv.org/abs/2509.14216v1
- Date: Wed, 17 Sep 2025 17:50:59 GMT
- Title: A Universal Banach--Bregman Framework for Stochastic Iterations: Unifying Stochastic Mirror Descent, Learning and LLM Training
- Authors: Johnny R. Zhang, Xiaomei Mi, Gaoyuan Du, Qianyi Sun, Shiqi Wang, Jiaxuan Li, Wenhua Zhou,
- Abstract summary: This work introduces a pioneering Banach--Bregman framework for optimization. It establishes Bregman geometry as a foundation for next-generation optimization. Empirical studies across machine learning, deep learning, reinforcement learning, and large language models show up to 20% faster convergence.
- Score: 8.57419236859437
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Stochastic optimization powers the scalability of modern artificial intelligence, spanning machine learning, deep learning, reinforcement learning, and large language model training. Yet, existing theory remains largely confined to Hilbert spaces, relying on inner-product frameworks and orthogonality. This paradigm fails to capture non-Euclidean settings, such as mirror descent on simplices, Bregman proximal methods for sparse learning, natural gradient descent in information geometry, or Kullback--Leibler-regularized language model training. This work introduces a pioneering Banach--Bregman framework for stochastic iterations that, unlike Euclidean-based Hilbert-space methods, embraces general Banach spaces, establishing Bregman geometry as a foundation for next-generation optimization. It (i) provides a unified template via Bregman projections and Bregman--Fejér monotonicity, encompassing stochastic approximation, mirror descent, natural gradient, adaptive methods, and mirror-prox; (ii) establishes super-relaxations ($\lambda > 2$) in non-Hilbert settings, enabling flexible geometries and elucidating their acceleration effect; and (iii) delivers convergence theorems spanning almost-sure boundedness to geometric rates, validated on synthetic and real-world tasks. Empirical studies across machine learning (UCI benchmarks), deep learning (e.g., Transformer training), reinforcement learning (actor--critic), and large language models (WikiText-2 with distilGPT-2) show up to 20% faster convergence, reduced variance, and enhanced accuracy over classical baselines. These results position Banach--Bregman geometry as a cornerstone unifying optimization theory and practice across core AI paradigms.
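To make the unified template concrete, here is a minimal sketch of its best-known instance: stochastic mirror descent on the probability simplex under the entropic mirror map, where the Bregman divergence is the KL divergence and the Bregman projection reduces to normalization. This is the textbook exponentiated-gradient update, not the paper's code; the toy objective, step sizes, and noise level are illustrative assumptions.

```python
import numpy as np

def kl_mirror_descent_step(x, grad, step_size):
    """One stochastic mirror descent step on the probability simplex.

    With the entropic mirror map (negative entropy), the Bregman
    divergence is the KL divergence; the update is the exponentiated
    gradient rule, and the Bregman projection onto the simplex is
    just a normalization.
    """
    y = x * np.exp(-step_size * grad)   # gradient step in the dual space
    return y / y.sum()                  # Bregman (KL) projection onto the simplex

# Toy problem (illustrative): minimize E[<c + noise, x>] over the simplex.
rng = np.random.default_rng(0)
c = np.array([0.3, 0.1, 0.6])
x = np.full(3, 1.0 / 3.0)                  # uniform starting point
for t in range(1, 501):
    g = c + 0.1 * rng.standard_normal(3)   # stochastic gradient
    x = kl_mirror_descent_step(x, g, step_size=1.0 / np.sqrt(t))
print(x)  # mass concentrates on the smallest-cost coordinate (index 1)
```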
Related papers
- Rigidity-Aware Geometric Pretraining for Protein Design and Conformational Ensembles [74.32932832937618]
We introduce $\textbf{RigidSSL}$ ($\textit{Rigidity-Aware Self-Supervised Learning}$), a geometric pretraining framework. Phase I (RigidSSL-Perturb) learns geometric priors from 432K structures from the AlphaFold Protein Structure Database with simulated perturbations. Phase II (RigidSSL-MD) refines these representations on 1.3K molecular dynamics trajectories to capture physically realistic transitions.
arXiv Detail & Related papers (2026-03-02T21:32:30Z) - Primal: A Unified Deterministic Framework for Quasi-Orthogonal Hashing and Manifold Learning [0.0]
We present Primal, a deterministic framework that harnesses the number-theoretic independence of prime square roots to construct robust, tunable vector representations. Our method exploits the Besicovitch property to create irrational frequency modulations that guarantee non-repeating phase trajectories. Empirical evaluations demonstrate that our framework yields superior retention and distribution tightness compared to random matrix projections.
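A hedged guess at the construction this abstract describes, resting on the fact (due to Besicovitch) that square roots of distinct primes are linearly independent over the rationals, so the phase vector $(\sqrt{p_1}x, \ldots, \sqrt{p_d}x)$ never cycles. The name prime_sqrt_embedding and all details below are hypothetical, reconstructed from the abstract alone:

```python
import numpy as np
from sympy import prime  # prime(k) returns the k-th prime

def prime_sqrt_embedding(x, dim):
    """Hypothetical sketch: map a scalar to a quasi-orthogonal vector.

    Frequencies sqrt(p_k) for distinct primes p_k are linearly
    independent over the rationals, so the resulting phase
    trajectory never repeats as x varies.
    """
    freqs = np.sqrt([prime(k) for k in range(1, dim + 1)])
    phases = freqs * x
    v = np.concatenate([np.cos(phases), np.sin(phases)])
    return v / np.linalg.norm(v)

# Distinct inputs land on nearly orthogonal directions.
a, b = prime_sqrt_embedding(3.0, 64), prime_sqrt_embedding(7.0, 64)
print(float(a @ b))  # small in magnitude
```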
arXiv Detail & Related papers (2025-11-25T20:44:34Z) - Learning Geometry: A Framework for Building Adaptive Manifold Models through Metric Optimization [8.201374511929538]
This paper proposes a novel paradigm for machine learning that moves beyond traditional parameter optimization. We optimize the metric tensor field on a manifold with a predefined topology, thereby dynamically shaping the geometric structure of the model space. This work lays a solid foundation for constructing fully dynamic "meta-learners" capable of autonomously evolving their geometry and topology.
arXiv Detail & Related papers (2025-10-30T01:53:32Z) - Preconditioned Norms: A Unified Framework for Steepest Descent, Quasi-Newton and Adaptive Methods [50.070182958880146]
We propose a unified framework generalizing steepest descent, quasi-Newton methods, and adaptive methods through the novel notion of preconditioned matrix norms. Within this framework, we provide the first systematic treatment of affine and scale invariance in the matrix-parameterized setting. We introduce two new methods, $\texttt{MuAdam}$ and $\texttt{MuAdam-SANIA}$, which combine the spectral geometry of Muon with Adam-style preconditioning.
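The unifying idea, as we read it, is that the steepest-descent direction depends on the chosen norm: $\arg\min_{\|d\| \le 1} \langle g, d \rangle$ recovers ordinary gradient descent under $\ell_2$, sign descent under $\ell_\infty$, and Muon-style orthogonalized updates under the spectral norm. A minimal sketch of that last case (illustrative only; not the paper's $\texttt{MuAdam}$):

```python
import numpy as np

def spectral_steepest_step(W, G, lr):
    """Steepest descent under the spectral norm (illustrative sketch).

    The minimizer of <G, D> over the spectral-norm ball ||D||_2 <= 1
    is D = -U V^T from the SVD G = U S V^T: the "orthogonalized"
    gradient direction used by Muon-style methods.
    """
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return W - lr * (U @ Vt)

# Contrast with l2 steepest descent, which would use W - lr * G / ||G||_F.
W = np.eye(3)
G = np.diag([2.0, 1.0, 0.5])
print(spectral_steepest_step(W, G, lr=0.1))
```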
arXiv Detail & Related papers (2025-10-12T19:39:41Z) - Non-Euclidean Broximal Point Method: A Blueprint for Geometry-Aware Optimization [55.002497070656624]
The Broximal Point Method (BPM) offers an idealized optimization framework based on iteratively minimizing the objective function over norm balls centered at the current iterate. It enjoys striking global convergence guarantees, converging linearly and in a finite number of steps for proper, closed and convex functions. In this note, we ask whether the convergence theory of BPM can be extended to a more general, non-Euclidean setting.
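For intuition, here is a toy Euclidean instance of the BPM iteration, solving each ball-constrained subproblem numerically with SciPy; the note's question is precisely how such guarantees behave when the Euclidean ball is replaced by a non-Euclidean one. The objective and radius are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def broximal_step(f, x, radius):
    """One idealized BPM step: minimize f over a ball around x.

    BPM is an idealized scheme: each step is itself a constrained
    minimization, solved here with a generic numerical solver.
    """
    cons = {"type": "ineq",
            "fun": lambda z: radius**2 - np.sum((z - x) ** 2)}
    return minimize(f, x, constraints=cons).x

f = lambda z: np.sum((z - np.array([3.0, -2.0])) ** 2)  # toy convex objective
x = np.zeros(2)
for _ in range(5):
    x = broximal_step(f, x, radius=1.0)
print(x)  # reaches the minimizer in a few steps (finite termination)
```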
arXiv Detail & Related papers (2025-10-01T12:32:52Z) - A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level Optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Geometrically Inspired Kernel Machines for Collaborative Learning Beyond Gradient Descent [36.59087823764832]
This paper develops a novel mathematical framework for collaborative learning by means of geometrically inspired kernel machines.
For classification problems, this approach allows us to learn bounded geometric structures around given data points.
arXiv Detail & Related papers (2024-07-05T08:20:27Z) - Randomized Geometric Algebra Methods for Convex Neural Networks [45.318490912354825]
We introduce randomized algorithms for Clifford's Geometric Algebra, generalizing randomized linear algebra to hypercomplex vector spaces.
This novel approach has many implications in machine learning, including training neural networks to global optimality via convex optimization.
arXiv Detail & Related papers (2024-06-04T22:22:39Z) - A Unified Algebraic Perspective on Lipschitz Neural Networks [88.14073994459586]
This paper introduces a novel perspective unifying various types of 1-Lipschitz neural networks.
We show that many existing techniques can be derived and generalized via finding analytical solutions of a common semidefinite programming (SDP) condition.
Our approach, called SDP-based Lipschitz Layers (SLL), allows us to design non-trivial yet efficient generalizations of convex potential layers.
arXiv Detail & Related papers (2023-03-06T14:31:09Z) - Optimization on manifolds: A symplectic approach [127.54402681305629]
We propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems.
Our class of (accelerated) algorithms are not only simple and efficient but also applicable to a broad range of contexts.
arXiv Detail & Related papers (2021-07-23T13:43:34Z) - Leveraging Non-uniformity in First-order Non-convex Optimization [93.6817946818977]
Non-uniform refinement of objective functions leads to \emph{Non-uniform Smoothness} (NS) and the \emph{Non-uniform Łojasiewicz inequality} (NŁ).
These new definitions inspire new geometry-aware first-order methods that converge to global optimality faster than the classical $\Omega(1/t^2)$ lower bounds.
arXiv Detail & Related papers (2021-05-13T04:23:07Z) - Stochastic Flows and Geometric Optimization on the Orthogonal Group [52.50121190744979]
We present a new class of geometrically-driven optimization algorithms on the orthogonal group $O(d)$.
We show that our methods can be applied in various fields of machine learning including deep, convolutional and recurrent neural networks, reinforcement learning, flows and metric learning.
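A standard retraction-based gradient step on $O(d)$ of the kind such methods build on (a generic sketch, not necessarily the paper's flow-based scheme): project the Euclidean gradient onto the skew-symmetric tangent directions and retract with the matrix exponential, so iterates stay exactly orthogonal.

```python
import numpy as np
from scipy.linalg import expm

def orthogonal_step(W, G, lr):
    """Riemannian-style gradient step on the orthogonal group O(d).

    Projects the Euclidean gradient G onto the tangent space at W
    (skew-symmetric directions), then retracts with the matrix
    exponential so the iterate remains orthogonal.
    """
    A = W.T @ G
    skew = (A - A.T) / 2.0            # tangent-space (skew) component
    return W @ expm(-lr * skew)       # geodesic-like retraction

# Sanity check: orthogonality is preserved after a step.
rng = np.random.default_rng(0)
W, _ = np.linalg.qr(rng.standard_normal((4, 4)))
W = orthogonal_step(W, rng.standard_normal((4, 4)), lr=0.1)
print(np.allclose(W.T @ W, np.eye(4), atol=1e-8))  # True
```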
arXiv Detail & Related papers (2020-03-30T15:37:50Z)