Non-Euclidean Broximal Point Method: A Blueprint for Geometry-Aware Optimization
- URL: http://arxiv.org/abs/2510.00823v1
- Date: Wed, 01 Oct 2025 12:32:52 GMT
- Title: Non-Euclidean Broximal Point Method: A Blueprint for Geometry-Aware Optimization
- Authors: Kaja Gruntkowska, Peter Richtárik
- Abstract summary: The Broximal Point Method (BPM) offers an idealized optimization framework based on iteratively minimizing the objective function over norm balls centered at the current iterate. It enjoys striking global convergence guarantees, converging linearly and in a finite number of steps for proper, closed and convex functions. In this note, we ask whether this convergence theory extends to the more general, non-Euclidean setting.
- Score: 55.002497070656624
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The recently proposed Broximal Point Method (BPM) [Gruntkowska et al., 2025] offers an idealized optimization framework based on iteratively minimizing the objective function over norm balls centered at the current iterate. It enjoys striking global convergence guarantees, converging linearly and in a finite number of steps for proper, closed and convex functions. However, its theoretical analysis has so far been confined to the Euclidean geometry. At the same time, emerging trends in deep learning optimization, exemplified by algorithms such as Muon [Jordan et al., 2024] and Scion [Pethick et al., 2025], demonstrate the practical advantages of minimizing over balls defined via non-Euclidean norms which better align with the underlying geometry of the associated loss landscapes. In this note, we ask whether the convergence theory of BPM can be extended to this more general, non-Euclidean setting. We give a positive answer, showing that most of the elegant guarantees of the original method carry over to arbitrary norm geometries. Along the way, we clarify which properties are preserved and which necessarily break down when leaving the Euclidean realm. Our analysis positions Non-Euclidean BPM as a conceptual blueprint for understanding a broad class of geometry-aware optimization algorithms, shedding light on the principles behind their practical effectiveness.
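The core iteration is simple to state: $x_{k+1} \in \arg\min \{ f(x) : \|x - x_k\| \le t_k \}$, with the choice of norm defining the ball geometry. Below is a minimal numerical sketch of one such step; the quadratic objective, the $\ell_\infty$ ball, the fixed radius, and the generic SLSQP solver are all illustrative assumptions on our part (BPM itself is idealized and presumes the ball subproblem is solved exactly).

```python
import numpy as np
from scipy.optimize import minimize

def bpm_step(f, x_k, radius, ord=2):
    """One idealized BPM step: approximately minimize f over the ball
    {x : ||x - x_k||_ord <= radius}. ord=2 recovers Euclidean BPM; any
    other vector norm gives a non-Euclidean variant."""
    ball = {"type": "ineq",
            "fun": lambda x: radius - np.linalg.norm(x - x_k, ord=ord)}
    return minimize(f, x_k, method="SLSQP", constraints=[ball]).x

# Toy run over l_inf balls (objective, norm and radius are illustrative).
f = lambda x: 0.5 * np.sum((x - np.array([3.0, -2.0])) ** 2)
x = np.zeros(2)
for _ in range(5):
    x = bpm_step(f, x, radius=1.0, ord=np.inf)
print(x)  # ~ (3, -2): once a ball contains the minimizer, one step suffices
```

The finite-step behaviour in the toy run mirrors the finite-termination guarantee quoted in the abstract: as soon as the current ball contains a minimizer, the exact ball minimization returns it.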
Related papers
- Orthogonalized Policy Optimization: Decoupling Sampling Geometry from Optimization Geometry in RLHF [0.0]
Large language model alignment objectives are often presented as a collection of distinct algorithms, such as PPO, DPO, IPO, and their variants. In this work, we argue that this diversity obscures a simpler underlying structure. We show that the entanglement of sampling geometry with optimization geometry is not merely a modeling convenience but a source of systematic instability.
arXiv Detail & Related papers (2026-01-18T13:57:44Z)
- Preconditioned Norms: A Unified Framework for Steepest Descent, Quasi-Newton and Adaptive Methods [50.070182958880146]
We propose a unified framework generalizing steepest descent, quasi-Newton methods, and adaptive methods through the novel notion of preconditioned matrix norms. Within this framework, we provide the first systematic treatment of affine and scale invariance in the matrix-parameterized setting. We introduce two new methods, $\texttt{MuAdam}$ and $\texttt{MuAdam-SANIA}$, which combine the spectral geometry of Muon with Adam-style preconditioning.
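For intuition on the preconditioned-norm viewpoint (a textbook construction, not necessarily the paper's exact formulation): steepest descent under the quadratic norm $\|d\|_P = \sqrt{d^\top P d}$ moves along $-P^{-1} g$, normalized by the dual norm $\|g\|_{P^{-1}}$, and a diagonal $P$ built from squared-gradient statistics yields Adam-style updates. A minimal sketch, assuming a symmetric positive definite $P$:

```python
import numpy as np

def preconditioned_sd_step(x, g, P, lr):
    """Steepest-descent step under ||d||_P = sqrt(d^T P d): the unit-ball
    direction most opposed to g is -P^{-1} g / ||g||_{P^{-1}}."""
    Pinv_g = np.linalg.solve(P, g)    # P^{-1} g
    dual_norm = np.sqrt(g @ Pinv_g)   # ||g||_{P^{-1}}
    return x - lr * Pinv_g / dual_norm

# P = I gives normalized gradient descent; P = diag(v + eps), with v a
# running average of g**2, gives an RMSProp/Adam-like preconditioned step.
```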
arXiv Detail & Related papers (2025-10-12T19:39:41Z)
- A Universal Banach–Bregman Framework for Stochastic Iterations: Unifying Stochastic Mirror Descent, Learning and LLM Training [8.57419236859437]
This work introduces a pioneering Banach–Bregman framework for optimization. It establishes Bregman geometry as a foundation for next-generation optimization. Empirical studies across machine learning, deep learning, reinforcement learning, and large language models show up to 20% faster convergence.
arXiv Detail & Related papers (2025-09-17T17:50:59Z)
- Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness [51.302674884611335]
This work introduces a hybrid non-Euclidean optimization method which generalizes norm clipping by combining steepest descent and conditional gradient approaches. We discuss how to instantiate the algorithms for deep learning and demonstrate their properties on image classification and language modeling.
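The baseline operation being generalized here is classical gradient norm clipping, which rescales the gradient whenever its norm exceeds a threshold; the paper's hybrid method goes further by mixing steepest-descent and conditional-gradient steps under non-Euclidean norms. A sketch of just the textbook clipping rule, with the norm left as a parameter:

```python
import numpy as np

def clip_grad_norm(g, c, ord=2):
    """Rescale g so that ||g||_ord <= c (standard norm clipping)."""
    norm = np.linalg.norm(g, ord=ord)
    return g if norm <= c else g * (c / norm)
```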
arXiv Detail & Related papers (2025-06-02T17:34:29Z)
- Early-Stopped Mirror Descent for Linear Regression over Convex Bodies [14.30754799752932]
We study the setting of high-dimensional linear regression under additive Gaussian noise. We show that the worst-case risk of unconstrained early-stopped mirror descent with an appropriate potential is at most that of the least squares estimator constrained to the convex body.
arXiv Detail & Related papers (2025-03-05T11:59:31Z)
- Mirror Descent Under Generalized Smoothness [23.5387392871236]
We introduce a new $\ell^*$-smoothness concept that measures the norm of Hessians in terms of a general norm and its dual. We establish convergence for mirror-descent-type algorithms, matching the rates known under classical smoothness.
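For context on the algorithm family: a mirror-descent step performs a gradient update in the dual coordinates induced by a mirror map $\phi$, i.e. $x_{k+1} = (\nabla\phi)^{-1}(\nabla\phi(x_k) - \eta g_k)$. The sketch below instantiates this with the classical negative-entropy mirror map on the probability simplex, an illustrative choice rather than this paper's specific setting:

```python
import numpy as np

def entropic_md_step(x, g, eta):
    """Mirror descent on the simplex with phi(x) = sum_i x_i*log(x_i):
    the update reduces to the multiplicative-weights rule."""
    y = x * np.exp(-eta * g)   # gradient step in dual (log) coordinates
    return y / y.sum()         # normalization = Bregman projection onto simplex
```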
arXiv Detail & Related papers (2025-02-02T11:23:10Z)
- Warped geometric information on the optimisation of Euclidean functions [43.43598316339732]
We consider optimisation of a real-valued function defined in a potentially high-dimensional Euclidean space.
We find the function's optimum along a manifold with a warped metric.
Our proposed algorithm, using 3rd-order approximation of geodesics, tends to outperform standard Euclidean gradient-based counterparts.
arXiv Detail & Related papers (2023-08-16T12:08:50Z)
- Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs [113.8752163061151]
We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision processes (MDPs). We propose the periodically restarted optimistic policy optimization algorithm (PROPO). PROPO features two mechanisms: sliding-window-based policy evaluation and periodic-restart-based policy improvement.
arXiv Detail & Related papers (2021-10-18T02:33:20Z)
- Improving Metric Dimensionality Reduction with Distributed Topology [68.8204255655161]
DIPOLE is a dimensionality-reduction post-processing step that corrects an initial embedding by minimizing a loss functional with both a local, metric term and a global, topological term.
We observe that DIPOLE outperforms popular methods like UMAP, t-SNE, and Isomap on a number of popular datasets.
arXiv Detail & Related papers (2021-06-14T17:19:44Z)
- Leveraging Non-uniformity in First-order Non-convex Optimization [93.6817946818977]
Non-uniform refinement of objective functions leads to Non-uniform Smoothness (NS) and the Non-uniform Łojasiewicz inequality (NŁ).
New definitions inspire new geometry-aware first-order methods that converge to global optimality faster than the classical $\Omega(1/t^2)$ lower bounds.
arXiv Detail & Related papers (2021-05-13T04:23:07Z)
- Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [71.03797261151605]
Adaptivity is an important yet under-studied property in modern optimization theory.
Our algorithm is proved to achieve the best-available convergence for non-PL objectives while simultaneously outperforming existing algorithms for PL objectives.
arXiv Detail & Related papers (2020-02-13T05:42:27Z)