Entropic Mirror Descent for Linear Systems: Polyak's Stepsize and Implicit Bias
- URL: http://arxiv.org/abs/2505.02614v1
- Date: Mon, 05 May 2025 12:33:18 GMT
- Title: Entropic Mirror Descent for Linear Systems: Polyak's Stepsize and Implicit Bias
- Authors: Yura Malitsky, Alexander Posch
- Abstract summary: This paper focuses on applying entropic mirror descent to solve linear systems. The main challenge for the convergence analysis stems from the unboundedness of the domain. To overcome this without imposing restrictive assumptions, we introduce a variant of Polyak-type stepsizes.
- Score: 55.72269695392027
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper focuses on applying entropic mirror descent to solve linear systems, where the main challenge for the convergence analysis stems from the unboundedness of the domain. To overcome this without imposing restrictive assumptions, we introduce a variant of Polyak-type stepsizes. Along the way, we strengthen the bound for $\ell_1$-norm implicit bias, obtain sublinear and linear convergence results, and generalize the convergence result to arbitrary convex $L$-smooth functions. We also propose an alternative method that avoids exponentiation, resembling the original Hadamard descent, but with provable convergence.
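To make the core iteration concrete, here is a minimal sketch of entropic mirror descent (exponentiated gradient) applied to $f(x) = \frac{1}{2}\|Ax - b\|^2$ over the positive orthant, with a Polyak-type stepsize. The specific stepsize form, using $f^* = 0$ for a consistent system and the $\ell_\infty$ norm of the gradient, is an assumption; the paper's exact variant may differ.

```python
# Hedged sketch: entropic mirror descent (exponentiated gradient) for a
# consistent linear system Ax = b, minimizing f(x) = 0.5*||Ax - b||^2 over
# the positive orthant. The Polyak-type stepsize below uses f* = 0 and the
# dual (l-infinity) norm of the gradient; the paper's variant may differ.
import numpy as np

def entropic_md_polyak(A, b, x0, iters=500):
    x = x0.copy()                      # requires x0 > 0 elementwise
    for _ in range(iters):
        r = A @ x - b                  # residual
        f = 0.5 * r @ r                # objective value; f* = 0 if consistent
        g = A.T @ r                    # gradient of f
        gmax = np.max(np.abs(g))
        if gmax == 0.0:
            break
        eta = f / gmax**2              # Polyak-type stepsize (assumed form)
        x = x * np.exp(-eta * g)       # multiplicative (entropic) update
    return x
```

For solutions with negative entries, a standard trick is to split $x = u - v$ with $u, v \ge 0$ and run the same update on the stacked system $[A, -A]$; the alternative method mentioned in the abstract instead avoids the exponential altogether via a Hadamard-style (squared) parametrization, as detailed in the paper.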
Related papers
- Nonconvex Optimization Framework for Group-Sparse Feedback Linear-Quadratic Optimal Control: Non-Penalty Approach [3.585860184121598]
We address the challenge of tuning the penalty parameter and the risk of introducing spurious stationary points.
Our results enable the direct design of group-sparse feedback gains without resorting to certain assumptions.
arXiv Detail & Related papers (2025-07-26T09:50:21Z)
- Gaussian Approximation and Multiplier Bootstrap for Stochastic Gradient Descent [14.19520637866741]
We establish non-asymptotic convergence rates in the central limit theorem for Polyak-Ruppert-averaged iterates of stochastic gradient descent.
We prove the non-asymptotic validity of the multiplier bootstrap for constructing confidence sets for an optimization problem.
arXiv Detail & Related papers (2025-02-10T17:49:05Z)
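As a generic illustration of the averaging scheme this entry refers to (not the paper's code), Polyak-Ruppert averaging simply reports the running mean of the SGD iterates; the least-squares objective and stepsize schedule below are illustrative assumptions.

```python
# Hedged sketch: Polyak-Ruppert averaging for SGD on a least-squares problem.
# The model, stepsize schedule, and sampling are illustrative assumptions; a
# multiplier bootstrap would rerun this with random weights on the gradients
# to build confidence sets around the averaged estimate.
import numpy as np

def sgd_polyak_ruppert(A, b, x0, iters=10_000, c=1.0):
    x = x0.copy()
    xbar = np.zeros_like(x0)
    rng = np.random.default_rng(0)
    for k in range(1, iters + 1):
        i = rng.integers(len(b))            # sample one data row
        g = (A[i] @ x - b[i]) * A[i]        # stochastic gradient
        x = x - c / np.sqrt(k) * g          # Robbins-Monro stepsize
        xbar += (x - xbar) / k              # running average of iterates
    return xbar                             # Polyak-Ruppert-averaged estimate
```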
- A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM).
arXiv Detail & Related papers (2024-05-24T21:09:19Z)
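To make the update concrete (a generic textbook form, not necessarily the paper's most general variant): SPPM replaces the stochastic gradient step with an exact proximal step on a single sampled loss. For one row of a least-squares problem this proximal step has the closed form used below.

```python
# Hedged sketch: stochastic proximal point method (SPPM) for least squares.
# Each step solves  min_x 0.5*(a_i @ x - b_i)^2 + ||x - x_k||^2 / (2*gamma)
# exactly; for a single linear measurement the prox has a closed form,
# obtained from the first-order optimality condition.
import numpy as np

def sppm_least_squares(A, b, x0, gamma=1.0, iters=5_000):
    x = x0.copy()
    rng = np.random.default_rng(0)
    for _ in range(iters):
        i = rng.integers(len(b))
        a = A[i]
        resid = a @ x - b[i]
        # Closed-form proximal step for the sampled loss:
        x = x - gamma * resid / (1.0 + gamma * (a @ a)) * a
    return x
```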
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
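A minimal sketch of the averaged iteration that linear interpolation amounts to, in the classical Krasnosel'skii-Mann form; the base map T and the interpolation weight below are placeholders, not the paper's training setup.

```python
# Hedged sketch: linear interpolation (averaging) around a base step T.
# For a nonexpansive T with a fixed point, the averaged iteration
# x <- (1 - lam) * x + lam * T(x) converges (Krasnosel'skii-Mann), even
# when plain iteration of T only cycles.
import numpy as np

def interpolated_iteration(T, x0, lam=0.5, iters=1000):
    x = x0.copy()
    for _ in range(iters):
        x = (1.0 - lam) * x + lam * T(x)   # interpolate toward the base step
    return x

# Example: T is a rotation, a nonexpansive map whose plain iterates circle
# forever; the interpolated iterates converge to its fixed point at 0.
theta = 0.5
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
x_star = interpolated_iteration(lambda x: R @ x, np.array([1.0, 0.0]))
```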
- Optimal Scalarizations for Sublinear Hypervolume Regret [2.703970154550017]
We show that hypervolume scalarizations with uniformly random weights achieve an optimal sublinear hypervolume regret bound of $O(T^{-1/k})$.
For the setting of multiobjective linear bandits, we derive a novel non-Euclidean analysis to get regret bounds of $\tilde{O}(dT^{-1/2} + T^{-1/k})$, removing unnecessary $\text{poly}(k)$ dependencies.
arXiv Detail & Related papers (2023-07-06T20:49:42Z)
- Curvature-Independent Last-Iterate Convergence for Games on Riemannian Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z)
- Convergence of Adam Under Relaxed Assumptions [72.24779199744954]
We show that Adam converges to $\epsilon$-stationary points with $O(\epsilon^{-4})$ gradient complexity under far more realistic conditions.
We also propose a variance-reduced version of Adam with an accelerated gradient complexity of $O(\epsilon^{-3})$.
arXiv Detail & Related papers (2023-04-27T06:27:37Z)
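For reference, the standard Adam update that the analysis above concerns; the hyperparameters are the usual defaults, not values prescribed by the paper.

```python
# Minimal Adam sketch (standard update; defaults are illustrative).
import numpy as np

def adam(grad, x0, eta=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, iters=1000):
    x = x0.copy()
    m = np.zeros_like(x)               # first-moment (mean) estimate
    v = np.zeros_like(x)               # second-moment estimate
    for t in range(1, iters + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        mhat = m / (1 - beta1**t)      # bias correction
        vhat = v / (1 - beta2**t)
        x = x - eta * mhat / (np.sqrt(vhat) + eps)
    return x
```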
- Globally Convergent Policy Search over Dynamic Filters for Output Estimation [64.90951294952094]
We introduce the first direct policy search algorithm which converges to the globally optimal $\textit{dynamic}$ filter.
We show that informativity overcomes the aforementioned degeneracy.
arXiv Detail & Related papers (2022-02-23T18:06:20Z)
- Linear Convergence of Generalized Mirror Descent with Time-Dependent Mirrors [23.738242876364865]
We present a PL-based analysis for a generalization of mirror descent with a possibly time-dependent mirror.
Our result establishes sufficient conditions and provides learning rates for the convergence of generalized mirror descent (GMD).
Our analysis further implies the existence of an interpolating solution and the convergence of GMD to this solution.
arXiv Detail & Related papers (2020-09-18T01:05:14Z)
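A minimal sketch of one family of generalized mirror descent updates with a time-dependent mirror; the particular mirror below (a diagonal quadratic potential with an AdaGrad-like accumulator) is an illustrative assumption, not the paper's general setting.

```python
# Hedged sketch: generalized mirror descent (GMD) with a time-dependent
# mirror. With a quadratic mirror psi_t(x) = 0.5 * x @ (d_t * x), the step
#   grad(psi_t)(x_next) = grad(psi_t)(x) - eta * grad(f)(x)
# reduces to diagonally preconditioned gradient descent; the accumulator
# choice for d_t below is an assumption for illustration.
import numpy as np

def gmd_time_dependent(grad, x0, eta=0.1, iters=1000, eps=1e-8):
    x = x0.copy()
    acc = np.zeros_like(x)                 # accumulated squared gradients
    for _ in range(iters):
        g = grad(x)
        acc += g * g
        d_t = np.sqrt(acc) + eps           # time-dependent diagonal mirror
        x = x - eta * g / d_t              # mirror step in closed form
    return x
```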