Adaptive Matrix Online Learning through Smoothing with Guarantees for Nonsmooth Nonconvex Optimization
- URL: http://arxiv.org/abs/2602.08232v1
- Date: Mon, 09 Feb 2026 03:09:47 GMT
- Title: Adaptive Matrix Online Learning through Smoothing with Guarantees for Nonsmooth Nonconvex Optimization
- Authors: Ruichen Jiang, Zakaria Mhammedi, Mehryar Mohri, Aryan Mokhtari,
- Abstract summary: We study online linear optimization with matrix variables constrained by the operator norm, a setting where the geometry renders designing data-dependent and efficient adaptive algorithms challenging. We instantiate this framework with two efficient methods that avoid quadratic projections. We show both methods admit closed-form updates and match one-sided Shampoo's regret up to a constant factor, while significantly reducing computational cost.
- Score: 54.723834588133165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study online linear optimization with matrix variables constrained by the operator norm, a setting where the geometry renders designing data-dependent and efficient adaptive algorithms challenging. The best-known adaptive regret bounds are achieved by Shampoo-like methods, but they require solving a costly quadratic projection subproblem. To address this, we extend the gradient-based prediction scheme to adaptive matrix online learning and cast algorithm design as constructing a family of smoothed potentials for the nuclear norm. We define a notion of admissibility for such smoothings and prove any admissible smoothing yields a regret bound matching the best-known guarantees of one-sided Shampoo. We instantiate this framework with two efficient methods that avoid quadratic projections. The first is an adaptive Follow-the-Perturbed-Leader (FTPL) method using Gaussian stochastic smoothing. The second is Follow-the-Augmented-Matrix-Leader (FAML), which uses a deterministic hyperbolic smoothing in an augmented matrix space. By analyzing the admissibility of these smoothings, we show both methods admit closed-form updates and match one-sided Shampoo's regret up to a constant factor, while significantly reducing computational cost. Lastly, using the online-to-nonconvex conversion, we derive two matrix-based optimizers, Pion (from FTPL) and Leon (from FAML). We prove convergence guarantees for these methods in nonsmooth nonconvex settings, a guarantee that the popular Muon optimizer lacks.
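To make the "closed-form updates" concrete, below is a minimal NumPy sketch of a Gaussian-smoothed FTPL update for online linear optimization over the operator-norm ball, the basic primitive the abstract describes. The perturbation scale `sigma` and the toy adversary are illustrative assumptions, not values or constructions taken from the paper.

```python
# Minimal sketch of Gaussian-perturbed Follow-the-Leader (FTPL) for online
# linear optimization over the operator-norm ball {X : ||X||_op <= 1}.
# Closed form: min_{||X||_op <= 1} <G, X> = -||G||_* is attained at
# X = -U @ Vt, where G = U @ diag(s) @ Vt is a thin SVD.
# `sigma` (perturbation scale) is an illustrative tuning knob.
import numpy as np

rng = np.random.default_rng(0)

def ftpl_play(grad_sum, sigma):
    """FTPL iterate for the cumulative gradient `grad_sum`."""
    perturbed = grad_sum + sigma * rng.standard_normal(grad_sum.shape)
    U, _, Vt = np.linalg.svd(perturbed, full_matrices=False)
    return -U @ Vt  # minimizes <perturbed, X> over the operator-norm ball

# Toy run against random linear losses <G_t, X_t>.
m, n, T, sigma = 5, 3, 200, 5.0
grad_sum, loss = np.zeros((m, n)), 0.0
for t in range(T):
    X = ftpl_play(grad_sum, sigma)
    G = rng.standard_normal((m, n))
    loss += np.sum(G * X)
    grad_sum += G
best = -np.sum(np.linalg.svd(grad_sum, compute_uv=False))  # -||sum_t G_t||_*
print("regret:", loss - best)
```

Note that the per-round cost is a single SVD and no quadratic projection, which is the efficiency gain over Shampoo-like methods that the abstract emphasizes.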
Related papers
- GPU-friendly and Linearly Convergent First-order Methods for Certifying Optimal $k$-sparse GLMs [7.079949618914198]
Branch-and-Bound (BnB) frameworks can certify optimality using perspective relaxations. Existing methods for solving these relaxations are computationally intensive, limiting their scalability. We develop a unified proximal framework that is both linearly convergent and computationally efficient.
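As a reference point for the proximal template such frameworks build on, here is a plain proximal-gradient (ISTA) sketch for l1-regularized least squares. The paper's prox operator, for the perspective relaxation, is different; soft-thresholding serves here only as a familiar stand-in.

```python
# Generic proximal-gradient (ISTA) loop for min_x 0.5*||Ax - b||^2 + lam*||x||_1.
# The prox of the l1 norm is soft-thresholding; a perspective relaxation
# would swap in its own prox while keeping the same loop structure.
import numpy as np

def soft_threshold(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def ista(A, b, lam, steps=500):
    L = np.linalg.norm(A, 2) ** 2          # gradient Lipschitz constant
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - grad / L, lam / L)
    return x
```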
arXiv Detail & Related papers (2026-03-01T22:26:09Z) - Online Inference of Constrained Optimization: Primal-Dual Optimality and Sequential Quadratic Programming [55.848340925419286]
We study online statistical inference for the solutions of stochastic optimization problems with equality and inequality constraints. We develop a stochastic sequential quadratic programming (SSQP) method to solve these problems, where the step direction is computed by sequentially performing a quadratic approximation of the objective and a linear approximation of the constraints. We show that our method achieves global almost-sure convergence and that its moving-average iterates exhibit local asymptotic normality with an optimal primal-dual limiting covariance matrix in the sense of Hájek and Le Cam.
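To spell out the step-direction computation, a minimal equality-constrained SQP step is shown below; inequality constraints and the stochastic elements of the paper are omitted, so this is a generic sketch rather than the authors' algorithm.

```python
# One equality-constrained SQP step. Given a Hessian approximation B,
# objective gradient g, constraint Jacobian A, and constraint residual c,
# the step d solves the local quadratic program
#   min_d  g^T d + 0.5 * d^T B d   s.t.  A d + c = 0,
# via its KKT linear system.
import numpy as np

def sqp_step(B, g, A, c):
    n, m = B.shape[0], A.shape[0]
    K = np.block([[B, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-g, -c])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]   # primal step d, Lagrange multipliers
```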
arXiv Detail & Related papers (2025-11-27T06:16:17Z) - Online Statistical Inference of Constrained Stochastic Optimization via Random Scaling [17.9255078650875]
We develop an online inference procedure for constrained stochastic optimization using Sketched Sequential Quadratic Programming (SSQP). As a direct generalization of sketched Newton methods, SSQP approximates the objective with a quadratic model and the constraints with a linear model at each step, then applies a sketching solver to inexactly solve the resulting subproblem. In particular, we construct a test statistic based on SSQP iterates whose limiting distribution is free of any unknown parameters.
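The sketching idea can be shown in miniature on a generic overdetermined least-squares problem: solve a randomly projected copy instead of the original. The paper applies sketching to its Newton/QP subproblems; the Gaussian-embedding stand-in below only mimics that step.

```python
# Sketch-and-solve: replace lstsq(A, b) with lstsq(S @ A, S @ b) for a
# random Gaussian embedding S with s << n rows. Sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, d, s = 10_000, 50, 400
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

S = rng.standard_normal((s, n)) / np.sqrt(s)   # Gaussian embedding
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
x_exact,  *_ = np.linalg.lstsq(A, b, rcond=None)
print("relative error:", np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact))
```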
arXiv Detail & Related papers (2025-05-23T19:33:08Z) - Improving Adaptive Moment Optimization via Preconditioner Diagonalization [11.01832755213396]
We show that our approach can substantially enhance the convergence speed of modern adaptive moment optimizers. For large language models like LLaMA, we can achieve a speedup of 2x compared to the baseline Adam.
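One plausible reading of "preconditioner diagonalization" for a matrix parameter, sketched under stated assumptions: accumulate Shampoo-style curvature factors, periodically eigendecompose them, and run a diagonal second-moment update in the rotated basis. The state layout, refresh interval, and hyperparameters below are illustrative, not the paper's exact algorithm.

```python
# Illustrative "diagonalized preconditioner" update for a matrix parameter W.
# L and R accumulate Shampoo-style factors sum(G G^T) and sum(G^T G); their
# eigenbases U, V are refreshed periodically, and an Adam-style diagonal
# second moment is tracked in the rotated coordinates U^T G V.
import numpy as np

def make_state(shape):
    m, n = shape
    return dict(L=np.zeros((m, m)), R=np.zeros((n, n)),
                U=np.eye(m), V=np.eye(n), v=np.zeros(shape), t=0)

def step(W, G, st, lr=1e-2, beta2=0.999, eps=1e-8, refresh=100):
    st["L"] += G @ G.T
    st["R"] += G.T @ G
    if st["t"] % refresh == 0:                  # refresh the eigenbases
        _, st["U"] = np.linalg.eigh(st["L"])
        _, st["V"] = np.linalg.eigh(st["R"])
    Gr = st["U"].T @ G @ st["V"]                # gradient in the rotated basis
    st["v"] = beta2 * st["v"] + (1 - beta2) * Gr**2
    upd = Gr / (np.sqrt(st["v"]) + eps)         # diagonal adaptive step
    st["t"] += 1
    return W - lr * (st["U"] @ upd @ st["V"].T) # rotate back and update
```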
arXiv Detail & Related papers (2025-02-11T11:48:04Z) - $\ell_0$-Regularized Quadratic Surface Support Vector Machines [0.0]
Kernel-free quadratic surface support vector machines (QSVMs) have recently gained traction due to their flexibility in modeling nonlinear decision boundaries without relying on kernel functions. We propose a sparse variant of the QSVM by enforcing a cardinality constraint on the model parameters. We validate our approach on several real-world datasets, demonstrating its ability to reduce overfitting while maintaining strong classification performance.
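The primitive behind a cardinality constraint is easy to state in code: the Euclidean projection onto {w : ||w||_0 <= k} keeps the k largest-magnitude entries. The paper solves the constrained QSVM to optimality rather than by simple projection, so this is only the building block.

```python
# Euclidean projection onto the l0 ball {w : ||w||_0 <= k}: zero out all
# but the k entries of largest magnitude (assumes 1 <= k <= len(w)).
import numpy as np

def project_l0(w, k):
    out = np.zeros_like(w)
    idx = np.argpartition(np.abs(w), -k)[-k:]   # indices of k largest |w_i|
    out[idx] = w[idx]
    return out
```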
arXiv Detail & Related papers (2025-01-20T04:26:34Z) - GP-FL: Model-Based Hessian Estimation for Second-Order Over-the-Air Federated Learning [52.295563400314094]
Second-order methods are widely adopted to improve the convergence rate of learning algorithms. This paper introduces a novel second-order FL framework tailored for wireless channels.
arXiv Detail & Related papers (2024-12-05T04:27:41Z) - Smooth over-parameterized solvers for non-smooth structured optimization [3.756550107432323]
Non-smoothness encodes structural constraints on the solutions, such as sparsity, group sparsity, low rank, and sharp edges.
We operate a non-convex but smooth over-parametrization of the underlying nonsmooth optimization problems.
Our main contribution is to apply the Variable Projection (VarPro) method, which defines a new formulation by explicitly minimizing over part of the variables.
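A standard concrete instance of this idea (used here as an illustration, not the paper's full generality) is the Hadamard over-parametrization x = u * v, which smooths the l1 norm via ||x||_1 = min_{u*v=x} (||u||^2 + ||v||^2)/2; Lasso then becomes smooth in (u, v), and VarPro eliminates v in closed form since the inner problem is ridge regression.

```python
# Smooth over-parametrization of Lasso:
#   min_{u,v} 0.5*||A(u*v) - b||^2 + 0.5*lam*(||u||^2 + ||v||^2).
# VarPro: for fixed u the problem is ridge regression in v, so v is
# eliminated in closed form and the outer method optimizes u only.
import numpy as np

def varpro_inner(A, b, u, lam):
    """Closed-form minimization over v for fixed u; returns (value, v)."""
    M = A * u                                   # equals A @ diag(u)
    v = np.linalg.solve(M.T @ M + lam * np.eye(len(u)), M.T @ b)
    r = M @ v - b
    return 0.5 * r @ r + 0.5 * lam * (u @ u + v @ v), v
```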
arXiv Detail & Related papers (2022-05-03T09:23:07Z) - Matrix Completion via Non-Convex Relaxation and Adaptive Correlation
Learning [90.8576971748142]
We develop a novel surrogate that can be optimized by closed-form solutions.
We exploit pairwise correlations for completion and thus propose an adaptive correlation learning model.
arXiv Detail & Related papers (2022-03-04T08:50:50Z) - Mixed-Projection Conic Optimization: A New Paradigm for Modeling Rank
Constraints [3.179831861897336]
We provide a framework for solving low-rank optimization problems to certifiable optimality.
Our framework also provides near-optimal solutions through rounding and local search techniques.
arXiv Detail & Related papers (2020-09-22T08:59:06Z) - Effective Dimension Adaptive Sketching Methods for Faster Regularized
Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT).
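For concreteness, a minimal SRHT sketch is given below (a Gaussian embedding is simply S = randn(s, n) / sqrt(s)); as is typical for SRHT constructions, it assumes the row count n is a power of two.

```python
# Subsampled Randomized Hadamard Transform: sign-flip with D, mix with the
# normalized Walsh-Hadamard transform H, then sample s rows of H @ D @ A.
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform along axis 0 (length a power of 2)."""
    x = x.copy()
    n, h = x.shape[0], 1
    while h < n:
        for i in range(0, n, 2 * h):
            a, b = x[i:i+h].copy(), x[i+h:i+2*h].copy()
            x[i:i+h], x[i+h:i+2*h] = a + b, a - b
        h *= 2
    return x / np.sqrt(n)

def srht(A, s, rng):
    n = A.shape[0]
    signs = rng.choice([-1.0, 1.0], size=n)
    mixed = fwht(signs[:, None] * A)            # H @ D @ A
    rows = rng.choice(n, size=s, replace=False) # subsample s rows
    return np.sqrt(n / s) * mixed[rows]
```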
arXiv Detail & Related papers (2020-06-10T15:00:09Z) - Multi-Objective Matrix Normalization for Fine-grained Visual Recognition [153.49014114484424]
Bilinear pooling achieves great success in fine-grained visual recognition (FGVC).
Recent methods have shown that the matrix power normalization can stabilize the second-order information in bilinear features.
We propose an efficient Multi-Objective Matrix Normalization (MOMN) method that can simultaneously normalize a bilinear representation in terms of square-root, low-rank, and sparsity.
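Matrix power normalization is commonly approximated on GPU with the Newton-Schulz iteration for the matrix square root; the sketch below shows only that square-root component, not MOMN's joint multi-objective scheme.

```python
# Newton-Schulz iteration for P^{1/2} of a symmetric PSD matrix P.
# Pre-scaling by ||P||_F keeps the iteration in its convergence region;
# Y converges to (P/||P||_F)^{1/2} and Z to its inverse.
import numpy as np

def sqrt_newton_schulz(P, iters=10):
    d = P.shape[0]
    norm = np.linalg.norm(P)            # Frobenius norm
    Y, Z = P / norm, np.eye(d)
    for _ in range(iters):
        T = 0.5 * (3.0 * np.eye(d) - Z @ Y)
        Y, Z = Y @ T, T @ Z
    return Y * np.sqrt(norm)            # rescale back to P^{1/2}
```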
arXiv Detail & Related papers (2020-03-30T08:40:35Z)