UNSO: Unified Newton Schulz Orthogonalization
- URL: http://arxiv.org/abs/2602.02500v1
- Date: Sun, 18 Jan 2026 17:54:43 GMT
- Title: UNSO: Unified Newton Schulz Orthogonalization
- Authors: Chen Hu, Qianxi Zhao, Yuming Li, Mingyu Zhou, Xiyin Li,
- Abstract summary: The Newton-Schulz (NS) iteration has gained increasing interest for its role in the Muon optimizer and on the Stiefel manifold. We consolidate the iterative structure into a unified framework, named Unified Newton-Schulz Orthogonalization (UNSO), which provides a recommended polynomial with learnable coefficients. These learnable coefficients are then optimized and achieve outstanding performance with stable convergence.
- Score: 10.113387496783007
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Newton-Schulz (NS) iteration has gained increasing interest for its role in the Muon optimizer and in optimization on the Stiefel manifold. However, the conventional NS iteration suffers from inefficiency and instability. Although various improvements to the NS iteration have been introduced, they do not depart from the conventional iterative paradigm, which can incur a large computational burden because matrix products along the long dimension are computed repeatedly. To address this, we consolidate the iterative structure into a unified framework, named Unified Newton-Schulz Orthogonalization (UNSO). This allows us to avoid an explicit polynomial expansion: instead, we evaluate the role of each matrix power, remove the insignificant terms, and provide a recommended polynomial with learnable coefficients. These learnable coefficients are then optimized and achieve outstanding performance with stable convergence. The code for our method is available at: https://github.com/greekinRoma/Unified_Newton_Schulz_Orthogonalization.
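For context, the conventional iterative paradigm that the abstract refers to is the classic cubic Newton-Schulz iteration. The sketch below is a generic illustration of that baseline, not the authors' UNSO polynomial; the function name and step count are illustrative choices:

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=40):
    """Classic cubic Newton-Schulz iteration: X <- 1.5*X - 0.5*X X^T X.

    Each singular value s of X is mapped to 1.5*s - 0.5*s**3, which
    drives all singular values toward 1, i.e. toward the nearest
    (semi-)orthogonal matrix. Dividing by the Frobenius norm first
    puts every singular value in (0, 1], where the map converges.
    """
    X = G / (np.linalg.norm(G) + 1e-12)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

rng = np.random.default_rng(0)
Q = newton_schulz_orthogonalize(rng.standard_normal((8, 8)))
print(np.max(np.abs(Q.T @ Q - np.eye(8))))  # small: Q is nearly orthogonal
```

Muon-style implementations typically replace the cubic with a higher-degree polynomial and hand-tuned coefficients for faster convergence; UNSO's contribution, per the abstract, is to make such coefficients learnable within a unified polynomial form.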
Related papers
- Turbo-Muon: Accelerating Orthogonality-Based Optimization with Pre-Conditioning [7.966927192439667]
We introduce a preconditioning procedure that accelerates Newton-Schulz convergence and reduces its computational cost. Our publicly available implementation achieves up to a 2.8x speedup in the Newton-Schulz approximation.
arXiv Detail & Related papers (2025-12-04T10:06:22Z) - AuON: A Linear-time Alternative to Semi-Orthogonal Momentum Updates [0.0]
We study the semi-orthogonal properties of momentum-based updates and develop a method to bound momentum updates under a spectral-norm trust region. We propose AuON (Alternative Unit-norm momentum updates by Normalized nonlinear scaling), a linear-time method that achieves strong performance without constructing semi-orthogonal matrices. Our approach combines hyperbolic-cosine RMS scaling transformations with normalization, demonstrating both effectiveness and computational efficiency compared to Newton-Schulz methods.
arXiv Detail & Related papers (2025-09-29T06:03:53Z) - Unified Convergence Theory of Stochastic and Variance-Reduced Cubic Newton Methods [37.1630298053787]
We propose a new framework, which we call the helper framework. It provides a unified view of stochastic and variance-reduced second-order algorithms equipped with global complexity guarantees.
arXiv Detail & Related papers (2023-02-23T12:18:28Z) - Second-order optimization with lazy Hessians [55.51077907483634]
We analyze Newton's method with lazy Hessian updates for solving general possibly non-convex optimization problems.
We propose to reuse a previously seen Hessian for several iterations while computing new gradients at each step of the method.
arXiv Detail & Related papers (2022-12-01T18:58:26Z) - Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence [69.65563161962245]
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method.
We show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage.
arXiv Detail & Related papers (2022-04-20T07:14:21Z) - Matrix Completion via Non-Convex Relaxation and Adaptive Correlation Learning [90.8576971748142]
We develop a novel surrogate that can be optimized by closed-form solutions.
We further exploit correlations for completion, leading to an adaptive correlation learning model.
arXiv Detail & Related papers (2022-03-04T08:50:50Z) - DASHA: Distributed Nonconvex Optimization with Communication Compression, Optimal Oracle Complexity, and No Client Synchronization [77.34726150561087]
We develop and analyze DASHA: a new family of methods for nonconvex distributed optimization problems.
Unlike MARINA, the new methods DASHA and DASHA-MVR send compressed vectors only and never synchronize the nodes, which makes them more practical for learning.
arXiv Detail & Related papers (2022-02-02T20:10:40Z) - Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update [88.73437209862891]
In second-order optimization, a potential bottleneck can be computing the Hessian matrix of the optimized function at every iteration.
We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching.
We prove that Newton-LESS enjoys nearly the same problem-independent local convergence rate as Gaussian embeddings.
arXiv Detail & Related papers (2021-07-15T17:33:05Z) - Self-supervised Symmetric Nonnegative Matrix Factorization [82.59905231819685]
Symmetric nonnegative matrix factorization (SNMF) has been demonstrated to be a powerful method for data clustering.
Inspired by ensemble clustering, which aims to seek better clustering results, we propose self-supervised SNMF (S$3$NMF).
We take advantage of the sensitivity-to-initialization characteristic of SNMF, without relying on any additional information.
arXiv Detail & Related papers (2021-03-02T12:47:40Z) - Convergence to Second-Order Stationarity for Non-negative Matrix Factorization: Provably and Concurrently [18.89597524771988]
Non-negative matrix factorization (NMF) is a fundamental non-convex optimization problem with numerous applications in Machine Learning.
This paper defines a multiplicative weight update type dynamics (Seung algorithm) that runs concurrently and provably avoids saddle points.
An important advantage is the use of concurrent implementations in parallel computing environments.
arXiv Detail & Related papers (2020-02-26T06:40:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.