On the Crucial Role of Initialization for Matrix Factorization
- URL: http://arxiv.org/abs/2410.18965v1
- Date: Thu, 24 Oct 2024 17:58:21 GMT
- Title: On the Crucial Role of Initialization for Matrix Factorization
- Authors: Bingcong Li, Liang Zhang, Aryan Mokhtari, Niao He
- Abstract summary: This work revisits the classical low-rank matrix factorization problem and unveils the critical role of initialization in shaping convergence rates.
We introduce Nystrom initialization, which improves Scaled Gradient Descent (ScaledGD) in both symmetric and asymmetric matrix factorization tasks, and extend it to low-rank adapters (LoRA).
Our approach, NoRA, demonstrates superior performance across various downstream tasks and model scales, from 1B to 7B parameters, in large language and diffusion models.
- Score: 40.834791383134416
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This work revisits the classical low-rank matrix factorization problem and unveils the critical role of initialization in shaping convergence rates for such nonconvex and nonsmooth optimization. We introduce Nystrom initialization, which significantly improves the global convergence of Scaled Gradient Descent (ScaledGD) in both symmetric and asymmetric matrix factorization tasks. Specifically, we prove that ScaledGD with Nystrom initialization achieves quadratic convergence in cases where only linear rates were previously known. Furthermore, we extend this initialization to low-rank adapters (LoRA) commonly used for finetuning foundation models. Our approach, NoRA, i.e., LoRA with Nystrom initialization, demonstrates superior performance across various downstream tasks and model scales, from 1B to 7B parameters, in large language and diffusion models.
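To make the mechanics concrete, here is a minimal numpy sketch of ScaledGD for the symmetric problem min_X ||XX^T - A||_F^2 with a Nystrom-style initialization. The update rule is the standard ScaledGD step, and the initialization X_0 = A @ Omega with a Gaussian sketch Omega is one natural reading of "Nystrom initialization"; the step size and this exact form are illustrative assumptions, not the paper's verbatim algorithm.

```python
import numpy as np

def scaled_gd_nystrom(A, r, eta=0.5, iters=50, seed=0):
    """Factor a PSD matrix A ~= X @ X.T with ScaledGD.

    Assumed Nystrom-style init: X0 = A @ Omega, Omega Gaussian.
    Assumed ScaledGD step: X <- X - eta * grad @ inv(X.T @ X),
    where grad = (X @ X.T - A) @ X for f(X) = 0.25 * ||X X^T - A||_F^2.
    """
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[0], r))
    X = A @ Omega                          # sketch of the target's column space
    for _ in range(iters):
        grad = (X @ X.T - A) @ X           # Euclidean gradient
        X = X - eta * grad @ np.linalg.inv(X.T @ X)  # preconditioned step
    return X

# Toy check on an exactly rank-3 PSD target.
rng = np.random.default_rng(1)
U = rng.standard_normal((50, 3))
A = U @ U.T
X = scaled_gd_nystrom(A, r=3)
print(np.linalg.norm(X @ X.T - A) / np.linalg.norm(A))  # relative error, near machine precision
```

Starting the same loop from a small random X_0 instead typically makes convergence visibly slower, which is the gap the paper attributes to initialization.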
Related papers
- Understanding the Learning Dynamics of LoRA: A Gradient Flow Perspective on Low-Rank Adaptation in Matrix Factorization [7.940066909711888]
We analyze the learning dynamics of Low-Rank Adaptation (LoRA) for matrix factorization under gradient flow (GF).
Our analysis shows that the final error is affected by the misalignment between the singular spaces of the pre-trained model and the target matrix.
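As a concrete instance of this setting, the toy loop below runs plain gradient descent (a discretization of the gradient flow analyzed there) on an assumed LoRA-style objective min over (B, A) of 0.5 * ||W0 + BA - T||_F^2; the shapes, step size, and initialization are illustrative choices, not taken from that paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 10, 10, 2
W0 = rng.standard_normal((m, n))                                     # "pretrained" weight
T = W0 + rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank-r offset target

B = np.zeros((m, r))                          # usual LoRA init: B = 0
A = rng.standard_normal((r, n)) / np.sqrt(n)  # small random A

eta = 1e-2
print("initial residual:", np.linalg.norm(W0 + B @ A - T))
for _ in range(5000):
    R = W0 + B @ A - T                        # residual of the adapted weight
    B -= eta * R @ A.T                        # gradients of 0.5 * ||R||_F^2
    A -= eta * B.T @ R
print("final residual:  ", np.linalg.norm(W0 + B @ A - T))
```

Here T - W0 is exactly rank r, so the adapter can in principle drive the residual to zero; a target whose singular spaces are misaligned with what rank r can capture would leave a residual error, matching the summary above.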
arXiv Detail & Related papers (2025-03-10T06:57:10Z)
- GP-FL: Model-Based Hessian Estimation for Second-Order Over-the-Air Federated Learning [52.295563400314094]
Second-order methods are widely adopted to improve the convergence rate of learning algorithms.
This paper introduces a novel second-order FL framework tailored for wireless channels.
arXiv Detail & Related papers (2024-12-05T04:27:41Z)
- Randomized Asymmetric Chain of LoRA: The First Meaningful Theoretical Framework for Low-Rank Adaptation [58.288682735160585]
Low-Rank Adaptation (LoRA) is a popular technique for finetuning models.
LoRA often underperforms compared to full-parameter fine-tuning.
We present a framework that rigorously analyzes the adaptation rates of LoRA methods.
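For reference, the standard LoRA construction that such analyses start from looks roughly like this numpy sketch (the class and initialization scales are illustrative, following the commonly used zero-B / small-random-A recipe):

```python
import numpy as np

class LoRALinear:
    """Frozen weight W0 plus a trainable low-rank update: y = (W0 + (alpha/r) * B @ A) @ x."""

    def __init__(self, W0, r, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        m, n = W0.shape
        self.W0 = W0                                        # frozen pretrained weight
        self.A = rng.standard_normal((r, n)) / np.sqrt(n)   # small random init
        self.B = np.zeros((m, r))                           # zero init: adapter starts as a no-op
        self.scale = alpha / r

    def __call__(self, x):
        # Never materialize the full m x n update; apply the two thin factors.
        return self.W0 @ x + self.scale * (self.B @ (self.A @ x))

layer = LoRALinear(np.eye(8), r=2)
print(np.allclose(layer(np.ones(8)), np.ones(8)))  # True: B = 0 leaves W0 untouched
```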
arXiv Detail & Related papers (2024-10-10T18:51:53Z)
- The Decimation Scheme for Symmetric Matrix Factorization [0.0]
Matrix factorization is an inference problem that has acquired importance due to its vast range of applications.
We study this extensive rank problem, extending the alternative 'decimation' procedure that we recently introduced.
We introduce a simple algorithm based on a ground state search that implements decimation and performs matrix factorization.
arXiv Detail & Related papers (2023-07-31T10:53:45Z)
- Gradient descent in matrix factorization: Understanding large initialization [6.378022003282206]
The framework is grounded in signal-to-noise ratio concepts and inductive arguments.
The results uncover an implicit incremental learning phenomenon in GD and offer a deeper understanding of its performance in large-initialization scenarios.
arXiv Detail & Related papers (2023-05-30T16:55:34Z)
- On the Explicit Role of Initialization on the Convergence and Implicit Bias of Overparametrized Linear Networks [1.0323063834827415]
We present a novel analysis of single-hidden-layer linear networks trained under gradient flow.
We show that the squared loss converges exponentially to its optimum.
We derive a novel non-asymptotic upper-bound on the distance between the trained network and the min-norm solution.
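The exponential-convergence claim is easy to check numerically; the sketch below trains a single-hidden-layer linear network with gradient descent on a realizable regression task (dimensions, initialization scale, and step size are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, k, N = 10, 16, 5, 200
X = rng.standard_normal((N, d))
Y = X @ rng.standard_normal((d, k))     # realizable linear targets

W1 = 0.3 * rng.standard_normal((d, h))  # hidden layer
W2 = 0.3 * rng.standard_normal((h, k))  # output layer

eta = 5e-3
for t in range(4001):
    E = X @ W1 @ W2 - Y                 # residual, shape (N, k)
    g1 = X.T @ E @ W2.T / N             # dL/dW1 for L = 0.5/N * ||E||_F^2
    g2 = W1.T @ X.T @ E / N             # dL/dW2
    W1 -= eta * g1
    W2 -= eta * g2
    if t % 1000 == 0:
        print(t, 0.5 * np.linalg.norm(E) ** 2 / N)  # should decay geometrically
```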
arXiv Detail & Related papers (2021-05-13T15:13:51Z)
- On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent [55.96478231566129]
We show that relative scales play an important role in determining the learned model.
We develop a technique for deriving the inductive bias of gradient-flow.
arXiv Detail & Related papers (2021-02-19T07:10:48Z)
- Pushing the Envelope of Rotation Averaging for Visual SLAM [69.7375052440794]
We propose a novel optimization backbone for visual SLAM systems.
We leverage rotation averaging to improve the accuracy, efficiency and robustness of conventional monocular SLAM systems.
Our approach runs up to 10x faster than the state of the art with comparable accuracy on public benchmarks.
arXiv Detail & Related papers (2020-11-02T18:02:26Z)
- Renormalization for Initialization of Rolling Shutter Visual-Inertial Odometry [5.33024001730262]
Initialization is a prerequisite for using inertial signals and fusing them with visual data.
We propose a novel statistical solution to the problem on visual and inertial data simultaneously, by casting it into the renormalization scheme of Kanatani.
Extensive evaluations against ground truth show superior performance, with a gain in accuracy of up to 20% over the originally proposed Least Squares solution.
arXiv Detail & Related papers (2020-08-14T14:54:15Z)
- Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.