Understanding Incremental Learning with Closed-form Solution to Gradient Flow on Overparameterized Matrix Factorization
- URL: http://arxiv.org/abs/2508.20344v1
- Date: Thu, 28 Aug 2025 01:36:42 GMT
- Title: Understanding Incremental Learning with Closed-form Solution to Gradient Flow on Overparameterized Matrix Factorization
- Authors: Hancheng Min, René Vidal,
- Abstract summary: Gradient flow learns a target matrix by sequentially learning its singular values in decreasing order of magnitude over time. We show that incremental learning emerges from a time-scale separation among the dynamics corresponding to learning different components of the target matrix.
- Score: 44.278609916888065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many theoretical studies on neural networks attribute their excellent empirical performance to the implicit bias or regularization induced by first-order optimization algorithms when training networks under certain initialization assumptions. One example is the incremental learning phenomenon in gradient flow (GF) on an overparameterized matrix factorization problem with small initialization: GF learns a target matrix by sequentially learning its singular values in decreasing order of magnitude over time. In this paper, we develop a quantitative understanding of this incremental learning behavior for GF on the symmetric matrix factorization problem, using its closed-form solution obtained by solving a Riccati-like matrix differential equation. We show that incremental learning emerges from some time-scale separation among dynamics corresponding to learning different components in the target matrix. By decreasing the initialization scale, these time-scale separations become more prominent, allowing one to find low-rank approximations of the target matrix. Lastly, we discuss the possible avenues for extending this analysis to asymmetric matrix factorization problems.
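To make the phenomenon concrete, here is a minimal numerical sketch (our own toy setup, not the authors' code): an Euler discretization of gradient flow on the symmetric factorization loss ||W W^T - M||_F^2 with a small random initialization, printing the top singular values of W W^T as they are picked up one at a time.

```python
import numpy as np

# Toy illustration (not the authors' code): Euler discretization of
# gradient flow on f(W) = ||W W^T - M||_F^2 with small initialization.
rng = np.random.default_rng(0)
n, alpha, lr, steps = 8, 1e-6, 1e-3, 10_000

# Symmetric PSD target with well-separated eigenvalues 4 > 2 > 1.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
M = Q @ np.diag([4.0, 2.0, 1.0] + [0.0] * (n - 3)) @ Q.T

W = alpha * rng.standard_normal((n, n))      # small initialization scale alpha
for t in range(steps):
    W -= lr * 4 * (W @ W.T - M) @ W          # grad f(W) = 4 (W W^T - M) W
    if t % 1000 == 0:
        top = np.linalg.svd(W @ W.T, compute_uv=False)[:3]
        print(f"step {t:5d}: top singular values of W W^T = {np.round(top, 3)}")
# With small alpha, the values 4, 2, 1 should emerge sequentially over time,
# and the gaps between their emergence times grow as alpha decreases.
```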
Related papers
- Back to Square Roots: An Optimal Bound on the Matrix Factorization Error for Multi-Epoch Differentially Private SGD [21.92418810749819]
We introduce a new explicit factorization method, Banded Inverse Square Root (BISR), which imposes a banded structure on the inverse correlation matrix. BISR achieves an asymptotically optimal error by matching the upper and lower bounds.
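As a rough illustration of the banding idea (a toy construction under our own assumptions, not the paper's BISR), one can take the square-root factorization of the prefix-sum workload matrix, band the inverse factor, and re-invert to obtain an exact factorization whose inverse factor is banded:

```python
import numpy as np
from scipy.linalg import sqrtm

# Hedged toy sketch (our construction, not the paper's BISR): factor the
# prefix-sum workload A = C @ C with C = sqrt(A), band the inverse factor,
# and re-invert so the resulting factorization of A is exact.
T, band = 16, 4
A = np.tril(np.ones((T, T)))                 # prefix-sum workload matrix
C = np.real(sqrtm(A))                        # square-root factor: A = C @ C
C_inv = np.linalg.inv(C)

i, j = np.indices((T, T))
C_inv_banded = np.where((i - j >= 0) & (i - j < band), C_inv, 0.0)

C_b = np.linalg.inv(C_inv_banded)            # factor whose inverse is banded
B_b = A @ C_inv_banded                       # then A = B_b @ C_b exactly
print(np.allclose(A, B_b @ C_b))             # -> True
```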
arXiv Detail & Related papers (2025-05-17T19:41:44Z) - Implicit Regularization Makes Overparameterized Asymmetric Matrix Sensing Robust to Perturbations [0.3465040588448529]
We find that factorized gradient descent is highly robust to certain perturbations. Moreover, this equivalent formulation is not only easier to work with, but also leads to sharper sample and time complexities.
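A minimal sketch of the factorized setup in question (dimensions, step size, and initialization scale are our own choices, not the paper's): gradient descent on both factors of an overparameterized asymmetric matrix sensing problem with Gaussian measurements.

```python
import numpy as np

# Toy sketch of factorized gradient descent on asymmetric matrix sensing,
# with measurements y_k = <A_k, U V^T> (our setup, not the paper's).
rng = np.random.default_rng(1)
n, r, m, lr, steps = 10, 2, 200, 0.01, 5000
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # rank-r target
A = rng.standard_normal((m, n, n)) / np.sqrt(m)                 # Gaussian sensing matrices
y = np.einsum('kij,ij->k', A, M)

U = 0.01 * rng.standard_normal((n, n))       # overparameterized, full-width factors
V = 0.01 * rng.standard_normal((n, n))
for _ in range(steps):
    resid = np.einsum('kij,ij->k', A, U @ V.T) - y
    G = np.einsum('k,kij->ij', resid, A)     # adjoint of the measurement map
    U, V = U - lr * G @ V, V - lr * G.T @ U  # simultaneous factor updates
print("relative error:", np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))
```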
arXiv Detail & Related papers (2023-09-04T20:23:35Z) - The Decimation Scheme for Symmetric Matrix Factorization [0.0]
Matrix factorization is an inference problem that has acquired importance due to its vast range of applications.
We study this extensive-rank problem, extending the alternative 'decimation' procedure that we recently introduced.
We introduce a simple algorithm based on a ground state search that implements decimation and performs matrix factorization.
arXiv Detail & Related papers (2023-07-31T10:53:45Z) - Automatic Hyperparameter Tuning in Sparse Matrix Factorization [0.0]
We propose a novel numerical method for hyperparameter tuning that evaluates the zero point of the normalization factor in the sparse matrix prior.
Our method shows excellent performance for ground-truth sparse matrix reconstruction when compared with the widely used sparse principal component analysis algorithm.
arXiv Detail & Related papers (2023-05-17T10:40:17Z) - A Second-Order Majorant Algorithm for Nonnegative Matrix Factorization [2.646309221150203]
We introduce a general second-order optimization framework for NMF under both quadratic and $\beta$-divergence loss functions. The Second-Order Majorant (SOM) method constructs a local quadratic majorization of the loss function by majorizing its Hessian matrix. We show that mSOM consistently outperforms state-of-the-art algorithms across multiple loss functions.
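The paper's mSOM is not reproduced here. For orientation, the classic multiplicative-update NMF of Lee and Seung is itself a majorize-minimize scheme for the quadratic loss and shows the general shape of such majorization-based updates:

```python
import numpy as np

# Classic multiplicative-update NMF (Lee & Seung), a majorize-minimize
# scheme for ||X - W H||_F^2 -- a reference baseline, not the paper's mSOM.
rng = np.random.default_rng(2)
m, n, r = 50, 40, 5
X = rng.random((m, r)) @ rng.random((r, n))      # nonnegative ground truth
W, H = rng.random((m, r)), rng.random((r, n))    # nonnegative initialization
eps = 1e-12                                      # guard against division by zero
for _ in range(500):
    H *= (W.T @ X) / (W.T @ W @ H + eps)         # minimize the quadratic majorant in H
    W *= (X @ H.T) / (W @ H @ H.T + eps)         # ... and then in W
print("relative fit error:", np.linalg.norm(X - W @ H) / np.linalg.norm(X))
```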
arXiv Detail & Related papers (2023-03-31T12:09:36Z) - Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing [74.2952487120137]
It is believed that Gradient Descent (GD) induces an implicit bias towards good generalization in machine learning models.
This paper provides a fine-grained analysis of the dynamics of GD for the matrix sensing problem.
arXiv Detail & Related papers (2023-01-27T02:30:51Z) - Unitary Approximate Message Passing for Matrix Factorization [90.84906091118084]
We consider matrix factorization (MF) with certain constraints, which finds wide applications in various areas.
We develop a Bayesian approach to MF with an efficient message passing implementation, called UAMPMF.
We show that UAMPMF significantly outperforms state-of-the-art algorithms in terms of recovery accuracy, robustness and computational complexity.
arXiv Detail & Related papers (2022-07-31T12:09:32Z) - Learning a Compressive Sensing Matrix with Structural Constraints via Maximum Mean Discrepancy Optimization [17.104994036477308]
We introduce a learning-based algorithm to obtain a measurement matrix for compressive sensing related recovery problems.
The recent success of such metrics in neural network related topics motivates a machine learning based solution to the problem.
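As a sketch of the objective such an approach could minimize (a generic Gaussian-kernel MMD estimator of our own; the paper's training setup may differ), the distribution of compressed measurements x @ Phi.T is matched to a reference distribution via the squared MMD:

```python
import numpy as np

# Generic squared-MMD objective (our toy version, not the paper's code).
def mmd2(X, Y, bandwidth=1.0):
    """Biased estimate of squared MMD between sample sets X and Y."""
    def k(A, B):                                  # Gaussian kernel matrix
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(3)
x = rng.standard_normal((256, 32))                # training signals
Phi = rng.standard_normal((8, 32)) / np.sqrt(32)  # candidate measurement matrix
ref = rng.standard_normal((256, 8))               # reference measurement samples
print("MMD^2:", mmd2(x @ Phi.T, ref))             # objective driving updates of Phi
```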
arXiv Detail & Related papers (2021-10-14T08:35:54Z) - Joint Network Topology Inference via Structured Fusion Regularization [70.30364652829164]
Joint network topology inference represents a canonical problem of learning multiple graph Laplacian matrices from heterogeneous graph signals.
We propose a general graph estimator based on a novel structured fusion regularization.
We show that the proposed graph estimator enjoys both high computational efficiency and rigorous theoretical guarantees.
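Schematically (our simplified rendering, not the paper's estimator), a structured fusion regularizer couples the per-graph Laplacian estimates through pairwise penalties on top of per-graph data-fit terms:

```python
import numpy as np

# Schematic objective only (our simplification, not the paper's estimator):
# each graph k gets a smoothness data-fit term tr(L_k S_k), where S_k is
# the sample covariance of its signals, plus a fusion penalty that pulls
# the Laplacian estimates toward each other.
def joint_objective(Ls, covs, gamma=0.1):
    fit = sum(np.trace(L @ S) for L, S in zip(Ls, covs))
    fuse = sum(np.linalg.norm(Ls[i] - Ls[j], 'fro')
               for i in range(len(Ls)) for j in range(i + 1, len(Ls)))
    return fit + gamma * fuse
```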
arXiv Detail & Related papers (2021-03-05T04:42:32Z) - Robust Low-rank Matrix Completion via an Alternating Manifold Proximal Gradient Continuation Method [47.80060761046752]
Robust low-rank matrix completion (RMC) has been studied extensively for computer vision, signal processing and machine learning applications.
This problem aims to decompose a partially observed matrix into the superposition of a low-rank matrix and a sparse matrix, where the sparse matrix captures the grossly corrupted entries of the matrix.
A widely used approach to tackle RMC is to consider a convex formulation, which minimizes the nuclear norm of the low-rank matrix (to promote low-rankness) and the l1 norm of the sparse matrix (to promote sparsity).
In this paper, motivated by some recent works on low-
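For reference, the two proximal operators that underpin the convex formulation above (standard textbook versions, not the paper's manifold continuation method) are singular value thresholding for the nuclear norm and entrywise soft thresholding for the l1 norm:

```python
import numpy as np

# Standard building blocks of the nuclear-norm-plus-l1 convex formulation
# (textbook proximal operators, not the paper's continuation method).
def svt(X, tau):
    """Prox of tau * nuclear norm: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft_threshold(X, tau):
    """Prox of tau * l1 norm: soft-threshold the entries."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)
```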
arXiv Detail & Related papers (2020-08-18T04:46:22Z) - Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.