Implicit Bias and Loss of Plasticity in Matrix Completion: Depth Promotes Low-Rankness
- URL: http://arxiv.org/abs/2603.04703v1
- Date: Thu, 05 Mar 2026 00:54:19 GMT
- Title: Implicit Bias and Loss of Plasticity in Matrix Completion: Depth Promotes Low-Rankness
- Authors: Baekrok Shin, Chulhee Yun
- Abstract summary: We identify coupled dynamics as a key mechanism behind the implicit low-rank bias observed in deeper networks. We show that deep models avoid plasticity loss due to their low-rank bias, whereas depth-2 networks pre-trained under decoupled dynamics fail to converge to low-rank solutions, even when training is resumed with additional data.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We study matrix completion via deep matrix factorization (a.k.a. deep linear neural networks) as a simplified testbed to examine how network depth influences training dynamics. Despite the simplicity and importance of the problem, prior theory largely focuses on shallow (depth-2) models and does not fully explain the implicit low-rank bias observed in deeper networks. We identify coupled dynamics as a key mechanism behind this bias and show that it intensifies with increasing depth. Focusing on gradient flow under block-diagonal observations, we prove: (a) networks of depth $\geq 3$ exhibit coupling unless initialized diagonally, and (b) convergence to rank-1 occurs if and only if the dynamics is coupled -- resolving an open question by Menon (2024) for a family of initializations. We also revisit the loss of plasticity phenomenon in matrix completion (Kleinman et al., 2024), where pre-training on few observations and then resuming with more degrades performance. We show that deep models avoid plasticity loss due to their low-rank bias, whereas depth-2 networks pre-trained under decoupled dynamics fail to converge to low-rank solutions, even when the resumed training (with additional data) satisfies the coupling condition -- shedding light on the mechanism behind this phenomenon.
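To make the setup concrete, here is a minimal sketch (not the authors' code) of matrix completion via deep matrix factorization: the matrix is parametrized as a product of depth-many square factors, and only the observed entries are fit by gradient descent. The random observation mask, near-identity initialization, depth, and step size are illustrative assumptions; the paper itself analyzes gradient flow under block-diagonal observations.

```python
import torch

torch.manual_seed(0)
n, rank, depth = 20, 1, 3                              # depth >= 3: the regime the paper focuses on
target = torch.randn(n, rank) @ torch.randn(rank, n)   # rank-1 ground-truth matrix
mask = (torch.rand(n, n) < 0.3).float()                # random ~30% observations (the paper
                                                       # analyzes block-diagonal patterns)

# Near-identity initialization; the paper shows that whether the layer
# dynamics couple depends on the initialization (diagonal vs. not).
layers = [torch.nn.Parameter(torch.eye(n) + 1e-3 * torch.randn(n, n))
          for _ in range(depth)]
opt = torch.optim.SGD(layers, lr=0.1)

def end_to_end(ws):
    """Multiply the factors into the end-to-end matrix W_L @ ... @ W_1."""
    out = ws[0]
    for w in ws[1:]:
        out = w @ out
    return out

for step in range(5000):
    opt.zero_grad()
    X = end_to_end(layers)
    loss = ((mask * (X - target)) ** 2).sum() / mask.sum()  # fit observed entries only
    loss.backward()
    opt.step()

# Deeper factorizations tend to drive all but the leading singular values
# of the end-to-end product toward zero (the implicit low-rank bias).
s = torch.linalg.svdvals(end_to_end(layers).detach())
print(f"final loss {loss.item():.2e}; top singular values {s[:4].tolist()}")
```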
Related papers
- Neural Collapse under Gradient Flow on Shallow ReLU Networks for Orthogonally Separable Data
We show that gradient flow on a two-layer ReLU network for classifying orthogonally separable data provably exhibits Neural Collapse (NC).
We reveal the role of the implicit bias of the training dynamics in facilitating the emergence of NC.
arXiv Detail & Related papers (2025-10-24T01:36:19Z)
- Provable Emergence of Deep Neural Collapse and Low-Rank Bias in $L^2$-Regularized Nonlinear Networks
We show a link between deep neural collapse and the emergence of low-rank weight matrices for a general class of feedforward networks with nonlinear activation.
We also prove the global optimality of deep neural collapsed configurations and the practical absence of a loss barrier between interpolating minima and globally optimal points.
arXiv Detail & Related papers (2024-02-06T13:44:39Z)
- Efficient Compression of Overparameterized Deep Models through Low-Dimensional Learning Dynamics
We present a novel approach for compressing overparameterized models.
Our algorithm improves the training efficiency by more than 2x, without compromising generalization.
arXiv Detail & Related papers (2023-11-08T23:57:03Z)
- The Law of Parsimony in Gradient Descent for Learning Deep Linear Networks
We reveal a surprising "law of parsimony" in the learning dynamics when the data possesses low-dimensional structures.
This simplicity in learning dynamics could have significant implications for both efficient training and a better understanding of deep networks.
arXiv Detail & Related papers (2023-06-01T21:24:53Z)
- A Dynamics Theory of Implicit Regularization in Deep Low-Rank Matrix Factorization
Implicit regularization is an important lens for interpreting neural networks.
Recent theory has begun to explain implicit regularization through the model of deep matrix factorization (DMF).
arXiv Detail & Related papers (2022-12-29T02:11:19Z)
- Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations.
For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that, asymptotically, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the variance of the random initialization is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training (a toy rank-tracking sketch follows this entry).
arXiv Detail & Related papers (2022-10-13T15:09:54Z)
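As a companion to the result above, here is a toy rank-tracking experiment under assumed settings: the dimensions, loss, learning rate, and small-initialization scale are all illustrative choices, not the paper's exact construction.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, width, n = 50, 200, 100
X = torch.randn(n, d)                                  # high-dimensional inputs
y = torch.sign(X[:, 0]).unsqueeze(1)                   # labels from a single direction

W1 = torch.nn.Parameter(1e-2 * torch.randn(width, d))  # small initialization variance
v = torch.nn.Parameter(1e-2 * torch.randn(1, width))
opt = torch.optim.SGD([W1, v], lr=0.5)

def numerical_rank(W, tol=1e-3):
    """Count singular values above tol * (largest singular value)."""
    s = torch.linalg.svdvals(W.detach())
    return int((s > tol * s[0]).sum())

print(f"at init: numerical rank of W1 = {numerical_rank(W1)}")
for step in range(200):
    opt.zero_grad()
    out = F.leaky_relu(X @ W1.T, negative_slope=0.1) @ v.T
    loss = F.soft_margin_loss(out, y)                  # logistic-type classification loss
    loss.backward()
    opt.step()
    if step in (0, 199):
        # The paper proves rank collapse after one step for small enough
        # initialization; this toy merely performs the measurement.
        print(f"after {step + 1} steps: numerical rank of W1 = {numerical_rank(W1)}")
```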
- Training invariances and the low-rank phenomenon: beyond linear networks
We show that when one trains a deep linear network with logistic or exponential loss on linearly separable data, the weights converge to rank-$1$ matrices (a numerical illustration follows this entry).
This is the first time a low-rank phenomenon is proven rigorously for nonlinear ReLU-activated feedforward networks.
Our proof relies on a specific decomposition of the network into a multilinear function and another ReLU network whose weights are constant under a certain parameter directional convergence.
arXiv Detail & Related papers (2022-01-28T07:31:19Z)
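A quick numerical illustration of the rank-$1$ statement, under assumed sizes and learning rate. Since the convergence is asymptotic and directional, the sketch just watches the ratio of the top two singular values of each layer grow as the loss decreases.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, depth, n = 10, 3, 200
w_star = torch.randn(d)
X = torch.randn(n, d)
y = torch.sign(X @ w_star)                             # linearly separable labels

Ws = [torch.nn.Parameter(0.1 * torch.randn(d, d)) for _ in range(depth - 1)]
w_out = torch.nn.Parameter(0.1 * torch.randn(d))
opt = torch.optim.SGD(Ws + [w_out], lr=0.5)

for step in range(10000):
    opt.zero_grad()
    h = X
    for W in Ws:
        h = h @ W.T
    margins = y * (h @ w_out)
    loss = F.softplus(-margins).mean()                 # logistic loss log(1 + e^{-margin})
    loss.backward()
    opt.step()

# The ratio sigma1/sigma2 of each layer should grow as training proceeds,
# i.e. each weight matrix approaches a rank-1 matrix in direction.
for i, W in enumerate(Ws):
    s = torch.linalg.svdvals(W.detach())
    print(f"layer {i}: sigma1/sigma2 = {(s[0] / s[1]).item():.1f}")
```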
- Multi-scale Feature Learning Dynamics: Insights for Double Descent
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z)
- The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks
Neural network models that perfectly fit noisy data can nonetheless generalize well to unseen test data.
We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z)
- The Low-Rank Simplicity Bias in Deep Networks
We make a series of empirical observations that investigate and extend the hypothesis that deep networks are inductively biased to find solutions with lower effective rank embeddings.
We show that this claim holds for finite-width linear and nonlinear models across practical learning paradigms, and that on natural data these are often the solutions that generalize well (a sketch of one effective-rank metric follows this entry).
arXiv Detail & Related papers (2021-03-18T17:58:02Z)
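Effective rank can be measured in several ways; one standard choice is the entropy-based effective rank of Roy & Vetterli (2007), sketched below. Whether this matches the metric used in the paper above is an assumption.

```python
import torch

def effective_rank(M: torch.Tensor) -> float:
    """exp(Shannon entropy of the normalized singular-value distribution)."""
    s = torch.linalg.svdvals(M)
    p = s / s.sum()
    p = p[p > 0]                                   # drop exact zeros before the log
    return torch.exp(-(p * p.log()).sum()).item()

A = torch.randn(100, 2) @ torch.randn(2, 100)      # exactly rank 2
print(effective_rank(A))                           # ~2, up to numerical noise
print(effective_rank(torch.randn(100, 100)))       # much larger for unstructured noise
```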
- Kernel and Rich Regimes in Overparametrized Models
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)