Implicit Bias in Matrix Factorization and its Explicit Realization in a New Architecture
- URL: http://arxiv.org/abs/2501.16322v1
- Date: Mon, 27 Jan 2025 18:56:22 GMT
- Title: Implicit Bias in Matrix Factorization and its Explicit Realization in a New Architecture
- Authors: Yikun Hou, Suvrit Sra, Alp Yurtsever
- Abstract summary: Gradient descent for matrix factorization is known to exhibit an implicit bias toward approximately low-rank solutions.
We introduce a new factorization model: $X \approx UDV^\top$, where $U$ and $V$ are constrained within norm balls, while $D$ is a diagonal factor allowing the model to span the entire search space.
- Score: 36.53793044674861
- License:
- Abstract: Gradient descent for matrix factorization is known to exhibit an implicit bias toward approximately low-rank solutions. While existing theories often assume the boundedness of iterates, empirically the bias persists even with unbounded sequences. We thus hypothesize that implicit bias is driven by divergent dynamics markedly different from the convergent dynamics for data fitting. Using this perspective, we introduce a new factorization model: $X\approx UDV^\top$, where $U$ and $V$ are constrained within norm balls, while $D$ is a diagonal factor allowing the model to span the entire search space. Our experiments reveal that this model exhibits a strong implicit bias regardless of initialization and step size, yielding truly (rather than approximately) low-rank solutions. Furthermore, drawing parallels between matrix factorization and neural networks, we propose a novel neural network model featuring constrained layers and diagonal components. This model achieves strong performance across various regression and classification tasks while finding low-rank solutions, resulting in efficient and lightweight networks.
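The constrained factorization described in the abstract admits a compact illustration. Below is a minimal sketch (not the authors' code) of fitting $X \approx UDV^\top$ by projected gradient descent: $U$ and $V$ are re-projected onto Frobenius-norm balls after every step, while the diagonal factor $D$ is left unconstrained so the model can still reach any matrix. The function names, ball radius, step size, and use of NumPy are illustrative assumptions rather than details from the paper.

```python
import numpy as np

def project_to_ball(M, radius=1.0):
    """Project M onto the Frobenius-norm ball of the given radius (assumed constraint)."""
    norm = np.linalg.norm(M)
    return M if norm <= radius else M * (radius / norm)

def factorize(X, r, steps=20000, lr=0.01, radius=1.0, seed=0):
    """Fit X ~= U @ diag(d) @ V.T with norm-ball-constrained U and V.

    Hyperparameters are illustrative, not tuned.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = project_to_ball(rng.standard_normal((m, r)), radius)
    V = project_to_ball(rng.standard_normal((n, r)), radius)
    d = rng.standard_normal(r)
    for _ in range(steps):
        D = np.diag(d)
        R = U @ D @ V.T - X         # residual of the loss 0.5 * ||U D V^T - X||_F^2
        gU = R @ V @ D              # gradient w.r.t. U
        gV = R.T @ U @ D            # gradient w.r.t. V
        gd = np.diag(U.T @ R @ V)   # gradient w.r.t. the diagonal of D
        U = project_to_ball(U - lr * gU, radius)  # projected gradient step on U
        V = project_to_ball(V - lr * gV, radius)  # projected gradient step on V
        d = d - lr * gd                           # D remains unconstrained
    return U, d, V

# Toy example: a rank-2 target. Per the abstract, most entries of d are expected to
# shrink to (numerically) zero, i.e. the recovered factorization is truly low rank.
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 15))
U, d, V = factorize(X, r=10)
print(np.sort(np.abs(d))[::-1])
```

The abstract's neural-network analog reportedly mirrors this construction, pairing norm-constrained layers with diagonal components, so the same projection-plus-unconstrained-diagonal pattern would carry over to each layer.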
Related papers
- Subspace-Constrained Quadratic Matrix Factorization: Algorithm and Applications [1.689629482101155]
We present a subspace-constrained quadratic matrix factorization model to address challenges in manifold learning.
The model is designed to jointly learn key low-dimensional structures, including the tangent space, the normal subspace, and the quadratic form.
Results demonstrate that our model outperforms existing methods, highlighting its robustness and efficacy in capturing core low-dimensional structures.
arXiv Detail & Related papers (2024-11-07T13:57:53Z)
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- Layered Models can "Automatically" Regularize and Discover Low-Dimensional Structures via Feature Learning [6.109362130047454]
We study a two-layer nonparametric regression model where the input undergoes a linear transformation followed by a nonlinear mapping to predict the output.
We show that the two-layer model can "automatically" induce regularization and facilitate feature learning.
arXiv Detail & Related papers (2023-10-18T06:15:35Z)
- Deep Neural Networks for Nonparametric Interaction Models with Diverging Dimension [6.939768185086753]
We analyze a $k$th-order nonparametric interaction model both in the growing-dimension regime ($d$ grows with $n$ but at a slower rate) and in high dimension ($d \gtrsim n$).
We show that, under certain standard assumptions, debiased deep neural networks achieve a minimax optimal rate in terms of both $n$ and $d$.
arXiv Detail & Related papers (2023-02-12T04:19:39Z)
- Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z)
- Intrinsic dimensionality and generalization properties of the $\mathcal{R}$-norm inductive bias [4.37441734515066]
The $\mathcal{R}$-norm is the basis of an inductive bias for two-layer neural networks.
We find that these interpolants are intrinsically multivariate functions, even when there are ridge functions that fit the data.
arXiv Detail & Related papers (2022-06-10T18:33:15Z)
- The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks [51.1848572349154]
Neural network models that perfectly fit noisy data can generalize well to unseen test data.
We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z)
- On the Role of Optimization in Double Descent: A Least Squares Study [30.44215064390409]
We show an excess risk bound for the gradient descent solution of the least squares objective.
We find that in case of noiseless regression, double descent is explained solely by optimization-related quantities.
We empirically explore if our predictions hold for neural networks.
arXiv Detail & Related papers (2021-07-27T09:13:11Z)
- Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
- Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)