Implicit Bias in Matrix Factorization and its Explicit Realization in a New Architecture
- URL: http://arxiv.org/abs/2501.16322v1
- Date: Mon, 27 Jan 2025 18:56:22 GMT
- Title: Implicit Bias in Matrix Factorization and its Explicit Realization in a New Architecture
- Authors: Yikun Hou, Suvrit Sra, Alp Yurtsever,
- Abstract summary: Gradient descent for matrix factorization is known to exhibit an implicit bias toward approximately low-rank solutions.<n>We introduce a new factorization model: $Xapprox UDVtop$, where $U$ and $V$ are constrained within norm balls, while $D$ is a diagonal factor allowing the model to span the entire search space.
- Score: 36.53793044674861
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient descent for matrix factorization is known to exhibit an implicit bias toward approximately low-rank solutions. While existing theories often assume the boundedness of iterates, empirically the bias persists even with unbounded sequences. We thus hypothesize that implicit bias is driven by divergent dynamics markedly different from the convergent dynamics for data fitting. Using this perspective, we introduce a new factorization model: $X\approx UDV^\top$, where $U$ and $V$ are constrained within norm balls, while $D$ is a diagonal factor allowing the model to span the entire search space. Our experiments reveal that this model exhibits a strong implicit bias regardless of initialization and step size, yielding truly (rather than approximately) low-rank solutions. Furthermore, drawing parallels between matrix factorization and neural networks, we propose a novel neural network model featuring constrained layers and diagonal components. This model achieves strong performance across various regression and classification tasks while finding low-rank solutions, resulting in efficient and lightweight networks.
Related papers
- Unregularized Linear Convergence in Zero-Sum Game from Preference Feedback [50.89125374999765]
We provide the first convergence guarantee for Optimistic Multiplicative Weights Update ($mathtOMWU$) in NLHF.<n>Our analysis identifies a novel marginal convergence behavior, where the probability of rarely played actions grows exponentially from exponentially small values.
arXiv Detail & Related papers (2025-12-31T12:08:29Z) - Gradient Descent as a Perceptron Algorithm: Understanding Dynamics and Implicit Acceleration [67.12978375116599]
We show that the steps of gradient descent (GD) reduce to those of generalized perceptron algorithms.<n>This helps explain the optimization dynamics and the implicit acceleration phenomenon observed in neural networks.
arXiv Detail & Related papers (2025-12-12T14:16:35Z) - Generalization in Representation Models via Random Matrix Theory: Application to Recurrent Networks [7.721672385781673]
We first study the generalization error of models that use a fixed feature representation (frozen intermediate layers) followed by a trainable readout layer.<n>We apply Random Matrix Theory to derive a closed-form expression for the generalization error.<n>We then apply this analysis to recurrent representations and obtain concise formula that characterize their performance.
arXiv Detail & Related papers (2025-11-04T09:30:31Z) - Low-Rank Tensor Recovery via Variational Schatten-p Quasi-Norm and Jacobian Regularization [49.85875869048434]
We propose a CP-based low-rank tensor function parameterized by neural networks for implicit neural representation.<n>To achieve sparser CP decomposition, we introduce a variational Schatten-p quasi-norm to prune redundant rank-1 components.<n>For smoothness, we propose a regularization term based on the spectral norm of the Jacobian and Hutchinson's trace estimator.
arXiv Detail & Related papers (2025-06-27T11:23:10Z) - T-Rex: Fitting a Robust Factor Model via Expectation-Maximization [0.0]
We propose a novel expectation-maximization (EM) algorithm for robustly fitting statistical factor models.<n>Our approach is based on Tyler's M-estimator of the scatter matrix for an elliptical distribution.<n>We present numerical experiments on both synthetic and real examples.
arXiv Detail & Related papers (2025-05-17T18:53:06Z) - Subspace-Constrained Quadratic Matrix Factorization: Algorithm and Applications [1.689629482101155]
We present a subspace-constrained quadratic matrix factorization model to address challenges in manifold learning.
The model is designed to jointly learn key low-dimensional structures, including the tangent space, the normal subspace, and the quadratic form.
Results demonstrate that our model outperforms existing methods, highlighting its robustness and efficacy in capturing core low-dimensional structures.
arXiv Detail & Related papers (2024-11-07T13:57:53Z) - Implicit Regularization for Tubal Tensor Factorizations via Gradient Descent [4.031100721019478]
We provide a rigorous analysis of implicit regularization in an overparametrized tensor factorization problem beyond the lazy training regime.
We prove the first tensor result of its kind for gradient descent rather than gradient flow.
arXiv Detail & Related papers (2024-10-21T17:52:01Z) - Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function, that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-vary learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Layered Models can "Automatically" Regularize and Discover Low-Dimensional Structures via Feature Learning [6.109362130047454]
We study a two-layer nonparametric regression model where the input undergoes a linear transformation followed by a nonlinear mapping to predict the output.
We show that the two-layer model can "automatically" induce regularization and facilitate feature learning.
arXiv Detail & Related papers (2023-10-18T06:15:35Z) - Deep Neural Networks for Nonparametric Interaction Models with Diverging
Dimension [6.939768185086753]
We analyze a $kth$ order nonparametric interaction model in both growing dimension scenarios ($d$ grows with $n$ but at a slower rate) and in high dimension ($d gtrsim n$)
We show that under certain standard assumptions, debiased deep neural networks achieve a minimax optimal rate both in terms of $(n, d)$.
arXiv Detail & Related papers (2023-02-12T04:19:39Z) - Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z) - Intrinsic dimensionality and generalization properties of the
$\mathcal{R}$-norm inductive bias [4.37441734515066]
The $mathcalR$-norm is the basis of an inductive bias for two-layer neural networks.
We find that these interpolants are intrinsically multivariate functions, even when there are ridge functions that fit the data.
arXiv Detail & Related papers (2022-06-10T18:33:15Z) - Origins of Low-dimensional Adversarial Perturbations [17.17170592140042]
We study the phenomenon of low-dimensional adversarial perturbations in classification.
The goal is to fool the classifier into flipping its decision on a nonzero fraction of inputs from a designated class.
We compute lowerbounds for the fooling rate of any subspace.
arXiv Detail & Related papers (2022-03-25T17:02:49Z) - The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer
Linear Networks [51.1848572349154]
neural network models that perfectly fit noisy data can generalize well to unseen test data.
We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z) - On the Role of Optimization in Double Descent: A Least Squares Study [30.44215064390409]
We show an excess risk bound for the descent gradient solution of the least squares objective.
We find that in case of noiseless regression, double descent is explained solely by optimization-related quantities.
We empirically explore if our predictions hold for neural networks.
arXiv Detail & Related papers (2021-07-27T09:13:11Z) - Understanding Implicit Regularization in Over-Parameterized Single Index
Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z) - Multipole Graph Neural Operator for Parametric Partial Differential
Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z) - Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.