Understanding Incremental Learning of Gradient Descent: A Fine-grained
Analysis of Matrix Sensing
- URL: http://arxiv.org/abs/2301.11500v1
- Date: Fri, 27 Jan 2023 02:30:51 GMT
- Title: Understanding Incremental Learning of Gradient Descent: A Fine-grained
Analysis of Matrix Sensing
- Authors: Jikai Jin and Zhiyuan Li and Kaifeng Lyu and Simon S. Du and Jason D.
Lee
- Abstract summary: It is believed that Gradient Descent (GD) induces an implicit bias towards good generalization in machine learning models.
This paper provides a fine-grained analysis of the dynamics of GD for the matrix sensing problem.
- Score: 74.2952487120137
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: It is believed that Gradient Descent (GD) induces an implicit bias towards
good generalization in training machine learning models. This paper provides a
fine-grained analysis of the dynamics of GD for the matrix sensing problem,
whose goal is to recover a low-rank ground-truth matrix from near-isotropic
linear measurements. It is shown that GD with small initialization behaves
similarly to the greedy low-rank learning heuristics (Li et al., 2020) and
follows an incremental learning procedure (Gissin et al., 2019): GD
sequentially learns solutions with increasing ranks until it recovers the
ground truth matrix. Compared to existing works which only analyze the first
learning phase for rank-1 solutions, our result provides characterizations for
the whole learning process. Moreover, besides the over-parameterized regime
that many prior works focused on, our analysis of the incremental learning
procedure also applies to the under-parameterized regime. Finally, we conduct
numerical experiments to confirm our theoretical findings.
Related papers
- Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks [4.313136216120379]
In this paper, we provide convergence analysis for the implicit gradient descent for training over-parametrized two-layer PINNs.
We show that the randomly IGD converges a globally optimal solution at a linear convergence rate.
arXiv Detail & Related papers (2024-07-03T06:10:41Z) - Understanding Forgetting in Continual Learning with Linear Regression [21.8755265936716]
Continual learning, focused on sequentially learning multiple tasks, has gained significant attention recently.
We provide a general theoretical analysis of forgetting in the linear regression model via Gradient Descent.
We demonstrate that, given a sufficiently large data size, the arrangement of tasks in a sequence, where tasks with larger eigenvalues in their population data covariance matrices are trained later, tends to result in increased forgetting.
arXiv Detail & Related papers (2024-05-27T18:33:37Z) - Model-Agnostic Zeroth-Order Policy Optimization for Meta-Learning of Ergodic Linear Quadratic Regulators [13.343937277604892]
We study the problem of using meta-learning to deal with uncertainty and heterogeneity in ergodic linear quadratic regulators.
We propose an algorithm that omits the estimation of policy Hessian, which applies to tasks of learning a set of heterogeneous but similar linear dynamic systems.
We provide a convergence result for the exact gradient descent process by analyzing the boundedness and smoothness of the gradient for the meta-objective.
arXiv Detail & Related papers (2024-05-27T17:26:36Z) - The Inductive Bias of Flatness Regularization for Deep Matrix
Factorization [58.851514333119255]
This work takes the first step toward understanding the inductive bias of the minimum trace of the Hessian solutions in deep linear networks.
We show that for all depth greater than one, with the standard Isometry Property (RIP) on the measurements, minimizing the trace of Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters.
arXiv Detail & Related papers (2023-06-22T23:14:57Z) - Theoretical Characterization of the Generalization Performance of
Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z) - Stability and Generalization Analysis of Gradient Methods for Shallow
Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and gradient descent (SGD) to train SNNs, for both of which we develop consistent excess bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z) - Provable Generalization of Overparameterized Meta-learning Trained with
SGD [62.892930625034374]
We study the generalization of a widely used meta-learning approach, Model-Agnostic Meta-Learning (MAML)
We provide both upper and lower bounds for the excess risk of MAML, which captures how SGD dynamics affect these generalization bounds.
Our theoretical findings are further validated by experiments.
arXiv Detail & Related papers (2022-06-18T07:22:57Z) - Direction Matters: On the Implicit Bias of Stochastic Gradient Descent
with Moderate Learning Rate [105.62979485062756]
This paper attempts to characterize the particular regularization effect of SGD in the moderate learning rate regime.
We show that SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions.
arXiv Detail & Related papers (2020-11-04T21:07:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.