Understanding Incremental Learning of Gradient Descent: A Fine-grained
Analysis of Matrix Sensing
- URL: http://arxiv.org/abs/2301.11500v1
- Date: Fri, 27 Jan 2023 02:30:51 GMT
- Title: Understanding Incremental Learning of Gradient Descent: A Fine-grained
Analysis of Matrix Sensing
- Authors: Jikai Jin and Zhiyuan Li and Kaifeng Lyu and Simon S. Du and Jason D.
Lee
- Abstract summary: It is believed that Gradient Descent (GD) induces an implicit bias towards good generalization in machine learning models.
This paper provides a fine-grained analysis of the dynamics of GD for the matrix sensing problem.
- Score: 74.2952487120137
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: It is believed that Gradient Descent (GD) induces an implicit bias towards
good generalization in training machine learning models. This paper provides a
fine-grained analysis of the dynamics of GD for the matrix sensing problem,
whose goal is to recover a low-rank ground-truth matrix from near-isotropic
linear measurements. It is shown that GD with small initialization behaves
similarly to the greedy low-rank learning heuristics (Li et al., 2020) and
follows an incremental learning procedure (Gissin et al., 2019): GD
sequentially learns solutions with increasing ranks until it recovers the
ground truth matrix. Compared to existing works which only analyze the first
learning phase for rank-1 solutions, our result provides characterizations for
the whole learning process. Moreover, besides the over-parameterized regime
that many prior works focused on, our analysis of the incremental learning
procedure also applies to the under-parameterized regime. Finally, we conduct
numerical experiments to confirm our theoretical findings.
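The dynamics described in the abstract can be sketched numerically. The following is a minimal illustration (not the authors' code), assuming a symmetric Gaussian measurement ensemble, a rank-2 PSD ground truth, and the over-parameterized factorization M = UU^T trained by plain GD from a small random initialization; all constants are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, m = 10, 2, 200                 # dimension, true rank, number of measurements
alpha, lr, steps = 1e-3, 0.02, 5000  # small init scale, step size (illustrative)

# Rank-r ground truth M* = BB^T / d (scaled so its eigenvalues are O(1))
B = rng.standard_normal((d, r))
M_star = (B @ B.T) / d

# Near-isotropic linear measurements y_i = <A_i, M*> with symmetric Gaussian A_i
A = rng.standard_normal((m, d, d))
A = (A + A.transpose(0, 2, 1)) / 2
y = np.einsum('ijk,jk->i', A, M_star)

# Over-parameterized factorization M = UU^T with small random initialization
U = alpha * rng.standard_normal((d, d))
svals = []  # top singular values of U over time
for t in range(steps):
    residual = np.einsum('ijk,jk->i', A, U @ U.T) - y     # <A_i, UU^T> - y_i
    grad = (4 / m) * np.einsum('i,ijk->jk', residual, A) @ U
    U -= lr * grad
    if t % 500 == 0:
        # The singular values of U grow one after another: GD fits a rank-1
        # solution first, then a rank-2 one (incremental learning)
        svals.append(np.linalg.svd(U, compute_uv=False)[:3])

rel_err = np.linalg.norm(U @ U.T - M_star) / np.linalg.norm(M_star)
print('relative recovery error:', rel_err)
```

Printing `svals` shows the rank-by-rank escape from the small initialization; with noiseless measurements the final relative error is limited only by the initialization scale.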
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization problem, where the inner loss function becomes a smooth probability distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Dynamic Decoupling of Placid Terminal Attractor-based Gradient Descent Algorithm [56.06235614890066]
Gradient descent (GD) and stochastic gradient descent (SGD) have been widely used in a number of application domains.
This paper carefully analyzes the dynamics of GD based on the terminal attractor at different stages of its gradient flow.
arXiv Detail & Related papers (2024-09-10T14:15:56Z) - Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks [3.680127959836384]
Implicit gradient descent (IGD) outperforms common gradient descent (GD) in handling certain multi-scale problems.
We show that IGD converges to a globally optimal solution at a linear convergence rate.
arXiv Detail & Related papers (2024-07-03T06:10:41Z) - Understanding Forgetting in Continual Learning with Linear Regression [21.8755265936716]
Continual learning, focused on sequentially learning multiple tasks, has gained significant attention recently.
We provide a general theoretical analysis of forgetting in the linear regression model via Gradient Descent.
We demonstrate that, given a sufficiently large data size, the arrangement of tasks in a sequence, where tasks with larger eigenvalues in their population data covariance matrices are trained later, tends to result in increased forgetting.
arXiv Detail & Related papers (2024-05-27T18:33:37Z) - Model-Agnostic Zeroth-Order Policy Optimization for Meta-Learning of Ergodic Linear Quadratic Regulators [13.343937277604892]
We study the problem of using meta-learning to deal with uncertainty and heterogeneity in ergodic linear quadratic regulators.
We propose an algorithm that omits the estimation of policy Hessian, which applies to tasks of learning a set of heterogeneous but similar linear dynamic systems.
We provide a convergence result for the exact gradient descent process by analyzing the boundedness and smoothness of the gradient for the meta-objective.
arXiv Detail & Related papers (2024-05-27T17:26:36Z) - Theoretical Characterization of the Generalization Performance of
Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z) - Stability and Generalization Analysis of Gradient Methods for Shallow
Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z) - Direction Matters: On the Implicit Bias of Stochastic Gradient Descent
with Moderate Learning Rate [105.62979485062756]
This paper attempts to characterize the particular regularization effect of SGD in the moderate learning rate regime.
We show that SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions.
arXiv Detail & Related papers (2020-11-04T21:07:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.