From Stability to Chaos: Analyzing Gradient Descent Dynamics in
Quadratic Regression
- URL: http://arxiv.org/abs/2310.01687v1
- Date: Mon, 2 Oct 2023 22:59:17 GMT
- Title: From Stability to Chaos: Analyzing Gradient Descent Dynamics in
Quadratic Regression
- Authors: Xuxing Chen, Krishnakumar Balasubramanian, Promit Ghosal, Bhavya
Agrawalla
- Abstract summary: We investigate the dynamics of gradient descent using large-order constant step-sizes in the context of quadratic regression models.
We delineate five distinct training phases: (1) monotonic, (2) catapult, (3) periodic, (4) chaotic, and (5) divergent.
In particular, we observe that performing an ergodic trajectory averaging stabilizes the test error in non-monotonic (and non-divergent) phases.
- Score: 14.521929085104441
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We conduct a comprehensive investigation into the dynamics of gradient
descent using large-order constant step-sizes in the context of quadratic
regression models. Within this framework, we reveal that the dynamics can be
encapsulated by a specific cubic map, naturally parameterized by the step-size.
Through a fine-grained bifurcation analysis concerning the step-size parameter,
we delineate five distinct training phases: (1) monotonic, (2) catapult, (3)
periodic, (4) chaotic, and (5) divergent, precisely demarcating the boundaries
of each phase. As illustrations, we provide examples involving phase retrieval
and two-layer neural networks employing quadratic activation functions and
constant outer-layers, utilizing orthogonal training data. Our simulations
indicate that these five phases also manifest with generic non-orthogonal data.
We also empirically investigate the generalization performance when training in
the various non-monotonic (and non-divergent) phases. In particular, we observe
that performing an ergodic trajectory averaging stabilizes the test error in
non-monotonic (and non-divergent) phases.
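As a concrete illustration of the cubic-map viewpoint, the sketch below (not the authors' code) runs gradient descent on a one-sample phase-retrieval loss L(u) = (u^2 - y)^2 / 4, whose update rule reduces to the cubic map u_{t+1} = u_t * (1 - eta * (u_t^2 - y)). The step sizes and the crude empirical classifier are illustrative assumptions, not the phase boundaries derived in the paper; sweeping eta upward, the printed labels move roughly from monotone convergence through bounded oscillation to divergence.

```python
import numpy as np

def cubic_map_losses(eta, y=1.0, u0=0.1, steps=500):
    """Iterate the GD-induced cubic map u <- u * (1 - eta * (u^2 - y)) and record the loss."""
    u, losses = u0, []
    for _ in range(steps):
        losses.append(0.25 * (u * u - y) ** 2)
        u = u * (1.0 - eta * (u * u - y))
        if not np.isfinite(u) or abs(u) > 1e6:   # orbit escaped towards infinity
            losses.append(np.inf)
            break
    return np.asarray(losses)

def classify(losses):
    """Crude empirical label; the paper derives the exact phase boundaries analytically."""
    if not np.all(np.isfinite(losses)):
        return "divergent"
    if losses[-1] < 1e-10:
        monotone = np.all(np.diff(losses) <= 1e-12)
        return "converges monotonically" if monotone else "converges non-monotonically (catapult-like)"
    distinct = len(np.unique(np.round(losses[-50:], 6)))
    return "bounded, (near-)periodic" if distinct <= 8 else "bounded, chaotic-looking"

for eta in [0.4, 0.96, 1.2, 1.8, 2.6]:   # illustrative step sizes, not the paper's thresholds
    print(f"eta = {eta:4.2f} -> {classify(cubic_map_losses(eta))}")
```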
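One way to read the ergodic trajectory-averaging observation is sketched below, again with assumptions of mine rather than the authors' protocol: a phase-retrieval target with orthonormal training inputs and fresh Gaussian test inputs, a step size scaled by the factor 1.8 to push the stiffest coordinate out of the monotonic phase while staying non-divergent, and a running time-average of the per-step test error taken as the "ergodic average". The instantaneous test error keeps oscillating, while its running average settles to an essentially constant value.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20
w_star = rng.normal(size=d) / np.sqrt(d)        # ground-truth weights

X_train = np.eye(d)                             # orthonormal training inputs
y_train = (X_train @ w_star) ** 2               # phase-retrieval targets y = (x^T w*)^2
X_test = rng.normal(size=(500, d))              # fresh Gaussian test inputs
y_test = (X_test @ w_star) ** 2

def test_error(w):
    return 0.25 * np.mean(((X_test @ w) ** 2 - y_test) ** 2)

# Illustrative step size: the factor 1.8 is an assumption meant to leave the
# monotonic phase for the stiffest coordinate without causing divergence.
eta = 1.8 * d / np.max(y_train)

w = 0.1 * rng.normal(size=d)
running_sum, inst, avg = 0.0, [], []
for t in range(1, 2001):
    pred = X_train @ w
    grad = X_train.T @ ((pred ** 2 - y_train) * pred) / d   # gradient of the quartic loss
    w = w - eta * grad
    err = test_error(w)
    running_sum += err
    inst.append(err)
    avg.append(running_sum / t)

print("last 5 instantaneous test errors:", np.round(inst[-5:], 4))
print("last 5 ergodic-averaged errors  :", np.round(avg[-5:], 4))
```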
Related papers
- Cascade of phase transitions in the training of Energy-based models [9.945465034701288]
We investigate the feature encoding process in a prototypical energy-based generative model, the Bernoulli-Bernoulli RBM.
Our study tracks the evolution of the model's weight matrix through its singular value decomposition.
We validate our theoretical results by training the Bernoulli-Bernoulli RBM on real data sets.
arXiv Detail & Related papers (2024-05-23T15:25:56Z)
- A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning [68.76846801719095]
We show exactly when and where double descent occurs, and that its location is not inherently tied to the threshold p=n.
This provides a resolution to tensions between double descent and statistical intuition.
arXiv Detail & Related papers (2023-10-29T12:05:39Z)
- Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that an appropriate weight normalization, reminiscent of batch normalization, can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
arXiv Detail & Related papers (2023-09-07T16:55:50Z)
- Neural network analysis of neutron and X-ray reflectivity data: Incorporating prior knowledge for tackling the phase problem [141.5628276096321]
We present an approach that utilizes prior knowledge to regularize the training process over larger parameter spaces.
We demonstrate the effectiveness of our method in various scenarios, including multilayer structures with box model parameterization.
In contrast to previous methods, our approach scales favorably when increasing the complexity of the inverse problem.
arXiv Detail & Related papers (2023-06-28T11:15:53Z)
- Latent Traversals in Generative Models as Potential Flows [113.4232528843775]
We propose to model latent structures with a learned dynamic potential landscape.
Inspired by physics, optimal transport, and neuroscience, these potential landscapes are learned as physically realistic partial differential equations.
Our method achieves trajectories that are both qualitatively and quantitatively more disentangled than those of state-of-the-art baselines.
arXiv Detail & Related papers (2023-04-25T15:53:45Z)
- Topological correlations in three dimensional classical Ising models: an exact solution with a continuous phase transition [8.83889166043817]
We study a 3D classical Ising model that is exactly solvable when some coupling constants take certain imaginary values.
We show that a related exactly solvable 3D classical statistical model with real coupling constants also shows the topological features of one of these phases.
arXiv Detail & Related papers (2022-02-23T04:22:30Z)
- Topological transitions with continuously monitored free fermions [68.8204255655161]
We show the presence of a topological phase transition that is of a different universality class than that observed in stroboscopic projective circuits.
We find that this entanglement transition is well identified by a combination of the bipartite entanglement entropy and the topological entanglement entropy.
arXiv Detail & Related papers (2021-12-17T22:01:54Z)
- The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion [29.489737359897312]
We study the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD).
We show that the key ingredient driving these dynamics is not the original training loss, but rather the combination of a modified loss, which implicitly regularizes the velocity, and probability currents, which cause oscillations in phase space.
arXiv Detail & Related papers (2021-07-19T20:18:57Z)
- Phases of learning dynamics in artificial neural networks: with or without mislabeled data [3.3576886095389296]
We study the gradient-descent dynamics that drive learning in neural networks.
Without mislabeled data, we find that the SGD learning dynamics transitions from a fast learning phase to a slow exploration phase.
We find that the individual sample losses of the correctly labeled and the mislabeled data are most separated during phase II.
arXiv Detail & Related papers (2021-01-16T19:44:27Z)
- Phase diagram for two-layer ReLU neural networks at infinite-width limit [6.380166265263755]
We draw the phase diagram for the two-layer ReLU neural network at the infinite-width limit.
We identify three regimes in the phase diagram, i.e., linear regime, critical regime and condensed regime.
In the linear regime, the NN training dynamics are approximately linear, similar to a random feature model, with an exponential loss decay.
In the condensed regime, we demonstrate through experiments that active neurons are condensed at several discrete orientations.
arXiv Detail & Related papers (2020-07-15T06:04:35Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)