Learning Low Dimensional State Spaces with Overparameterized Recurrent
Neural Nets
- URL: http://arxiv.org/abs/2210.14064v3
- Date: Thu, 23 Mar 2023 15:45:41 GMT
- Title: Learning Low Dimensional State Spaces with Overparameterized Recurrent
Neural Nets
- Authors: Edo Cohen-Karlik, Itamar Menuhin-Gruman, Raja Giryes, Nadav Cohen and
Amir Globerson
- Abstract summary: We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
- Score: 57.06026574261203
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Overparameterization in deep learning typically refers to settings where a
trained neural network (NN) has representational capacity to fit the training
data in many ways, some of which generalize well, while others do not. In the
case of Recurrent Neural Networks (RNNs), there exists an additional layer of
overparameterization, in the sense that a model may exhibit many solutions that
generalize well for sequence lengths seen in training, some of which
extrapolate to longer sequences, while others do not. Numerous works have
studied the tendency of Gradient Descent (GD) to fit overparameterized NNs with
solutions that generalize well. On the other hand, its tendency to fit
overparameterized RNNs with solutions that extrapolate has been discovered only
recently and is far less understood. In this paper, we analyze the
extrapolation properties of GD when applied to overparameterized linear RNNs.
In contrast to recent arguments suggesting an implicit bias towards short-term
memory, we provide theoretical evidence for learning low-dimensional state
spaces, which can also model long-term memory. Our result relies on a dynamical
characterization which shows that GD (with small step size and near-zero
initialization) strives to maintain a certain form of balancedness, as well as
on tools developed in the context of the moment problem from statistics
(recovery of a probability distribution from its moments). Experiments
corroborate our theory, demonstrating extrapolation via learning
low-dimensional state spaces with both linear and non-linear RNNs.
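Below is a minimal, self-contained sketch (not the authors' code) of the setting the abstract describes: gradient descent with a small step size and small-scale initialization applied to an overparameterized linear RNN whose teacher has a low-dimensional state space. The impulse-response training objective, the order-2 teacher, the hidden dimension of 32, and all hyperparameters are illustrative assumptions. After fitting short sequences, the script inspects the singular values of the learned transition matrix (low-dimensional state space) and the error on longer sequences (extrapolation).

```python
# Illustrative sketch only: overparameterized linear RNN trained by GD with a
# small step size and small-scale ("near-zero") initialization; all sizes and
# hyperparameters are assumptions, not the paper's exact experimental setup.
import torch

torch.manual_seed(0)

d_hidden, d_teacher, T_train, T_test = 32, 2, 10, 40  # assumed dimensions/lengths

# Teacher: a 2-dimensional linear state-space model (stable, non-trivial memory).
A_true = torch.tensor([[0.9, 0.1], [0.0, 0.8]])
B_true = torch.randn(d_teacher, 1)
C_true = torch.randn(1, d_teacher)

def impulse_response(A, B, C, T):
    # y_k = C A^(k-1) B for k = 1..T: the RNN's output after a unit impulse input.
    h, ys = B, []
    for _ in range(T):
        ys.append((C @ h).squeeze())
        h = A @ h
    return torch.stack(ys)

target = impulse_response(A_true, B_true, C_true, T_train)

# Overparameterized student: hidden dimension 32 >> 2, small initialization scale.
scale = 1e-1  # "small" init scale chosen so this toy run makes progress quickly
A = (scale * torch.randn(d_hidden, d_hidden)).requires_grad_()
B = (scale * torch.randn(d_hidden, 1)).requires_grad_()
C = (scale * torch.randn(1, d_hidden)).requires_grad_()

opt = torch.optim.SGD([A, B, C], lr=1e-2)  # small step size (assumed value)
for step in range(20_000):
    opt.zero_grad()
    loss = ((impulse_response(A, B, C, T_train) - target) ** 2).mean()
    loss.backward()
    opt.step()

# Low-dimensional state space: singular values of A beyond the teacher's order
# should be comparatively tiny if the implicit bias holds in this toy run.
print("leading singular values of learned A:",
      torch.linalg.svdvals(A.detach())[:6])

# Extrapolation: compare student and teacher on sequences longer than T_train.
with torch.no_grad():
    err = (impulse_response(A, B, C, T_test)
           - impulse_response(A_true, B_true, C_true, T_test)).abs().max()
print(f"max |error| on lengths 1..{T_test}:", err.item())
```

If the implicit bias described above holds in this toy run, only about two singular values of the learned A should remain non-negligible and the long-sequence error should stay small; the exact numbers depend on the assumed hyperparameters.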
Related papers
- Local Linear Recovery Guarantee of Deep Neural Networks at Overparameterization [3.3998740964877463]
"Local linear recovery" (LLR) is a weaker form of target function recovery.
We prove that functions expressible by narrower DNNs are guaranteed to be recoverable from fewer samples than model parameters.
arXiv Detail & Related papers (2024-06-26T03:08:24Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
- Designing Universal Causal Deep Learning Models: The Case of Infinite-Dimensional Dynamical Systems from Stochastic Analysis [3.5450828190071655]
Causal operators (COs) play a central role in contemporary analysis.
There is still no canonical framework for designing Deep Learning (DL) models capable of approximating COs.
This paper proposes a "geometry-aware" solution to this open problem by introducing a DL model-design framework.
arXiv Detail & Related papers (2022-10-24T14:43:03Z)
- How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z)
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these networks using gradient descent.
For the first time, we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
- Understanding Why Neural Networks Generalize Well Through GSNR of Parameters [11.208337921488207]
We study the gradient signal-to-noise ratio (GSNR) of parameters during the training process of deep neural networks (DNNs).
We show that a larger GSNR during training leads to better generalization performance (a minimal sketch of this quantity appears after this list).
arXiv Detail & Related papers (2020-01-21T08:33:29Z)
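As a side note on the last related paper, below is a minimal sketch of the gradient signal-to-noise ratio it studies, taking the GSNR of a parameter to be the squared mean of its per-sample gradient divided by the variance of that gradient across samples. The linear model, squared loss, and synthetic data here are illustrative assumptions, not that paper's setup.

```python
# Illustrative sketch: per-parameter GSNR = mean(g)^2 / var(g) over per-sample
# gradients g, computed analytically for a linear model with squared loss.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)  # current parameters of the (assumed) linear model

# Per-sample gradient of 0.5 * (x_i . w - y_i)^2 with respect to w is (x_i . w - y_i) x_i.
residuals = X @ w - y                      # shape (n,)
per_sample_grads = residuals[:, None] * X  # shape (n, d)

mean_g = per_sample_grads.mean(axis=0)
var_g = per_sample_grads.var(axis=0) + 1e-12  # guard against zero variance
gsnr = mean_g ** 2 / var_g
print("per-parameter GSNR:", gsnr)
```

For a deep network, the same ratio can be computed at any point in training by collecting per-sample gradients (for example, one backward pass per example) and taking the squared mean over the variance for each parameter.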