Symmetries in the dynamics of wide two-layer neural networks
- URL: http://arxiv.org/abs/2211.08771v1
- Date: Wed, 16 Nov 2022 08:59:26 GMT
- Title: Symmetries in the dynamics of wide two-layer neural networks
- Authors: Karl Hajjar (LMO, CELESTE), Lenaic Chizat (EPFL)
- Abstract summary: We consider the idealized setting of gradient flow on the population risk for infinitely wide two-layer ReLU neural networks (without bias).
We first describe a general class of symmetries which, when satisfied by the target function $f^*$ and the input distribution, are preserved by the dynamics.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the idealized setting of gradient flow on the population risk for
infinitely wide two-layer ReLU neural networks (without bias), and study the
effect of symmetries on the learned parameters and predictors. We first
describe a general class of symmetries which, when satisfied by the target
function $f^*$ and the input distribution, are preserved by the dynamics. We
then study more specific cases. When $f^*$ is odd, we show that the dynamics of
the predictor reduces to that of a (non-linearly parameterized) linear
predictor, and its exponential convergence can be guaranteed. When $f^*$ has a
low-dimensional structure, we prove that the gradient flow PDE reduces to a
lower-dimensional PDE. Furthermore, we present informal and numerical arguments
that suggest that the input neurons align with the lower-dimensional structure
of the problem.
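As a rough sketch of this setting (our notation, not necessarily the paper's), the infinitely wide two-layer ReLU network without bias is described by a distribution $\mu$ over hidden-neuron parameters, and the population risk measures the squared error against $f^*$ under the input distribution $\rho$:

$$ f_{\mu}(x) \;=\; \int c\,\max(\langle w, x\rangle, 0)\, d\mu(w, c), \qquad R(\mu) \;=\; \tfrac{1}{2}\,\mathbb{E}_{x\sim\rho}\big[\big(f_{\mu}(x) - f^{*}(x)\big)^{2}\big]. $$

Gradient flow on $R$ then corresponds to a Wasserstein gradient flow $\partial_t \mu_t = \operatorname{div}\big(\mu_t \nabla \tfrac{\delta R}{\delta \mu}(\mu_t)\big)$ on the parameter distribution (the gradient flow PDE mentioned above); loosely speaking, a symmetry satisfied by both $f^*$ and $\rho$ induces a transformation of the parameters that leaves this flow invariant.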
Related papers
- Sparse deep neural networks for nonparametric estimation in high-dimensional sparse regression [4.983567824636051]
This study combines nonparametric estimation and parametric sparse deep neural networks for the first time.
Since nonparametric estimation of partial derivatives is of great significance for nonlinear variable selection, these results point to a promising direction for the interpretability of deep neural networks.
arXiv Detail & Related papers (2024-06-26T07:41:41Z)
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space in which to model functions represented by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms (a standard example of such a norm is sketched below).
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
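For context, a standard example of such a norm in this literature (not necessarily the exact one used in that paper) is the variation-type norm of a finite-width two-layer network with activation $\sigma$:

$$ \|f\|_{\mathcal{F}_1} \;=\; \inf\Big\{ \sum_{j} |c_j|\,\|w_j\|_2 \;:\; f(x) = \sum_{j} c_j\,\sigma(\langle w_j, x\rangle) \Big\}. $$

Roughly speaking, the unit ball of this norm is a strictly richer function class than the ball of the corresponding RKHS norm, and the infinite-width version replaces the finite sum by an integral against a signed measure over neurons.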
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance (the minimax problem is sketched schematically below).
arXiv Detail & Related papers (2024-04-18T16:46:08Z)
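Schematically (our notation), such a functional minimax problem over two mean-field two-layer networks with activation $\sigma$ reads

$$ \min_{\mu}\;\max_{\nu}\;\; \mathbb{E}_{z\sim\mathcal{D}}\Big[\ell\big(f_{\mu}(z),\, g_{\nu}(z)\big)\Big], \qquad f_{\mu}(x) \;=\; \int a\,\sigma(\langle w, x\rangle)\, d\mu(a, w), $$

with $g_{\nu}$ parameterized analogously by a measure $\nu$; gradient descent-ascent then acts as coupled flows on the two parameter distributions, and the Wasserstein bound quoted above controls how far these distributions move from their initialization.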
- Lie Point Symmetry and Physics Informed Networks [59.56218517113066]
We propose a loss function that informs the network about Lie point symmetries in the same way that PINN models try to enforce the underlying PDE through a loss function.
Our symmetry loss ensures that the infinitesimal generators of the Lie group conserve the PDE solutions.
Empirical evaluations indicate that the inductive bias introduced by the Lie point symmetries of the PDEs greatly boosts the sample efficiency of PINNs.
arXiv Detail & Related papers (2023-11-07T19:07:16Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- Learning Physics-Informed Neural Networks without Stacked Back-propagation [82.26566759276105]
We develop a novel approach that can significantly accelerate the training of Physics-Informed Neural Networks.
In particular, we parameterize the PDE solution by a Gaussian-smoothed model and show that, via Stein's identity, the second-order derivatives can be computed efficiently without back-propagation (a toy numerical sketch of this identity follows below).
Experimental results show that our proposed method can achieve competitive error compared to standard PINN training but is two orders of magnitude faster.
arXiv Detail & Related papers (2022-02-18T18:07:54Z)
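The derivative trick above can be illustrated with a toy one-dimensional example (a minimal sketch of the Gaussian-smoothing/Stein's-identity idea, not the paper's implementation). For the smoothed model $f_{\sigma}(x) = \mathbb{E}_{\delta\sim\mathcal{N}(0,\sigma^2)}[f(x+\delta)]$, Stein's identity gives $f_{\sigma}'(x) = \mathbb{E}[\delta\, f(x+\delta)]/\sigma^{2}$ and $f_{\sigma}''(x) = \mathbb{E}[(\delta^{2}-\sigma^{2})\, f(x+\delta)]/\sigma^{4}$, so both derivatives require only forward evaluations of $f$:

```python
import numpy as np

# Toy Monte-Carlo check of the Stein's-identity estimators for the
# Gaussian-smoothed function f_sigma(x) = E[f(x + delta)], delta ~ N(0, sigma^2):
#   f_sigma'(x)  ~= mean( delta * f(x + delta) ) / sigma^2
#   f_sigma''(x) ~= mean( (delta^2 - sigma^2) * f(x + delta) ) / sigma^4
# Only forward evaluations of f are needed (no back-propagation through f).

rng = np.random.default_rng(0)
f = np.sin                      # stand-in for an expensive model evaluation
x, sigma, n = 1.0, 0.5, 200_000

delta = rng.normal(0.0, sigma, size=n)
vals = f(x + delta)

d1_est = np.mean(delta * vals) / sigma**2
d2_est = np.mean((delta**2 - sigma**2) * vals) / sigma**4

# For f = sin the smoothed function is exp(-sigma^2/2) * sin(x), so the exact
# smoothed derivatives are available in closed form for comparison.
d1_exact = np.exp(-sigma**2 / 2) * np.cos(x)
d2_exact = -np.exp(-sigma**2 / 2) * np.sin(x)
print(f"f_sigma'(x):  estimate {d1_est:+.3f}   exact {d1_exact:+.3f}")
print(f"f_sigma''(x): estimate {d2_est:+.3f}   exact {d2_exact:+.3f}")
```

With enough samples the Monte-Carlo estimates match the closed-form smoothed derivatives to a few decimal places, which is the property exploited to avoid stacked back-propagation.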
- Single Trajectory Nonparametric Learning of Nonlinear Dynamics [8.438421942654292]
Given a single trajectory of a dynamical system, we analyze the performance of the nonparametric least squares estimator (LSE), defined schematically below.
We leverage recently developed information-theoretic methods to establish the optimality of the LSE for nonparametric hypothesis classes.
We specialize our results to a number of scenarios of practical interest, such as Lipschitz dynamics, generalized linear models, and dynamics described by functions in certain classes of Reproducing Kernel Hilbert Spaces (RKHS).
arXiv Detail & Related papers (2022-02-16T19:38:54Z)
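For concreteness, a schematic form of this estimator (notation may differ from the paper's): given a single trajectory $x_0, x_1, \dots, x_T$ of a system $x_{t+1} = f^{*}(x_t) + \text{noise}$, the nonparametric least squares estimator over a hypothesis class $\mathcal{F}$ is

$$ \hat{f} \;\in\; \arg\min_{f \in \mathcal{F}} \;\frac{1}{T}\sum_{t=0}^{T-1} \big\| x_{t+1} - f(x_t) \big\|_{2}^{2}. $$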
- Implicit Bias of MSE Gradient Optimization in Underparameterized Neural Networks [0.0]
We study the dynamics of a neural network in function space when optimizing the mean squared error via gradient flow.
We show that the network learns eigenfunctions of an integral operator $T_{K^\infty}$ determined by the Neural Tangent Kernel (NTK); a schematic definition of this operator is given below.
We conclude that damped deviations offer a simple and unifying perspective on the dynamics when optimizing the squared error.
arXiv Detail & Related papers (2022-01-12T23:28:41Z)
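Schematically (notation may differ from the paper's), $T_{K^{\infty}}$ is the integral operator associated with the limiting NTK $K^{\infty}$ and the input distribution $\rho$,

$$ (T_{K^{\infty}} g)(x) \;=\; \int K^{\infty}(x, s)\, g(s)\, d\rho(s), $$

whose eigenfunctions and eigenvalues determine which directions in function space the gradient-flow dynamics fits fastest.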
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game in which both players are parameterized by neural networks (NNs), and we learn the parameters of these networks by gradient descent (a schematic form of such a game is shown below).
For the first time, we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
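As a schematic example of such a game (a generic conditional-moment formulation, not necessarily the exact objective of that paper), an operator equation of the form $\mathbb{E}[\,y - f(x) \mid z\,] = 0$ can be estimated via

$$ \min_{f}\;\max_{g}\;\; \mathbb{E}\big[(y - f(x))\,g(z)\big] \;-\; \tfrac{1}{2}\,\mathbb{E}\big[g(z)^{2}\big], $$

with both players $f$ and $g$ parameterized by neural networks and trained by gradient descent-ascent.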
- Inference on the Change Point for High Dimensional Dynamic Graphical Models [9.74000189600846]
We develop an estimator for the change point parameter for a dynamically evolving graphical model.
It retains sufficient adaptivity against plug-in estimates of the graphical model parameters.
The estimator is illustrated on RNA-sequencing data, examining changes between young and older individuals.
arXiv Detail & Related papers (2020-05-19T19:15:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.