A Principle of Least Action for the Training of Neural Networks
- URL: http://arxiv.org/abs/2009.08372v4
- Date: Tue, 15 Jun 2021 09:42:46 GMT
- Title: A Principle of Least Action for the Training of Neural Networks
- Authors: Skander Karkar, Ibrahim Ayed, Emmanuel de Bézenac, Patrick Gallinari
- Abstract summary: We show the presence of a low kinetic energy displacement bias in the transport map of the network, and link this bias with generalization performance.
We propose a new learning algorithm, which automatically adapts to the complexity of the given task, and leads to networks with a high generalization ability even in low data regimes.
- Score: 10.342408668490975
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks have been achieving high generalization performance on many
tasks despite being highly over-parameterized. Since classical statistical
learning theory struggles to explain this behavior, much effort has recently
been focused on uncovering the mechanisms behind it, in the hope of developing
a more adequate theoretical framework and gaining better control over the
trained models. In this work, we adopt an alternate perspective, viewing the
neural network as a dynamical system displacing input particles over time. We
conduct a series of experiments and, by analyzing the network's behavior
through its displacements, we show the presence of a low kinetic energy
displacement bias in the transport map of the network, and link this bias with
generalization performance. From this observation, we reformulate the learning
problem as follows: finding neural networks which solve the task while
transporting the data as efficiently as possible. This offers a novel
formulation of the learning problem which allows us to provide regularity
results for the solution network, based on Optimal Transport theory. From a
practical viewpoint, this allows us to propose a new learning algorithm, which
automatically adapts to the complexity of the given task, and leads to networks
with a high generalization ability even in low data regimes.
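As a rough illustration of this formulation, the sketch below treats a small residual network as a discrete dynamical system x_{k+1} = x_k + f_k(x_k) and adds the summed squared displacements (a discrete kinetic energy) to the task loss. The toy architecture, the penalty weight `lam`, and the plain additive regularization are illustrative assumptions, not the paper's exact constrained algorithm.

```python
# Minimal PyTorch sketch (an approximation, not the authors' exact method):
# view a residual network as a particle transport map and penalize the
# kinetic energy of its displacements, sum_k ||f_k(x_k)||^2, next to the
# task loss. Sizes and the weight `lam` are illustrative choices.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        v = self.f(x)            # displacement ("velocity") applied to the particles
        return x + v, v

class LeastActionNet(nn.Module):
    def __init__(self, dim=32, depth=6, n_classes=10):
        super().__init__()
        self.blocks = nn.ModuleList([ResBlock(dim) for _ in range(depth)])
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):
        kinetic = 0.0
        for block in self.blocks:
            x, v = block(x)
            kinetic = kinetic + v.pow(2).sum(dim=1).mean()  # mean squared displacement per block
        return self.head(x), kinetic

def training_step(model, opt, x, y, lam=1e-2):
    """One step of task loss plus transport-cost (kinetic energy) regularization."""
    logits, kinetic = model(x)
    loss = F.cross_entropy(logits, y) + lam * kinetic
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

if __name__ == "__main__":
    torch.manual_seed(0)
    model = LeastActionNet()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))
    for _ in range(5):
        print(training_step(model, opt, x, y))
```

In this sketch a fixed `lam` trades off fitting the labels against transporting the data efficiently; the algorithm proposed in the paper instead adapts this trade-off automatically to the complexity of the task.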
Related papers
- Peer-to-Peer Learning Dynamics of Wide Neural Networks [10.179711440042123]
We provide an explicit, non-asymptotic characterization of the learning dynamics of wide neural networks trained using popular DGD algorithms.
We validate our analytical results by accurately predicting error for classification tasks.
arXiv Detail & Related papers (2024-09-23T17:57:58Z) - Dynamical stability and chaos in artificial neural network trajectories along training [3.379574469735166]
We study the dynamical properties of the training process by analyzing, through this lens, the network trajectories of a shallow neural network.
We find hints of regular and chaotic behavior depending on the learning rate regime.
This work also contributes to the cross-fertilization of ideas between dynamical systems theory, network theory and machine learning.
arXiv Detail & Related papers (2024-04-08T17:33:11Z) - Adaptive recurrent vision performs zero-shot computation scaling to
unseen difficulty levels [6.053394076324473]
We investigate whether adaptive computation can also enable vision models to extrapolate solutions beyond their training distribution's difficulty level.
We combine convolutional recurrent neural networks (ConvRNNs) with a learnable halting mechanism based on Graves (2016), yielding adaptive ConvRNNs (AdRNNs), and evaluate them on two visual reasoning tasks: PathFinder and Mazes.
We show that 1) AdRNNs learn to dynamically halt processing early (or late) to solve easier (or harder) problems, and 2) these RNNs zero-shot generalize to more difficult problem settings not shown during training by dynamically increasing the number of recurrent iterations at test time.
arXiv Detail & Related papers (2023-11-12T21:07:04Z) - Solving Large-scale Spatial Problems with Convolutional Neural Networks [88.31876586547848]
We employ transfer learning to improve training efficiency for large-scale spatial problems.
We propose that a convolutional neural network (CNN) can be trained on small windows of signals, but evaluated on arbitrarily large signals with little to no performance degradation.
arXiv Detail & Related papers (2023-06-14T01:24:42Z) - How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z) - Generalization and Estimation Error Bounds for Model-based Neural
Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules that allow the construction of model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z) - Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and offers adaptability to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z) - An analytic theory of shallow networks dynamics for hinge loss
classification [14.323962459195771]
We study the training dynamics of a simple type of neural network: a single hidden layer trained to perform a classification task.
We specialize our theory to the prototypical case of a linearly separable dataset and a linear hinge loss.
This allows us to address in a simple setting several phenomena appearing in modern networks, such as the slowing down of training dynamics, the crossover between rich and lazy learning, and overfitting.
arXiv Detail & Related papers (2020-06-19T16:25:29Z) - Optimizing Neural Networks via Koopman Operator Theory [6.09170287691728]
Koopman operator theory was recently shown to be intimately connected with neural network theory.
In this work we take the first steps in making use of this connection.
We show that Koopman operator theory methods allow predictions of the weights and biases of feedforward networks over a non-trivial range of training time.
arXiv Detail & Related papers (2020-06-03T16:23:07Z)