A Shooting Formulation of Deep Learning
- URL: http://arxiv.org/abs/2006.10330v2
- Date: Tue, 8 Dec 2020 15:43:57 GMT
- Title: A Shooting Formulation of Deep Learning
- Authors: François-Xavier Vialard (ligm), Roland Kwitt, Susan Wei, Marc
- Abstract summary: We introduce a shooting formulation which shifts the perspective from parameterizing a network layer-by-layer to parameterizing over optimal networks.
For scalability, we propose a novel particle-ensemble parametrization which fully specifies the optimal weight trajectory of the continuous-depth neural network.
- Score: 19.51427735087011
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continuous-depth neural networks can be viewed as deep limits of discrete
neural networks whose dynamics resemble a discretization of an ordinary
differential equation (ODE). Although important steps have been taken to
realize the advantages of such continuous formulations, most current techniques
are not truly continuous-depth as they assume identical layers.
Indeed, existing works throw into relief the myriad difficulties presented by
an infinite-dimensional parameter space in learning a continuous-depth neural
ODE. To this end, we introduce a shooting formulation which shifts the
perspective from parameterizing a network layer-by-layer to parameterizing over
optimal networks described only by a set of initial conditions. For
scalability, we propose a novel particle-ensemble parametrization which fully
specifies the optimal weight trajectory of the continuous-depth neural network.
Our experiments show that our particle-ensemble shooting formulation can
achieve competitive performance, especially on long-range forecasting tasks.
Finally, though the current work is inspired by continuous-depth neural
networks, the particle-ensemble shooting formulation also applies to
discrete-time networks and may lead to a new fertile area of research in deep
learning parametrization.
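
The idea of specifying a whole weight trajectory through initial conditions alone can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper's implementation: the vector field dx/dt = theta(t) tanh(x), a quadratic penalty on theta (which gives theta = sum_i p_i tanh(q_i)^T for particles q_i with momenta p_i), the forward-Euler integrator, and the hypothetical function names `theta_from_particles` and `shoot`.

```python
import numpy as np

def theta_from_particles(q, p):
    """Weights implied by the particle ensemble (assumed form):
    theta = sum_i p_i tanh(q_i)^T, with q and p of shape (K, d)."""
    return p.T @ np.tanh(q)  # shape (d, d)

def shoot(q0, p0, x0, n_steps=20, T=1.0):
    """Integrate particles (q, p) and data x over depth [0, T] with forward Euler.

    Only the initial conditions (q0, p0) are free parameters; the weights
    theta(t) at every depth are recomputed from the current particles.
    """
    q, p, x = q0.copy(), p0.copy(), x0.copy()
    dt = T / n_steps
    thetas = []
    for _ in range(n_steps):
        theta = theta_from_particles(q, p)
        thetas.append(theta)
        sq = np.tanh(q)
        dq = sq @ theta.T                    # dq_i/dt = theta tanh(q_i)
        dp = -(1.0 - sq**2) * (p @ theta)    # dp_i/dt = -diag(tanh'(q_i)) theta^T p_i
        dx = np.tanh(x) @ theta.T            # data follow the same vector field
        q, p, x = q + dt * dq, p + dt * dp, x + dt * dx
    return x, np.stack(thetas)

# Example: 5 particles in 3 dimensions, applied to a batch of 4 inputs.
rng = np.random.default_rng(0)
q0 = rng.normal(size=(5, 3))
p0 = rng.normal(size=(5, 3))
x0 = rng.normal(size=(4, 3))
x_out, theta_path = shoot(q0, p0, x0, n_steps=10)
```

In this sketch `theta_path` holds a different weight matrix at each depth step, so the layers are not identical, yet the only quantities one would train are `q0` and `p0`.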
Related papers
- Implicit regularization of deep residual networks towards neural ODEs [8.075122862553359]
We establish an implicit regularization of deep residual networks towards neural ODEs.
We prove that if the network is initialized as a discretization of a neural ODE, then such a discretization holds throughout training.
arXiv Detail & Related papers (2023-09-03T16:35:59Z)
- Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Simple initialization and parametrization of sinusoidal networks via their kernel bandwidth [92.25666446274188]
Neural networks with sinusoidal activations have been proposed as an alternative to networks with traditional activation functions.
We first propose a simplified version of such sinusoidal neural networks, which allows both for easier practical implementation and simpler theoretical analysis.
We then analyze the behavior of these networks from the neural tangent kernel perspective and demonstrate that their kernel approximates a low-pass filter with an adjustable bandwidth.
arXiv Detail & Related papers (2022-11-26T07:41:48Z)
- FFNB: Forgetting-Free Neural Blocks for Deep Continual Visual Learning [14.924672048447338]
We devise a dynamic network architecture for continual learning based on a novel forgetting-free neural block (FFNB).
Training FFNB features on new tasks is achieved using a novel procedure that constrains the underlying parameters in the null-space of the previous tasks.
arXiv Detail & Related papers (2021-11-22T17:23:34Z)
- Training Integrable Parameterizations of Deep Neural Networks in the Infinite-Width Limit [0.0]
Large-width dynamics has emerged as a fruitful viewpoint and led to practical insights on real-world deep networks.
For two-layer neural networks, it has been understood that the nature of the trained model radically changes depending on the scale of the initial random weights.
We propose various methods to avoid this trivial behavior and analyze in detail the resulting dynamics.
arXiv Detail & Related papers (2021-10-29T07:53:35Z)
- Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error.
Existing frameworks for analyzing neural network optimization, such as mean field theory and neural tangent kernel theory, typically require taking the limit of infinite network width to show global convergence.
arXiv Detail & Related papers (2020-07-11T18:19:50Z)
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
- Rectified Linear Postsynaptic Potential Function for Backpropagation in Deep Spiking Neural Networks [55.0627904986664]
Spiking Neural Networks (SNNs) use temporal spike patterns to represent and transmit information, which is not only biologically realistic but also suitable for ultra-low-power event-driven neuromorphic implementation.
This paper investigates the contribution of spike timing dynamics to information encoding, synaptic plasticity and decision making, providing a new perspective on the design of future DeepSNNs and neuromorphic hardware systems.
arXiv Detail & Related papers (2020-03-26T11:13:07Z)
- Approximation smooth and sparse functions by deep neural networks without saturation [0.6396288020763143]
In this paper, we aim at constructing deep neural networks with three hidden layers to approximate smooth and sparse functions.
We prove that the constructed deep nets can reach the optimal approximation rate in approximating both smooth and sparse functions with controllable magnitude of free parameters.
arXiv Detail & Related papers (2020-01-13T09:28:50Z)