Framing RNN as a kernel method: A neural ODE approach
- URL: http://arxiv.org/abs/2106.01202v1
- Date: Wed, 2 Jun 2021 14:46:40 GMT
- Title: Framing RNN as a kernel method: A neural ODE approach
- Authors: Adeline Fermanian, Pierre Marion, Jean-Philippe Vert, Gérard Biau
- Abstract summary: We show that the solution of an RNN can be viewed as a linear function of a specific feature set of the input sequence, known as the signature.
We obtain theoretical guarantees on generalization and stability for a large class of recurrent networks.
- Score: 11.374487003189468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Building on the interpretation of a recurrent neural network (RNN) as a
continuous-time neural differential equation, we show, under appropriate
conditions, that the solution of an RNN can be viewed as a linear function of a
specific feature set of the input sequence, known as the signature. This
connection allows us to frame an RNN as a kernel method in a suitable
reproducing kernel Hilbert space. As a consequence, we obtain theoretical
guarantees on generalization and stability for a large class of recurrent
networks. Our results are illustrated on simulated datasets.
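As an informal illustration of the two objects the abstract connects, the sketch below (plain NumPy, not the authors' code) computes a depth-2 truncated signature of a piecewise-linear path and runs a residual RNN cell that can be read as an explicit Euler discretization of a neural ODE. All sizes and parameter names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# A d-dimensional input path sampled at T+1 time points.
T, d = 20, 2
X = np.cumsum(rng.normal(size=(T + 1, d)), axis=0)

# Depth-2 truncated signature: level 1 is the total increment, level 2
# collects the iterated integrals of dX^i dX^j, computed exactly for a
# piecewise-linear path (Chen-style incremental update).
dX = np.diff(X, axis=0)                       # (T, d) increments
S1 = dX.sum(axis=0)                           # level-1 terms, shape (d,)
S2 = np.zeros((d, d))                         # level-2 terms
running = np.zeros(d)
for k in range(T):
    S2 += np.outer(running, dX[k]) + 0.5 * np.outer(dX[k], dX[k])
    running += dX[k]
signature = np.concatenate([S1, S2.ravel()])  # feature vector of the path

# A residual RNN cell, h_{k+1} = h_k + (1/T) * tanh(W h_k + U x_{k+1} + b),
# i.e. an explicit Euler step of the neural ODE h'(t) = tanh(W h + U x + b).
h_dim = 8
W = rng.normal(size=(h_dim, h_dim)) / np.sqrt(h_dim)
U = rng.normal(size=(h_dim, d))
b = np.zeros(h_dim)
h = np.zeros(h_dim)
for k in range(T):
    h = h + (1.0 / T) * np.tanh(W @ h + U @ X[k + 1] + b)

print("signature features:", signature.shape)  # (d + d^2,)
print("final hidden state:", h.shape)
```

Under the paper's conditions, the final hidden state of such a network is (approximately) a linear functional of a signature of this kind; the snippet only shows how the two objects are computed, not the proof of that correspondence.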
Related papers
- Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime [52.00917519626559]
This paper presents two models of neural networks and their training, applicable to neural networks of arbitrary width, depth and topology.
We also present a novel exact representor theory for layer-wise neural network training with unregularized gradient descent, in terms of a local-extrinsic neural kernel (LeNK).
This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models.
arXiv Detail & Related papers (2024-05-24T06:30:36Z) - Use of Parallel Explanatory Models to Enhance Transparency of Neural Network Configurations for Cell Degradation Detection [18.214293024118145]
We build a parallel model to illuminate and understand the internal operation of neural networks.
We show how each layer of the RNN transforms the input distributions to increase detection accuracy.
At the same time we also discover a side effect acting to limit the improvement in accuracy.
arXiv Detail & Related papers (2024-04-17T12:22:54Z) - Kernel Limit of Recurrent Neural Networks Trained on Ergodic Data Sequences [0.0]
We characterize the limiting behavior of recurrent neural networks (RNNs) as the number of hidden units, data samples in the sequence, hidden state updates, and training steps simultaneously grow to infinity.
These methods give rise to neural tangent kernel (NTK) limits for RNNs trained on data sequences as the number of data samples and the size of the neural network grow to infinity.
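As a rough finite-width companion to this statement, the sketch below estimates the empirical neural tangent kernel K(x, x') = <grad_theta f(x), grad_theta f(x')> of a tiny tanh RNN with a scalar readout, using finite-difference gradients. It illustrates the object whose infinite-width limit the paper studies, not the limit itself; all shapes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
H = 4  # hidden units

def unpack(theta):
    """Split a flat parameter vector into (W, u, c)."""
    W = theta[:H * H].reshape(H, H)
    u = theta[H * H:H * H + H]
    c = theta[H * H + H:]
    return W, u, c

def f(theta, x):
    """Scalar output of a simple tanh RNN run over a 1-D sequence x."""
    W, u, c = unpack(theta)
    h = np.zeros(H)
    for x_t in x:
        h = np.tanh(W @ h + u * x_t)
    return float(c @ h)

def grad_f(theta, x, eps=1e-5):
    """Finite-difference gradient of f with respect to the parameters."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (f(theta + e, x) - f(theta - e, x)) / (2 * eps)
    return g

theta0 = rng.normal(size=H * H + 2 * H) / np.sqrt(H)
x1 = rng.normal(size=10)
x2 = rng.normal(size=10)

# Empirical NTK entries at initialization.
g1, g2 = grad_f(theta0, x1), grad_f(theta0, x2)
print("K(x1, x1) =", g1 @ g1)
print("K(x1, x2) =", g1 @ g2)
```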
arXiv Detail & Related papers (2023-08-28T13:17:39Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
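The paper's weight-space series is exact; the toy check below only compares a small two-layer network's output at perturbed weights with the first term of such an expansion, f(theta0) + <grad_theta f(theta0), delta>, using finite differences. It is a sanity check of the idea under illustrative names, not the paper's RKBS construction.

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, width = 3, 5

def f(theta, x):
    """Scalar two-layer tanh network with flattened parameters theta."""
    W = theta[:width * d_in].reshape(width, d_in)
    a = theta[width * d_in:]
    return float(a @ np.tanh(W @ x))

def grad_f(theta, x, eps=1e-5):
    """Finite-difference gradient of f with respect to theta."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (f(theta + e, x) - f(theta - e, x)) / (2 * eps)
    return g

theta0 = rng.normal(size=width * d_in + width) / np.sqrt(width)
x = rng.normal(size=d_in)
delta = 1e-2 * rng.normal(size=theta0.size)   # small step in weight space

exact = f(theta0 + delta, x)
first_order = f(theta0, x) + grad_f(theta0, x) @ delta
print("exact:", exact, " first-order expansion:", first_order)
# Higher-order terms (which an exact power series keeps) explain the gap.
```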
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Lyapunov-Guided Representation of Recurrent Neural Network Performance [9.449520199858952]
Recurrent Neural Networks (RNN) are ubiquitous computing systems for sequences and time series data.
We propose to treat RNN as dynamical systems and to correlate hyperparameters with accuracy through Lyapunov spectral analysis.
Our studies of various RNN architectures show that AeLLE successfully correlates RNN Lyapunov spectrum with accuracy.
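For readers who want to reproduce the basic quantity involved, the sketch below estimates the Lyapunov spectrum of a generic tanh RNN with the standard QR (Benettin-style) method. It is not the authors' AeLLE pipeline; the network and inputs are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(3)
H, T = 16, 2000

W = 1.2 * rng.normal(size=(H, H)) / np.sqrt(H)   # recurrent weights
U = rng.normal(size=(H, 1))
x = rng.normal(size=(T, 1))                      # driving input sequence

h = np.zeros(H)
Q = np.eye(H)                  # orthonormal tangent basis
log_r = np.zeros(H)            # accumulated log stretching factors

for t in range(T):
    h = np.tanh(W @ h + U @ x[t])
    J = (1.0 - h**2)[:, None] * W      # Jacobian dh_{t+1}/dh_t
    Q, R = np.linalg.qr(J @ Q)
    log_r += np.log(np.abs(np.diag(R)) + 1e-300)

lyapunov_exponents = np.sort(log_r / T)[::-1]
print("leading Lyapunov exponents:", lyapunov_exponents[:4])
```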
arXiv Detail & Related papers (2022-04-11T05:38:38Z) - Comparative Analysis of Interval Reachability for Robust Implicit and
Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
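For context, the snippet below shows the basic interval bound propagation step that such comparisons build on: pushing an elementwise input box through an affine layer (splitting the weights into positive and negative parts) and a ReLU. It is a generic illustration rather than the paper's reachability analysis for INNs.

```python
import numpy as np

def affine_bounds(W, b, lo, up):
    """Tight elementwise bounds of W @ x + b over the box lo <= x <= up."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    new_lo = W_pos @ lo + W_neg @ up + b
    new_up = W_pos @ up + W_neg @ lo + b
    return new_lo, new_up

def relu_bounds(lo, up):
    return np.maximum(lo, 0.0), np.maximum(up, 0.0)

rng = np.random.default_rng(4)
W1, b1 = rng.normal(size=(8, 3)), np.zeros(8)
W2, b2 = rng.normal(size=(2, 8)), np.zeros(2)

x = rng.normal(size=3)
eps = 0.1
lo, up = x - eps, x + eps                 # input box around x

lo, up = relu_bounds(*affine_bounds(W1, b1, lo, up))
lo, up = affine_bounds(W2, b2, lo, up)
print("certified output box:", lo, up)
```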
arXiv Detail & Related papers (2022-04-01T03:31:27Z) - Coupled Oscillatory Recurrent Neural Network (coRNN): An accurate and
(gradient) stable architecture for learning long time dependencies [15.2292571922932]
We propose a novel architecture for recurrent neural networks.
Our proposed RNN is based on a time-discretization of a system of second-order ordinary differential equations.
Experiments show that the proposed RNN is comparable in performance to the state of the art on a variety of benchmarks.
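A minimal sketch of this kind of update is given below: a hidden "position" and "velocity" pair evolved by a simple discretization of a controlled, damped second-order ODE. The exact scheme and parameter choices of the coRNN paper may differ; all names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
H, D, T = 32, 3, 50          # hidden size, input size, sequence length
dt, gamma, eps = 0.1, 1.0, 0.1

W  = rng.normal(size=(H, H)) / np.sqrt(H)   # couples positions y
Wz = rng.normal(size=(H, H)) / np.sqrt(H)   # couples velocities z
V  = rng.normal(size=(H, D))
b  = np.zeros(H)

u = rng.normal(size=(T, D))  # input sequence
y = np.zeros(H)              # "position" state
z = np.zeros(H)              # "velocity" state

for t in range(T):
    # Oscillator: y'' = tanh(W y + Wz y' + V u + b) - gamma*y - eps*y'
    accel = np.tanh(W @ y + Wz @ z + V @ u[t] + b) - gamma * y - eps * z
    z = z + dt * accel       # Euler step for the velocity
    y = y + dt * z           # semi-implicit Euler step for the position

print("final hidden state y:", y[:5])
```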
arXiv Detail & Related papers (2020-10-02T12:35:04Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
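To make the "probability measures over features" viewpoint concrete, the toy snippet below writes a two-layer network in mean-field form, an expectation of a*tanh(w.x) over sampled neurons, and shows the output stabilizing as the number of neurons grows. This is a generic mean-field illustration, not the paper's framework for deep architectures.

```python
import numpy as np

rng = np.random.default_rng(6)
d = 4
x = rng.normal(size=d)

def mean_field_output(N):
    """Two-layer net in mean-field scaling: average of N i.i.d. neurons."""
    w = rng.normal(size=(N, d))        # inner weights sampled from the measure
    a = rng.normal(size=N)             # outer weights sampled from the measure
    return np.mean(a * np.tanh(w @ x))

for N in [10, 100, 10_000, 1_000_000]:
    print(N, mean_field_output(N))
# As N grows, the finite average approaches the integral of a*tanh(w.x)
# against the sampling measure (here it concentrates near 0 by symmetry).
```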
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - Provably Efficient Neural Estimation of Structural Equation Model: An
Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z) - Optimal Rates for Averaged Stochastic Gradient Descent under Neural
Tangent Kernel Regime [50.510421854168065]
We show that averaged stochastic gradient descent can achieve the minimax optimal convergence rate.
We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
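For intuition about the averaging the rate concerns, the snippet below runs plain SGD on a least-squares problem and keeps a running (Polyak-Ruppert) average of the iterates, which is typically much less noisy than the last iterate. This is a generic illustration of averaged SGD, not the NTK-regime analysis itself.

```python
import numpy as np

rng = np.random.default_rng(7)
d, n = 10, 2000
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.5 * rng.normal(size=n)   # noisy linear targets

w = np.zeros(d)
w_bar = np.zeros(d)                          # running average of iterates
lr = 0.01

for t in range(n):
    i = rng.integers(n)
    grad = (X[i] @ w - y[i]) * X[i]          # single-sample stochastic gradient
    w -= lr * grad
    w_bar += (w - w_bar) / (t + 1)           # Polyak-Ruppert averaging

print("error of last iterate:    ", np.linalg.norm(w - w_true))
print("error of averaged iterate:", np.linalg.norm(w_bar - w_true))
```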
arXiv Detail & Related papers (2020-06-22T14:31:37Z) - Understanding Recurrent Neural Networks Using Nonequilibrium Response
Theory [5.33024001730262]
Recurrent neural networks (RNNs) are brain-inspired models widely used in machine learning for analyzing sequential data.
We show how RNNs process input signals using the response theory from nonequilibrium statistical mechanics.
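A minimal way to probe this numerically is to measure the response of the hidden state to a small input impulse, i.e. compare trajectories with and without a perturbation at one time step; the sketch below does this for a random tanh RNN. It only illustrates the kind of response function involved, not the paper's nonequilibrium-response-theory derivation.

```python
import numpy as np

rng = np.random.default_rng(8)
H, T, t0, delta = 16, 60, 10, 1e-3

W = rng.normal(size=(H, H)) / np.sqrt(H)
u_w = rng.normal(size=H)
x = rng.normal(size=T)                     # baseline scalar input signal

def run(inputs):
    """Return the hidden-state trajectory of a tanh RNN driven by inputs."""
    h = np.zeros(H)
    traj = []
    for x_t in inputs:
        h = np.tanh(W @ h + u_w * x_t)
        traj.append(h.copy())
    return np.array(traj)

x_pert = x.copy()
x_pert[t0] += delta                        # small impulse at time t0

base, pert = run(x), run(x_pert)
response = np.linalg.norm(pert - base, axis=1) / delta   # finite-difference response
print("response to the impulse over time:", np.round(response[t0:t0 + 8], 3))
```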
arXiv Detail & Related papers (2020-06-19T10:09:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.