Edge of stability echo state networks
- URL: http://arxiv.org/abs/2308.02902v2
- Date: Sun, 3 Sep 2023 05:53:02 GMT
- Title: Edge of stability echo state networks
- Authors: Andrea Ceni, Claudio Gallicchio
- Abstract summary: Echo State Networks (ESNs) are time-series processing models working under the Echo State Property (ESP) principle.
We introduce a new ESN architecture, called the Edge of Stability Echo State Network (ES$^2$N).
- Score: 5.888495030452654
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Echo State Networks (ESNs) are time-series processing models working under
the Echo State Property (ESP) principle. The ESP is a notion of stability that
imposes an asymptotic fading of the memory of the input. On the other hand, the
resulting inherent architectural bias of ESNs may lead to an excessive loss of
information, which in turn harms the performance in certain tasks with long
short-term memory requirements. With the goal of bringing together the fading
memory property and the ability to retain as much memory as possible, in this
paper we introduce a new ESN architecture, called the Edge of Stability Echo
State Network (ES$^2$N). The introduced ES$^2$N model is based on defining the
reservoir layer as a convex combination of a nonlinear reservoir (as in the
standard ESN), and a linear reservoir that implements an orthogonal
transformation. We provide a thorough mathematical analysis of the introduced
model, proving that the whole eigenspectrum of the Jacobian of the ES$^2$N map
can be contained in an annular neighbourhood of a complex circle of
controllable radius, and exploit this property to demonstrate that the
ES$^2$N's forward dynamics evolves close to the edge-of-chaos regime by design.
Remarkably, our experimental analysis shows that the newly introduced reservoir
model is able to reach the theoretical maximum short-term memory capacity. At
the same time, in comparison to standard ESN, ES$^2$N is shown to offer an
excellent trade-off between memory and nonlinearity, as well as a significant
improvement of performance in autoregressive nonlinear modeling.
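To make the construction concrete, the following NumPy sketch implements an ES$^2$N-style state update as described in the abstract: a convex combination of a standard nonlinear (tanh) reservoir and a linear reservoir given by a random orthogonal transformation. This is a minimal illustrative sketch, not the authors' reference implementation; the variable names and hyperparameter values (reservoir size, mixing coefficient beta, spectral radius rho, input scaling) are assumptions.

```python
import numpy as np

# ES^2N-style state update (illustrative sketch, assumed hyperparameters):
#   x(t+1) = beta * tanh(W x(t) + W_in u(t+1)) + (1 - beta) * O x(t)
# where O is a random orthogonal matrix and beta in (0, 1] mixes the
# nonlinear and the linear (orthogonal) reservoir branches.

rng = np.random.default_rng(0)
N, beta, rho, omega_in = 100, 0.1, 0.9, 1.0  # reservoir size and scalings (assumed)

# Random recurrent weights rescaled to spectral radius rho (standard ESN practice).
W = rng.uniform(-1.0, 1.0, (N, N))
W *= rho / max(abs(np.linalg.eigvals(W)))

# Random orthogonal matrix for the linear branch, obtained via QR decomposition.
O, _ = np.linalg.qr(rng.standard_normal((N, N)))

W_in = rng.uniform(-omega_in, omega_in, (N, 1))

def es2n_step(x, u):
    """One ES^2N reservoir update on a scalar input u."""
    return beta * np.tanh(W @ x + W_in.ravel() * u) + (1.0 - beta) * (O @ x)

# Drive the reservoir with a random input sequence and collect states;
# a linear readout (e.g. ridge regression) would then be fit on these states.
u_seq = rng.uniform(-1.0, 1.0, 200)
x = np.zeros(N)
states = []
for u in u_seq:
    x = es2n_step(x, u)
    states.append(x.copy())
states = np.asarray(states)
```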
Related papers
- Echo State network for coarsening dynamics of charge density waves [0.0]
An echo state network (ESN) is a type of reservoir computer that uses a recurrent neural network with a sparsely connected hidden layer.
Here we build an ESN to model the coarsening dynamics of charge-density waves (CDW) in a semi-classical Holstein model.
arXiv Detail & Related papers (2024-12-16T17:04:10Z)
- Latent Space Energy-based Neural ODEs [73.01344439786524]
This paper introduces a novel family of deep dynamical models designed to represent continuous-time sequence data.
We train the model using maximum likelihood estimation with Markov chain Monte Carlo.
Experiments on oscillating systems, videos and real-world state sequences (MuJoCo) illustrate that ODEs with the learnable energy-based prior outperform existing counterparts.
arXiv Detail & Related papers (2024-09-05T18:14:22Z)
- Recurrent Stochastic Configuration Networks for Temporal Data Analytics [3.8719670789415925]
This paper develops a recurrent version of stochastic configuration networks (RSCNs) for problem solving with temporal data.
We build an initial RSCN model in the light of a supervisory mechanism, followed by an online update of the output weights.
Numerical results clearly indicate that the proposed RSCN performs favourably across all of the datasets.
arXiv Detail & Related papers (2024-06-21T03:21:22Z)
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametricized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z)
- Convergence of Gradient Descent for Recurrent Neural Networks: A Nonasymptotic Analysis [16.893624100273108]
We analyze recurrent neural networks with diagonal hidden-to-hidden weight matrices trained with gradient descent in the supervised learning setting.
We prove that gradient descent can achieve optimality without massive over-parameterization.
Our results are based on an explicit characterization of the class of dynamical systems that can be approximated and learned by recurrent neural networks.
arXiv Detail & Related papers (2024-02-19T15:56:43Z)
- Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy, even with zero exemplar buffer and a model only 1.02x the size of the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
- StEik: Stabilizing the Optimization of Neural Signed Distance Functions and Finer Shape Representation [12.564019188842861]
We show analytically that as the representation power of the network increases, the optimization approaches a partial differential equation (PDE) in the continuum limit that is unstable.
We show that this instability can manifest in existing network optimization, leading to irregularities in the reconstructed surface and/or convergence to sub-optimal local minima.
We introduce a new regularization term that still counteracts the eikonal instability but without over-regularizing.
arXiv Detail & Related papers (2023-05-28T18:27:15Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- Second-order regression models exhibit progressive sharpening to the edge of stability [30.92413051155244]
We show that for quadratic objectives in two dimensions, a second-order regression model exhibits progressive sharpening towards a value that differs slightly from the edge of stability.
In higher dimensions, the model generically shows similar behavior, even without the specific structure of a neural network.
arXiv Detail & Related papers (2022-10-10T17:21:20Z)
- Euler State Networks: Non-dissipative Reservoir Computing [3.55810827129032]
We propose a novel Reservoir Computing (RC) model, called the Euler State Network (EuSN).
Our mathematical analysis shows that the resulting model is biased towards a unitary effective spectral radius and zero local Lyapunov exponents, intrinsically operating near the edge of stability.
Results on time-series classification benchmarks indicate that EuSN is able to match (or even exceed) the accuracy of trainable Recurrent Neural Networks (a minimal state-update sketch for this model appears after this list).
arXiv Detail & Related papers (2022-03-17T15:18:34Z)
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
- Lipschitz Recurrent Neural Networks [100.72827570987992]
We show that our Lipschitz recurrent unit is more robust with respect to input and parameter perturbations as compared to other continuous-time RNNs.
Our experiments demonstrate that the Lipschitz RNN can outperform existing recurrent units on a range of benchmark tasks.
arXiv Detail & Related papers (2020-06-22T08:44:52Z)
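As a companion to the Euler State Networks entry above, the sketch below shows a plausible EuSN-style state update in the same NumPy style: a forward-Euler discretization of a reservoir ODE with an antisymmetric recurrent matrix, which is what the "non-dissipative" framing suggests. This is an assumption-laden illustration, not the paper's reference code; the step size eps, diffusion term gamma, and input scaling are placeholder values.

```python
import numpy as np

# EuSN-style reservoir update (illustrative sketch, assumed formulation):
#   x(t+1) = x(t) + eps * tanh(W_h x(t) + W_in u(t+1))
# with W_h antisymmetric (minus a small diffusion term), so the untrained
# dynamics are approximately norm-preserving rather than fading.

rng = np.random.default_rng(1)
N, eps, gamma, omega_in = 100, 0.01, 0.001, 1.0  # assumed hyperparameters

W = rng.uniform(-1.0, 1.0, (N, N))
W_h = (W - W.T) - gamma * np.eye(N)      # antisymmetric part minus small diffusion
W_in = rng.uniform(-omega_in, omega_in, (N, 1))

def eusn_step(x, u):
    """One forward-Euler reservoir step on a scalar input u."""
    return x + eps * np.tanh(W_h @ x + W_in.ravel() * u)
```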
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.