State-space Models with Layer-wise Nonlinearity are Universal
Approximators with Exponential Decaying Memory
- URL: http://arxiv.org/abs/2309.13414v3
- Date: Wed, 1 Nov 2023 11:35:26 GMT
- Title: State-space Models with Layer-wise Nonlinearity are Universal
Approximators with Exponential Decaying Memory
- Authors: Shida Wang, Beichen Xue
- Abstract summary: We show that stacking state-space models with layer-wise nonlinear activation is sufficient to approximate any continuous sequence-to-sequence relationship.
Our findings demonstrate that the addition of layer-wise nonlinear activation enhances the model's capacity to learn complex sequence patterns.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-space models have gained popularity in sequence modelling due to their
simple and efficient network structures. However, the absence of nonlinear
activation along the temporal direction limits the model's capacity. In this
paper, we prove that stacking state-space models with layer-wise nonlinear
activation is sufficient to approximate any continuous sequence-to-sequence
relationship. Our findings demonstrate that the addition of layer-wise
nonlinear activation enhances the model's capacity to learn complex sequence
patterns. Meanwhile, we show both theoretically and empirically that state-space
models do not fundamentally resolve the issue of exponentially decaying memory.
The theoretical results are supported by numerical verification.
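To make the construction concrete, here is a minimal sketch (an illustration under stated assumptions, not the authors' code) of the architecture the paper studies: stable linear state-space layers, whose memory decays exponentially, stacked with a pointwise nonlinearity applied between layers but never along the time direction. The diagonal parametrization, tanh activation, and layer sizes are illustrative choices.

```python
import torch
import torch.nn as nn

class DiagonalSSM(nn.Module):
    """Linear recurrence h_t = a * h_{t-1} + b * x_t with output y_t = c * h_t.
    No nonlinearity acts along the time direction inside this layer."""
    def __init__(self, dim):
        super().__init__()
        self.a_raw = nn.Parameter(torch.zeros(dim))      # sigmoid -> a in (0, 1)
        self.b = nn.Parameter(torch.randn(dim) / dim ** 0.5)
        self.c = nn.Parameter(torch.randn(dim) / dim ** 0.5)

    def forward(self, x):                    # x: (batch, time, dim)
        a = torch.sigmoid(self.a_raw)        # |a| < 1: stable, hence the
        h = torch.zeros_like(x[:, 0])        # exponentially decaying memory
        ys = []
        for t in range(x.shape[1]):
            h = a * h + self.b * x[:, t]
            ys.append(self.c * h)
        return torch.stack(ys, dim=1)

class StackedSSM(nn.Module):
    """Linear SSM layers with a layer-wise nonlinearity in between: the
    stacking the paper proves sufficient for universal approximation."""
    def __init__(self, dim, depth):
        super().__init__()
        self.ssms = nn.ModuleList(DiagonalSSM(dim) for _ in range(depth))
        self.mix = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))

    def forward(self, x):
        for ssm, lin in zip(self.ssms, self.mix):
            x = torch.tanh(lin(ssm(x)))      # nonlinearity between layers only
        return x

model = StackedSSM(dim=16, depth=4)
y = model(torch.randn(2, 100, 16))           # (batch, time, channels)
```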
Related papers
- Data-driven Nonlinear Model Reduction using Koopman Theory: Integrated
Control Form and NMPC Case Study [56.283944756315066]
We propose generic model structures combining delay-coordinate encoding of measurements and full-state decoding to integrate reduced Koopman modeling and state estimation.
A case study demonstrates that our approach provides accurate control models and enables real-time capable nonlinear model predictive control of a high-purity cryogenic distillation column.
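As a rough illustration of the delay-coordinate idea (a generic DMD-style sketch, not the authors' integrated control form; the signal and embedding depth below are arbitrary choices), one can stack past measurements into an embedding and fit a linear one-step operator by least squares:

```python
import numpy as np

def delay_embed(y, d):
    """Rows are z_t = [y_t, y_{t-1}, ..., y_{t-d+1}] for a scalar series y."""
    return np.stack([y[i : len(y) - d + 1 + i] for i in range(d)][::-1], axis=1)

t = np.linspace(0, 30, 600)
y = np.sin(t) + 0.5 * np.sin(2.3 * t)                # stand-in measurement signal

Z = delay_embed(y, d=8)
K, *_ = np.linalg.lstsq(Z[:-1], Z[1:], rcond=None)   # z_{t+1} ~ z_t @ K
print(np.linalg.norm(Z[:-1] @ K - Z[1:]) / np.linalg.norm(Z[1:]))  # fit quality
```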
arXiv Detail & Related papers (2024-01-09T11:54:54Z) - Layered Models can "Automatically" Regularize and Discover Low-Dimensional Structures via Feature Learning [6.109362130047454]
We study a two-layer nonparametric regression model where the input undergoes a linear transformation followed by a nonlinear mapping to predict the output.
We show that the two-layer model can "automatically" induce regularization and facilitate feature learning.
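A toy sketch of that model class (the architecture details are assumptions for illustration): the input passes through a learned linear map and then a nonlinear head, so the linear layer can discover the low-dimensional directions the target actually depends on.

```python
import torch
import torch.nn as nn

d, k, n = 20, 2, 2000
X = torch.randn(n, d)
W_true = torch.randn(k, d)
y = torch.sin(X @ W_true.T).sum(dim=1, keepdim=True)  # depends on 2 directions

model = nn.Sequential(
    nn.Linear(d, k, bias=False),                      # linear transformation W
    nn.Linear(k, 64), nn.Tanh(), nn.Linear(64, 1),    # nonlinear mapping g
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    loss = ((model(X) - y) ** 2).mean()
    loss.backward()
    opt.step()
print(loss.item())                                    # fit of f(x) = g(W x)
```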
arXiv Detail & Related papers (2023-10-18T06:15:35Z) - Neural Abstractions [72.42530499990028]
We present a novel method for the safety verification of nonlinear dynamical models that uses neural networks to represent abstractions of their dynamics.
We demonstrate that our approach performs comparably to the mature tool Flow* on existing benchmark nonlinear models.
arXiv Detail & Related papers (2023-01-27T12:38:09Z) - Dynamical chaos in nonlinear Schrödinger models with subquadratic
power nonlinearity [137.6408511310322]
We deal with a class of nonlinear Schrödinger lattices with random potential and subquadratic power nonlinearity.
We show that the spreading process is subdiffusive and has complex microscopic organization.
The limit of quadratic power nonlinearity is also discussed and shown to result in a delocalization border.
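For concreteness, a minimal integrator sketch under assumed model details (a standard disordered lattice with on-site power nonlinearity |ψ|^{2σ}ψ, σ < 1, periodic boundaries, and arbitrary parameters):

```python
import numpy as np

# i dpsi_n/dt = eps_n psi_n + |psi_n|^(2*sigma) psi_n + psi_{n+1} + psi_{n-1}
rng = np.random.default_rng(1)
N, sigma, dt, steps = 201, 0.8, 1e-3, 20000
eps = rng.uniform(-2, 2, N)              # random on-site potential (disorder)
psi = np.zeros(N, complex)
psi[N // 2] = 1.0                        # single-site initial excitation

def rhs(psi):
    hop = np.roll(psi, 1) + np.roll(psi, -1)
    return -1j * (eps * psi + np.abs(psi) ** (2 * sigma) * psi + hop)

for _ in range(steps):                   # classical RK4 time stepping
    k1 = rhs(psi)
    k2 = rhs(psi + 0.5 * dt * k1)
    k3 = rhs(psi + 0.5 * dt * k2)
    k4 = rhs(psi + dt * k3)
    psi += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

n = np.arange(N)
p = np.abs(psi) ** 2
m2 = ((n - n @ p / p.sum()) ** 2) @ p / p.sum()   # second moment of wave packet
print(p.sum(), m2)                       # norm stays ~1; m2 tracks spreading
```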
arXiv Detail & Related papers (2023-01-20T16:45:36Z) - Log-linear Guardedness and its Implications [116.87322784046926]
Methods that assume linearity when erasing human-interpretable concepts from neural representations have been found to be tractable and useful.
This work formally defines the notion of log-linear guardedness as the inability of an adversary to predict the concept directly from the representation.
We show that, in the binary case, under certain assumptions, a downstream log-linear model cannot recover the erased concept.
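To make the setting concrete (one simple erasure scheme for illustration, not necessarily the method the paper analyzes): estimate a direction separating the two concept classes and project representations onto its orthogonal complement, so a linear adversary loses its signal along that direction.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 32
z = rng.integers(0, 2, n)                              # binary concept label
X = rng.normal(size=(n, d)) + np.outer(z, rng.normal(size=d))  # concept leaks in

v = X[z == 1].mean(0) - X[z == 0].mean(0)              # class-mean difference
v /= np.linalg.norm(v)
X_erased = X - np.outer(X @ v, v)                      # remove component along v

# After erasure the class means coincide along v; a nonlinear adversary,
# unlike a log-linear one, may still be able to recover the concept.
print((X @ v)[z == 1].mean() - (X @ v)[z == 0].mean())               # large gap
print((X_erased @ v)[z == 1].mean() - (X_erased @ v)[z == 0].mean()) # ~ 0
```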
arXiv Detail & Related papers (2022-10-18T17:30:02Z) - Learning Reduced Nonlinear State-Space Models: an Output-Error Based
Canonical Approach [8.029702645528412]
We investigate the effectiveness of deep learning in the modeling of dynamic systems with nonlinear behavior.
We show its ability to identify three different nonlinear systems.
Performance is evaluated in terms of open-loop prediction on test data generated in simulation as well as on a real-world dataset of unmanned aerial vehicle flight measurements.
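A bare-bones sketch of output-error identification in this spirit (an assumed parametrization, not the paper's canonical form): the model is rolled out in open loop from the inputs alone and trained purely on simulation error.

```python
import torch
import torch.nn as nn

class NonlinearSS(nn.Module):
    def __init__(self, nu=1, nx=4, ny=1):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(nx + nu, 32), nn.Tanh(),
                               nn.Linear(32, nx))       # state transition
        self.g = nn.Linear(nx, ny)                      # output map

    def forward(self, u):                  # u: (time, nu); open-loop rollout
        x = torch.zeros(self.g.in_features)
        ys = []
        for t in range(u.shape[0]):
            x = self.f(torch.cat([x, u[t]]))
            ys.append(self.g(x))
        return torch.stack(ys)

u = torch.randn(200, 1)
y = torch.tanh(torch.cumsum(0.1 * u, dim=0))   # stand-in "measured" output
model = NonlinearSS()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = ((model(u) - y) ** 2).mean()        # pure simulation (output) error
    loss.backward()
    opt.step()
```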
arXiv Detail & Related papers (2022-04-19T06:33:23Z) - Nonlinear proper orthogonal decomposition for convection-dominated flows [0.0]
We propose an end-to-end Galerkin-free model combining autoencoders with long short-term memory networks for dynamics.
Our approach not only improves the accuracy, but also significantly reduces the computational cost of training and testing.
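A compact sketch of the Galerkin-free architecture as summarized (layer sizes and details are assumptions): an autoencoder compresses each snapshot to a low-dimensional latent state, and an LSTM advances the latent state in time.

```python
import torch
import torch.nn as nn

class AELSTM(nn.Module):
    def __init__(self, n_full=512, n_latent=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_full, 64), nn.ReLU(),
                                 nn.Linear(64, n_latent))
        self.dec = nn.Sequential(nn.Linear(n_latent, 64), nn.ReLU(),
                                 nn.Linear(64, n_full))
        self.lstm = nn.LSTM(n_latent, n_latent, batch_first=True)

    def forward(self, snapshots):            # (batch, time, n_full)
        z = self.enc(snapshots)              # compress each snapshot
        z_next, _ = self.lstm(z)             # latent dynamics, Galerkin-free
        return self.dec(z_next)              # decode back to the full state

model = AELSTM()
pred = model(torch.randn(4, 50, 512))        # predicted next snapshots
```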
arXiv Detail & Related papers (2021-10-15T18:05:34Z) - On the Memory Mechanism of Tensor-Power Recurrent Models [25.83531612758211]
We investigate the memory mechanism of tensor-power (TP) recurrent models.
We show that a large degree p is an essential condition to achieve the long memory effect.
The new model is expected to benefit from the long memory effect in a stable manner.
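A heavily simplified scalar illustration of the flavor of this result (a toy, not the paper's model): near a fixed point, a degree-p polynomial update can forget its initial condition only polynomially fast, and more slowly as p grows, whereas a stable linear update forgets exponentially fast.

```python
def final_state(p, h0=0.5, T=10_000):
    h = h0
    for _ in range(T):
        h = 0.9 * h if p == 1 else h - h ** p   # degree-p transition map
    return h

for p in (1, 2, 3, 5):
    # p = 1 underflows to 0 (exponential forgetting); larger p retains
    # progressively more of the initial condition (longer memory).
    print(p, final_state(p))
```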
arXiv Detail & Related papers (2021-03-02T07:07:47Z) - Hessian Eigenspectra of More Realistic Nonlinear Models [73.31363313577941]
We give a precise characterization of the Hessian eigenspectra for a broad family of nonlinear models.
Our analysis takes a step forward to identify the origin of many striking features observed in more complex machine learning models.
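As a generic numerical illustration (not the paper's specific model family), the Hessian eigenspectrum of a small nonlinear model's loss can be computed directly by finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
y = np.tanh(X @ rng.normal(size=d)) + 0.1 * rng.normal(size=n)

def loss(w):                                 # L(w) = mean((tanh(Xw) - y)^2)
    return np.mean((np.tanh(X @ w) - y) ** 2)

def hessian(f, w, h=1e-4):                   # central finite differences
    m = len(w)
    H = np.zeros((m, m))
    E = h * np.eye(m)
    for i in range(m):
        for j in range(m):
            H[i, j] = (f(w + E[i] + E[j]) - f(w + E[i] - E[j])
                       - f(w - E[i] + E[j]) + f(w - E[i] - E[j])) / (4 * h * h)
    return H

w = rng.normal(size=d)
print(np.linalg.eigvalsh(hessian(loss, w)))  # eigenspectrum, ascending order
```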
arXiv Detail & Related papers (2021-03-02T06:59:52Z) - Non-parametric Models for Non-negative Functions [48.7576911714538]
We provide the first model for non-negative functions that retains the good properties of linear models.
We prove that it admits a representer theorem and provide an efficient dual formulation for convex problems.
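A minimal sketch of the core construction as summarized (the random Fourier feature map is an assumption for illustration): represent the function as a quadratic form f(x) = φ(x)ᵀAφ(x) with A = BᵀB positive semidefinite, which is non-negative everywhere yet still linear in the parameter A.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 50                                       # number of random features
W, b = rng.normal(size=(m, 1)), rng.uniform(0, 2 * np.pi, m)

def phi(x):                                  # random Fourier feature map
    return np.sqrt(2.0 / m) * np.cos(x @ W.T + b)

B = rng.normal(size=(m, m)) / m              # A = B^T B is PSD by construction
x = np.linspace(-3, 3, 7).reshape(-1, 1)
f = ((phi(x) @ B.T) ** 2).sum(axis=1)        # f(x) = ||B phi(x)||^2 >= 0
print(f)                                     # every value is non-negative
```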
arXiv Detail & Related papers (2020-07-08T07:17:28Z) - Hidden Markov Nonlinear ICA: Unsupervised Learning from Nonstationary
Time Series [0.0]
We show how to combine nonlinear Independent Component Analysis with a Hidden Markov Model.
We prove identifiability of the proposed model for a general mixing nonlinearity, such as a neural network.
This yields a new nonlinear ICA framework that is unsupervised, more efficient, and able to model the underlying temporal dynamics.
arXiv Detail & Related papers (2020-06-22T10:01:15Z)