On the Memory Mechanism of Tensor-Power Recurrent Models
- URL: http://arxiv.org/abs/2103.01521v1
- Date: Tue, 2 Mar 2021 07:07:47 GMT
- Title: On the Memory Mechanism of Tensor-Power Recurrent Models
- Authors: Hejia Qiu, Chao Li, Ying Weng, Zhun Sun, Xingyu He, Qibin Zhao
- Abstract summary: We investigate the memory mechanism of TP recurrent models.
We show that a large degree p is an essential condition to achieve the long memory effect.
The new model is expected to benefit from the long memory effect in a stable manner.
- Score: 25.83531612758211
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The tensor-power (TP) recurrent model is a family of non-linear dynamical
systems whose recurrence relation consists of a p-fold (a.k.a.
degree-p) tensor product. Although such models frequently appear in
advanced recurrent neural networks (RNNs), to date there has been limited study
of their memory property, a critical characteristic in sequence tasks. In this
work, we conduct a thorough investigation of the memory mechanism of TP
recurrent models. Theoretically, we prove that a large degree p is an essential
condition to achieve the long memory effect, yet it would lead to unstable
dynamical behaviors. Empirically, we tackle this issue by extending the degree
p from discrete to a differentiable domain, such that it is efficiently
learnable from a variety of datasets. Taken together, the new model is expected
to benefit from the long memory effect in a stable manner. We experimentally
show that the proposed model achieves competitive performance compared to
various advanced RNNs in both the single-cell and seq2seq architectures.
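As a rough illustration of the recurrence described in the abstract, the following is a minimal sketch of one degree-p tensor-power update, not the authors' implementation. It assumes a hypothetical form h_t = tanh(W · s_t^{⊗p}) with s_t = [x_t; h_{t-1}; 1]; the names `tensor_power`, `tp_step`, and `run`, the tanh non-linearity, the appended bias term, and the random weight initialisation are all illustrative assumptions that do not come from the paper. The degree p is kept as a plain integer here; the differentiable degree mentioned in the abstract is not modelled.

```python
import math
import random


def tensor_power(v, p):
    """Return the flattened p-fold tensor (Kronecker) product of v with itself."""
    out = [1.0]
    for _ in range(p):
        out = [a * b for a in out for b in v]
    return out


def tp_step(W, x, h, p):
    """One step of an assumed degree-p recurrence: tanh(W @ s^{(x)p}), s = [x; h; 1]."""
    s = list(x) + list(h) + [1.0]  # input, previous state, and a constant bias entry
    feats = tensor_power(s, p)     # length (len(x) + len(h) + 1) ** p
    return [math.tanh(sum(w * f for w, f in zip(row, feats))) for row in W]


def run(xs, hidden, p, seed=0):
    """Run the assumed recurrence over a sequence xs with randomly drawn weights."""
    rng = random.Random(seed)
    dim = (len(xs[0]) + hidden + 1) ** p
    scale = 1.0 / math.sqrt(dim)
    W = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(hidden)]
    h = [0.0] * hidden
    for x in xs:
        h = tp_step(W, x, h, p)
    return h
```

The feature vector grows as (input size + hidden size + 1) to the power p, so this naive version is only practical for small p and small dimensions.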
Related papers
- Dense ReLU Neural Networks for Temporal-spatial Model [13.8173644075917]
We focus on fully connected deep neural networks utilizing the Rectified Linear Unit (ReLU) activation function for nonparametric estimation.
We derive non-asymptotic bounds that lead to convergence rates, addressing both temporal and spatial dependence in the observed measurements.
We also tackle the curse of dimensionality by modeling the data on a manifold, exploring the intrinsic dimensionality of high-dimensional data.
arXiv Detail & Related papers (2024-11-15T05:30:36Z) - Latent Space Energy-based Neural ODEs [73.01344439786524]
This paper introduces a novel family of deep dynamical models designed to represent continuous-time sequence data.
We train the model using maximum likelihood estimation with Markov chain Monte Carlo.
Experiments on oscillating systems, videos and real-world state sequences (MuJoCo) illustrate that ODEs with the learnable energy-based prior outperform existing counterparts.
arXiv Detail & Related papers (2024-09-05T18:14:22Z) - Neural Persistence Dynamics [8.197801260302642]
We consider the problem of learning the dynamics in the topology of time-evolving point clouds.
Our proposed model - Neural Persistence Dynamics - substantially outperforms the state-of-the-art across a diverse set of parameter regression tasks.
arXiv Detail & Related papers (2024-05-24T17:20:18Z) - Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
arXiv Detail & Related papers (2023-04-08T07:55:36Z) - Residual Tensor Train: a Flexible and Efficient Approach for Learning
Multiple Multilinear Correlations [4.754987078078158]
In this paper, we present a novel Residual Tensor Train (ResTT) which integrates the merits of TT and residual structure.
In particular, we prove that the fully-connected layer in neural networks and the Volterra series can be taken as special cases of ResTT.
We prove that such a rule is much more relaxed than that of TT, which means ResTT can easily address the vanishing and exploding gradient problem.
arXiv Detail & Related papers (2021-08-19T12:47:16Z) - Closed-form Continuous-Depth Models [99.40335716948101]
Continuous-depth neural models rely on advanced numerical differential equation solvers.
We present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster.
arXiv Detail & Related papers (2021-06-25T22:08:51Z) - Stochastic Recurrent Neural Network for Multistep Time Series
Forecasting [0.0]
We leverage advances in deep generative models and the concept of state space models to propose an adaptation of the recurrent neural network for time series forecasting.
Our model preserves the architectural workings of a recurrent neural network for which all relevant information is encapsulated in its hidden states, and this flexibility allows our model to be easily integrated into any deep architecture for sequential modelling.
arXiv Detail & Related papers (2021-04-26T01:43:43Z) - Anomaly Detection of Time Series with Smoothness-Inducing Sequential
Variational Auto-Encoder [59.69303945834122]
We present a Smoothness-Inducing Sequential Variational Auto-Encoder (SISVAE) model for robust estimation and anomaly detection of time series.
Our model parameterizes mean and variance for each time-stamp with flexible neural networks.
We show the effectiveness of our model on both synthetic datasets and public real-world benchmarks.
arXiv Detail & Related papers (2021-02-02T06:15:15Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Empirical optimization is central to modern machine learning, but its role in its success is still unclear.
We show that it commonly arises in parameters of discrete multiplicative noise due to variance.
A detailed analysis is conducted in which we describe key factors, including step size and data, all of which exhibit similar results on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z) - Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.