Empirical Analysis of the Inductive Bias of Recurrent Neural Networks by
Discrete Fourier Transform of Output Sequences
- URL: http://arxiv.org/abs/2305.09178v1
- Date: Tue, 16 May 2023 05:30:13 GMT
- Title: Empirical Analysis of the Inductive Bias of Recurrent Neural Networks by
Discrete Fourier Transform of Output Sequences
- Authors: Taiga Ishii, Ryo Ueda, Yusuke Miyao
- Abstract summary: This research aims to uncover the inherent generalization properties, i.e., inductive bias, of Recurrent Neural Networks (RNNs).
Experimental results showed that Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) have an inductive bias towards lower-frequency patterns.
We also found that the inductive bias of LSTM and GRU varies with the number of layers and the size of hidden layers.
- Score: 7.279215553861787
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A unique feature of Recurrent Neural Networks (RNNs) is that they incrementally
process input sequences. In this research, we aim to uncover the inherent
generalization properties, i.e., inductive bias, of RNNs with respect to how
frequently RNNs switch the outputs through time steps in the sequence
classification task, which we call output sequence frequency. Previous work
analyzed inductive bias by training models with small amounts of synthetic data and
comparing the model's generalization with candidate generalization patterns.
However, when examining the output sequence frequency, previous methods cannot
be directly applied since enumerating candidate patterns is computationally
difficult for longer sequences. To this end, we propose to directly calculate
the output sequence frequency for each model by regarding the outputs of the
model as discrete-time signals and applying frequency domain analysis.
Experimental results showed that Long Short-Term Memory (LSTM) and Gated
Recurrent Unit (GRU) have an inductive bias towards lower-frequency patterns,
while Elman RNN tends to learn patterns in which the output changes at high
frequencies. We also found that the inductive bias of LSTM and GRU varies with
the number of layers and the size of hidden layers.
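Concretely, the frequency-domain step described in the abstract can be sketched as follows, assuming a trained model's per-step outputs are already available as a 1-D array; the example signals and the use of numpy's one-sided FFT are illustrative stand-ins, not the authors' exact pipeline.

```python
import numpy as np

def dominant_output_frequency(outputs: np.ndarray) -> float:
    """Treat a model's per-step outputs as a discrete-time signal and
    return the frequency with the largest DFT magnitude (DC excluded).

    outputs: 1-D array of per-step class scores or binary labels.
    """
    signal = outputs - outputs.mean()            # remove the DC component
    spectrum = np.abs(np.fft.rfft(signal))       # one-sided magnitude spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0)  # cycles per time step
    return freqs[spectrum.argmax()]

# Illustrative: a low-frequency output pattern (long runs of one label)
# vs. a high-frequency one (label flips every step).
low = np.array([0, 0, 0, 0, 1, 1, 1, 1] * 8, dtype=float)
high = np.array([0, 1] * 32, dtype=float)
print(dominant_output_frequency(low))   # 0.125 cycles/step
print(dominant_output_frequency(high))  # 0.5 cycles/step (Nyquist rate)
```

Under this measure, a bias toward lower-frequency patterns (as reported for LSTM and GRU) corresponds to learned output sequences whose dominant DFT component sits near zero.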
Related papers
- Generalization of Graph Neural Networks is Robust to Model Mismatch [84.01980526069075]
Graph neural networks (GNNs) have demonstrated their effectiveness in various tasks, supported by their generalization capabilities.
In this paper, we examine GNNs that operate on geometric graphs generated from manifold models.
Our analysis reveals the robustness of the GNN generalization in the presence of such model mismatch.
arXiv Detail & Related papers (2024-08-25T16:00:44Z)
- Inference of Sequential Patterns for Neural Message Passing in Temporal Graphs [0.6562256987706128]
HYPA-DBGNN is a novel two-step approach that combines the inference of anomalous sequential patterns in time series data on graphs with neural message passing.
Our method leverages hypergeometric graph ensembles to identify anomalous edges within both first- and higher-order De Bruijn graphs.
Our work is the first to introduce statistically informed GNNs that leverage temporal and causal sequence anomalies.
arXiv Detail & Related papers (2024-06-24T11:41:12Z)
- Hierarchically Gated Recurrent Neural Network for Sequence Modeling [36.14544998133578]
We propose a gated linear RNN model dubbed Hierarchically Gated Recurrent Neural Network (HGRN).
Experiments on language modeling, image classification, and long-range arena benchmarks showcase the efficiency and effectiveness of our proposed model.
arXiv Detail & Related papers (2023-11-08T16:50:05Z)
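As a rough sketch of the gated linear recurrence behind HGRN: the code below implements h_t = f_t * h_{t-1} + (1 - f_t) * g(x_t) with a lower bound on the forget gate. The sigmoid gating, tanh candidate, and a fixed per-layer gate floor (standing in for the paper's learned, depth-dependent bounds) are assumptions for illustration, not HGRN's exact formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_linear_recurrence(x, W_f, W_g, gate_floor=0.0):
    """One layer of an element-wise gated linear RNN.

    h_t = f_t * h_{t-1} + (1 - f_t) * g_t, where the forget gate f_t is
    squashed into [gate_floor, 1]; a larger floor biases the layer toward
    retaining long-range information.
    """
    T, _ = x.shape
    h = np.zeros(W_f.shape[1])
    states = []
    for t in range(T):
        f = gate_floor + (1.0 - gate_floor) * sigmoid(x[t] @ W_f)
        g = np.tanh(x[t] @ W_g)        # candidate input
        h = f * h + (1.0 - f) * g      # linear in h: no hidden nonlinearity
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))           # (time, input_dim)
out = gated_linear_recurrence(x, rng.normal(size=(8, 8)),
                              rng.normal(size=(8, 8)), gate_floor=0.5)
print(out.shape)                       # (16, 8)
```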
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Continuous Depth Recurrent Neural Differential Equations [0.0]
We propose continuous depth recurrent neural differential equations (CDR-NDE) to generalize RNN models.
CDR-NDE considers two separate differential equations, one over the temporal dimension and one over the depth dimension, and models the evolution in both directions.
We also propose the CDR-NDE-heat model based on partial differential equations which treats the computation of hidden states as solving a heat equation over time.
arXiv Detail & Related papers (2022-12-28T06:34:32Z)
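As a toy illustration of "treating the computation of hidden states as solving a heat equation", the sketch below diffuses a vector of hidden activations with an explicit finite-difference step; the periodic boundary, step sizes, and initial spike are assumptions for illustration, not the CDR-NDE-heat formulation itself.

```python
import numpy as np

def diffuse_hidden_states(h0, alpha=0.2, steps=50):
    """Evolve hidden states h(x, t) by the 1-D heat equation
    dh/dt = alpha * d^2 h / dx^2 via an explicit finite-difference step.
    Stability needs alpha * dt / dx^2 <= 0.5; with dt = dx = 1 we keep
    alpha <= 0.5.
    """
    h = h0.astype(float).copy()
    for _ in range(steps):
        lap = np.roll(h, -1) + np.roll(h, 1) - 2.0 * h  # discrete Laplacian
        h = h + alpha * lap                             # periodic boundary
    return h

h0 = np.array([0., 0., 0., 4., 0., 0., 0., 0.])  # a spike of activation
print(diffuse_hidden_states(h0))                 # smoothed toward the mean
```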
- Modeling Irregular Time Series with Continuous Recurrent Units [3.7335080869292483]
We propose continuous recurrent units (CRUs) to handle irregular time intervals between observations.
We show that CRU can better interpolate irregular time series than neural ordinary differential equation (neural ODE)-based models.
We also show that our model can infer dynamics from images and that the Kalman gain efficiently singles out candidates for valuable state updates from noisy observations.
arXiv Detail & Related papers (2021-11-22T16:49:15Z)
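CRU's core is a continuous-discrete Kalman filter, so as a bare-bones flavor of handling irregular gaps, the sketch below propagates a scalar linear-SDE latent state analytically over each gap Δt and corrects it with a Kalman gain; the scalar dynamics, noise levels, and observation model are made-up illustrations rather than CRU's learned, encoder-parameterized version.

```python
import numpy as np

def cd_kalman_filter(times, obs, a=-0.5, q=0.1, r=0.05):
    """Continuous-discrete Kalman filter for a scalar OU-type latent state
    dx = a*x dt + noise, observed with variance r at irregular times."""
    m, P = 0.0, 1.0                    # prior mean and variance
    means = []
    t_prev = times[0]
    for t, y in zip(times, obs):
        dt = t - t_prev
        if dt > 0:                     # predict across the irregular gap
            F = np.exp(a * dt)
            m = F * m
            P = F * F * P + q * (np.exp(2 * a * dt) - 1) / (2 * a)
        K = P / (P + r)                # Kalman gain: trust data vs. model
        m = m + K * (y - m)            # correct with the new observation
        P = (1 - K) * P
        means.append(m)
        t_prev = t
    return np.array(means)

times = np.array([0.0, 0.3, 1.7, 1.9, 4.2])   # irregular time stamps
obs = np.sin(times) + 0.1 * np.random.default_rng(1).normal(size=5)
print(cd_kalman_filter(times, obs))
```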
- Optimization Variance: Exploring Generalization Properties of DNNs [83.78477167211315]
The test error of a deep neural network (DNN) often demonstrates double descent.
We propose a novel metric, optimization variance (OV), to measure the diversity of model updates.
arXiv Detail & Related papers (2021-06-03T09:34:17Z)
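One rough way to picture an update-diversity metric like OV: take one candidate SGD step per minibatch from the same starting weights and measure how much the resulting predictions on a fixed probe input disagree. The linear model, probe point, and normalization below are my illustrative assumptions, not the paper's exact definition.

```python
import numpy as np

def optimization_variance_sketch(w, batches, lr=0.1):
    """Variance of probe predictions across single-minibatch SGD updates.

    For each minibatch (X, y) we take one gradient step on a linear model
    and record the prediction on a fixed probe input, then return the
    variance of those predictions across the candidate updates."""
    x_probe = np.ones_like(w)                    # fixed probe point
    preds = []
    for X, y in batches:
        grad = 2 * X.T @ (X @ w - y) / len(y)    # squared-error gradient
        w_new = w - lr * grad                    # one candidate update
        preds.append(x_probe @ w_new)
    return np.array(preds).var()

rng = np.random.default_rng(0)
w = rng.normal(size=3)
batches = [(rng.normal(size=(8, 3)), rng.normal(size=8)) for _ in range(10)]
print(optimization_variance_sketch(w, batches))
```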
- Consistency of mechanistic causal discovery in continuous-time using Neural ODEs [85.7910042199734]
We consider causal discovery in continuous-time for the study of dynamical systems.
We propose a causal discovery algorithm based on penalized Neural ODEs.
arXiv Detail & Related papers (2021-05-06T08:48:02Z)
- On the exact computation of linear frequency principle dynamics and its generalization [6.380166265263755]
Recent works show an intriguing Frequency Principle (F-Principle) phenomenon: neural networks fit the target function from low to high frequency during training.
In this paper, we derive the exact differential equation, namely the Linear Frequency-Principle (LFP) model, governing the evolution of the NN output function in the frequency domain.
arXiv Detail & Related papers (2020-10-15T15:17:21Z)
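To observe the F-Principle empirically, the toy script below (my illustration, not the paper's LFP derivation) fits a one-hidden-layer tanh network to a two-frequency target and prints the DFT magnitude of the residual at the low- and high-frequency bins; the architecture, learning rate, and target are arbitrary choices, and the typical trend is that the low-frequency error shrinks first.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 128, endpoint=False)
y = np.sin(x) + np.sin(5 * x)          # low- plus high-frequency target

# One-hidden-layer tanh network, trained by full-batch gradient descent.
W = rng.normal(size=(64, 1))
b = np.zeros((64, 1))
a = rng.normal(size=(1, 64)) * 0.1

lr = 1e-3
for step in range(5001):
    h = np.tanh(W @ x[None, :] + b)            # (hidden, samples)
    pred = (a @ h).ravel()
    err = pred - y
    # Gradients of the mean squared error.
    ga = (err[None, :] @ h.T) * (2 / len(x))
    gh = (a.T @ err[None, :]) * (1 - h ** 2) * (2 / len(x))
    gW = gh @ x[:, None]
    gb = gh.sum(axis=1, keepdims=True)
    a -= lr * ga; W -= lr * gW; b -= lr * gb
    if step % 1000 == 0:
        spec = np.abs(np.fft.rfft(err))        # residual spectrum
        print(step, "low-freq err:", round(float(spec[1]), 3),
              "high-freq err:", round(float(spec[5]), 3))
```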
- Graph Gamma Process Generalized Linear Dynamical Systems [60.467040479276704]
We introduce graph gamma process (GGP) linear dynamical systems to model real multivariate time series.
For temporal pattern discovery, the latent representation under the model is used to decompose the time series into a parsimonious set of multivariate sub-sequences.
We use the generated random graph, whose number of nonzero-degree nodes is finite, to define both the sparsity pattern and dimension of the latent state transition matrix.
arXiv Detail & Related papers (2020-07-25T04:16:34Z)
- Liquid Time-constant Networks [117.57116214802504]
We introduce a new class of time-continuous recurrent neural network models.
Instead of declaring a learning system's dynamics by implicit nonlinearities, we construct networks of linear first-order dynamical systems.
These neural networks exhibit stable and bounded behavior and yield superior expressivity within the family of neural ordinary differential equations.
arXiv Detail & Related papers (2020-06-08T09:53:35Z)
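To unpack "networks of linear first-order dynamical systems": a liquid time-constant cell lets a gate modulate both the effective time constant and the attractor state, as in the minimal explicit-Euler sketch below. The original work uses a fused numerical solver and trained parameters, so the Euler step and random weights here are simplifying assumptions.

```python
import numpy as np

def ltc_step(x, inp, tau, A, W_in, W_rec, bias, dt=0.1):
    """One explicit-Euler step of a liquid time-constant cell:
        dx/dt = -(1/tau + f) * x + f * A,
        f = sigmoid(W_in @ inp + W_rec @ x + bias),
    so the effective time constant varies with the input."""
    f = 1.0 / (1.0 + np.exp(-(W_in @ inp + W_rec @ x + bias)))
    dxdt = -(1.0 / tau + f) * x + f * A
    return x + dt * dxdt

rng = np.random.default_rng(0)
n, m = 4, 2                                    # state and input sizes
x = np.zeros(n)
tau, A = np.ones(n), rng.normal(size=n)
W_in = rng.normal(size=(n, m))
W_rec = 0.1 * rng.normal(size=(n, n))
bias = np.zeros(n)
for t in range(20):
    x = ltc_step(x, np.array([np.sin(0.3 * t), 1.0]),
                 tau, A, W_in, W_rec, bias)
print(x)
```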
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.