Deconstructing Recurrence, Attention, and Gating: Investigating the transferability of Transformers and Gated Recurrent Neural Networks in forecasting of dynamical systems
- URL: http://arxiv.org/abs/2410.02654v1
- Date: Thu, 3 Oct 2024 16:41:51 GMT
- Title: Deconstructing Recurrence, Attention, and Gating: Investigating the transferability of Transformers and Gated Recurrent Neural Networks in forecasting of dynamical systems
- Authors: Hunter Heidenreich, Pantelis R. Vlachas, Petros Koumoutsakos
- Abstract summary: We decompose the key architectural components of the most powerful neural architectures, namely gating and recurrence in RNNs, and attention mechanisms in transformers.
A key finding is that neural gating and attention improve the accuracy of all standard RNNs in most tasks, while the addition of a notion of recurrence in transformers is detrimental.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning architectures, including transformers and recurrent neural networks (RNNs), have revolutionized forecasting in applications ranging from text processing to extreme weather. Notably, advanced network architectures tuned for applications such as natural language processing transfer to other tasks, such as spatiotemporal forecasting. However, there is a scarcity of ablation studies that illustrate the key components enabling this forecasting accuracy. The absence of such studies, although explainable by the associated computational cost, intensifies the belief that these models ought to be treated as black boxes. In this work, we decompose the key architectural components of the most powerful neural architectures, namely gating and recurrence in RNNs, and attention mechanisms in transformers. We then synthesize novel hybrid architectures from these standard blocks and perform ablation studies to identify which mechanisms are effective for each task. The importance of considering these components as hyper-parameters that can augment the standard architectures is demonstrated on various forecasting datasets, ranging from the spatiotemporal chaotic dynamics of the multiscale Lorenz 96 system and the Kuramoto-Sivashinsky equation to standard real-world time-series benchmarks. A key finding is that neural gating and attention improve the performance of all standard RNNs in most tasks, while the addition of a notion of recurrence in transformers is detrimental. Furthermore, our study reveals that a novel, sparsely used architecture integrating Recurrent Highway Networks with neural gating and attention mechanisms emerges as the best-performing architecture for high-dimensional spatiotemporal forecasting of dynamical systems.
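To make the idea of treating these mechanisms as hyper-parameters concrete, here is a minimal PyTorch sketch of an RNN forecaster in which gating and an attention readout can each be switched on or off and ablated independently. It is an illustration under assumed design choices (a GRU-style update gate, additive attention pooling, invented class and parameter names), not the authors' implementation.

```python
# Hypothetical sketch: gating and attention as toggleable hyper-parameters.
import torch
import torch.nn as nn

class ConfigurableRNNCell(nn.Module):
    def __init__(self, input_size, hidden_size, use_gating=True):
        super().__init__()
        self.use_gating = use_gating
        self.candidate = nn.Linear(input_size + hidden_size, hidden_size)
        if use_gating:
            self.gate = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x, h):
        xh = torch.cat([x, h], dim=-1)
        h_tilde = torch.tanh(self.candidate(xh))      # candidate state
        if self.use_gating:
            z = torch.sigmoid(self.gate(xh))          # GRU-style update gate
            return (1.0 - z) * h + z * h_tilde
        return h_tilde                                # plain Elman-style update

class AttentiveForecaster(nn.Module):
    def __init__(self, input_size, hidden_size, use_gating=True, use_attention=True):
        super().__init__()
        self.hidden_size = hidden_size
        self.use_attention = use_attention
        self.cell = ConfigurableRNNCell(input_size, hidden_size, use_gating)
        if use_attention:
            self.score = nn.Linear(hidden_size, 1)    # additive attention scores
        self.head = nn.Linear(hidden_size, input_size)

    def forward(self, x):                             # x: (batch, time, features)
        h = x.new_zeros(x.size(0), self.hidden_size)
        states = []
        for t in range(x.size(1)):
            h = self.cell(x[:, t], h)
            states.append(h)
        if self.use_attention:
            H = torch.stack(states, dim=1)            # (batch, time, hidden)
            w = torch.softmax(self.score(H), dim=1)   # weights over time steps
            h = (w * H).sum(dim=1)                    # attention-pooled context
        return self.head(h)                           # one-step-ahead forecast
```

In this setting, an ablation study amounts to toggling `use_gating` and `use_attention` and retraining on the same data.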
Related papers
- A Realistic Simulation Framework for Analog/Digital Neuromorphic Architectures [73.65190161312555]
ARCANA is a spiking neural network simulator designed to account for the properties of mixed-signal neuromorphic circuits.
We show that the obtained results provide a reliable estimate of the behavior of spiking neural networks trained in software.
arXiv Detail & Related papers (2024-09-23T11:16:46Z)
- Systematic construction of continuous-time neural networks for linear dynamical systems [0.0]
We discuss a systematic approach to constructing neural architectures for modeling a subclass of dynamical systems.
We use a variant of continuous-time neural networks in which the output of each neuron evolves continuously as the solution of a first-order or second-order ordinary differential equation (ODE).
Instead of deriving the network architecture and parameters from data, we propose a gradient-free algorithm to compute a sparse architecture and network parameters directly from the given linear time-invariant (LTI) system.
arXiv Detail & Related papers (2024-03-24T16:16:41Z)
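As a loose illustration of the mechanism described in the entry above, the sketch below simulates a small network of such neurons: each state variable evolves as a first-order ODE, and for an LTI system the weights are read off from the system matrices with no gradient-based training. The forward-Euler integration and all names are assumptions for illustration, not the paper's construction algorithm.

```python
# Hypothetical sketch: continuous-time "neurons" for an LTI system
#   x'(t) = A x(t) + B u(t),
# with weights taken directly from (A, B) rather than learned.
import torch

def simulate_ct_neurons(A, B, u, dt=1e-3):
    """Integrate x' = A x + B u(t) with forward Euler.

    A: (n, n) state matrix, B: (n, m) input matrix,
    u: (steps, m) piecewise-constant input signal.
    """
    x = torch.zeros(A.shape[0])
    trajectory = []
    for u_t in u:
        x = x + dt * (A @ x + B @ u_t)   # each neuron: a first-order ODE
        trajectory.append(x.clone())
    return torch.stack(trajectory)

# Example: a damped oscillator as a two-neuron continuous-time network.
A = torch.tensor([[0.0, 1.0], [-4.0, -0.4]])
B = torch.tensor([[0.0], [1.0]])
u = torch.ones(1000, 1)                   # step input
traj = simulate_ct_neurons(A, B, u)       # (1000, 2) state trajectory
```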
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Physics-Inspired Temporal Learning of Quadrotor Dynamics for Accurate Model Predictive Trajectory Tracking [76.27433308688592]
Accurately modeling a quadrotor's system dynamics is critical for guaranteeing agile, safe, and stable navigation.
We present a novel Physics-Inspired Temporal Convolutional Network (PI-TCN) approach to learning a quadrotor's system dynamics purely from robot experience.
Our approach combines the expressive power of sparse temporal convolutions and dense feed-forward connections to make accurate system predictions.
arXiv Detail & Related papers (2022-06-07T13:51:35Z)
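The entry above names two ingredients: sparse (dilated, causal) temporal convolutions and dense feed-forward connections. Below is a generic, hedged sketch of that recipe in PyTorch; it illustrates the general pattern, not the authors' exact PI-TCN, and all names and sizes are invented.

```python
# Generic sketch: dilated causal temporal convolutions + dense readout.
import torch
import torch.nn as nn

class CausalConvBlock(nn.Module):
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation    # left-pad: no future leakage
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                           # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))     # causal (left-only) padding
        return torch.relu(self.conv(x))

class TemporalConvForecaster(nn.Module):
    def __init__(self, channels, out_dim, depth=4):
        super().__init__()
        # Doubling dilations give an exponentially growing receptive field.
        self.blocks = nn.Sequential(
            *[CausalConvBlock(channels, dilation=2**i) for i in range(depth)])
        self.head = nn.Sequential(                   # dense feed-forward readout
            nn.Linear(channels, 64), nn.ReLU(), nn.Linear(64, out_dim))

    def forward(self, x):                            # x: (batch, channels, time)
        h = self.blocks(x)[:, :, -1]                 # features at the last step
        return self.head(h)
```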
- Leveraging the structure of dynamical systems for data-driven modeling [111.45324708884813]
We consider the impact of the training set and its structure on the quality of the long-term prediction.
We show how an informed design of the training set, based on invariants of the system and the structure of the underlying attractor, significantly improves the resulting models.
arXiv Detail & Related papers (2021-12-15T20:09:20Z)
- An error-propagation spiking neural network compatible with neuromorphic processors [2.432141667343098]
We present a spike-based learning method that approximates back-propagation using local weight update mechanisms.
We introduce a network architecture that enables synaptic weight update mechanisms to back-propagate error signals.
This work represents a first step towards the design of ultra-low power mixed-signal neuromorphic processing systems.
arXiv Detail & Related papers (2021-04-12T07:21:08Z)
- PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
- Action-Conditional Recurrent Kalman Networks For Forward and Inverse Dynamics Learning [17.80270555749689]
Estimating accurate forward and inverse dynamics models is a crucial component of model-based control for robots.
We present two architectures for forward model learning and one for inverse model learning.
Both architectures significantly outperform existing model learning frameworks as well as analytical models in terms of prediction performance.
arXiv Detail & Related papers (2020-10-20T11:28:25Z)
- Coupled Oscillatory Recurrent Neural Network (coRNN): An accurate and (gradient) stable architecture for learning long time dependencies [15.2292571922932]
We propose a novel architecture for recurrent neural networks.
Our proposed RNN is based on a time-discretization of a system of second-order ordinary differential equations.
Experiments show that the proposed RNN is comparable in performance to the state of the art on a variety of benchmarks.
arXiv Detail & Related papers (2020-10-02T12:35:04Z)
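For intuition about the entry above, here is a simplified sketch of a recurrent cell obtained by time-discretizing a second-order ODE, in the spirit of coRNN. It uses a plain explicit Euler step with invented names and constants; the paper's exact (implicit-explicit) discretization and its stability analysis differ.

```python
# Simplified sketch: an RNN cell from a discretized second-order ODE.
import torch
import torch.nn as nn

class SecondOrderRNNCell(nn.Module):
    def __init__(self, input_size, hidden_size, dt=0.01, gamma=1.0, eps=0.01):
        super().__init__()
        self.dt, self.gamma, self.eps = dt, gamma, eps
        self.W = nn.Linear(2 * hidden_size + input_size, hidden_size)

    def forward(self, u, y, z):
        # y'' = tanh(W [y, y', u]) - gamma*y - eps*y', rewritten as a
        # first-order system in (y, z) with z = y' (the hidden "velocity").
        force = torch.tanh(self.W(torch.cat([y, z, u], dim=-1)))
        z_new = z + self.dt * (force - self.gamma * y - self.eps * z)
        y_new = y + self.dt * z_new
        return y_new, z_new
```

The oscillatory dynamics of the coupled (y, z) states are what the coRNN paper credits for accurate, gradient-stable learning of long time dependencies.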
- A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures.
A graph convolutional neural network is introduced to predict the performance of architectures.
arXiv Detail & Related papers (2020-05-14T09:02:33Z)
- Industrial Forecasting with Exponentially Smoothed Recurrent Neural Networks [0.0]
We present a class of exponentially smoothed recurrent neural networks (RNNs) that are well suited to modeling non-stationary dynamical systems arising in industrial applications.
Applications of exponentially smoothed RNNs to forecasting electricity load, weather data, and stock prices highlight the efficacy of exponential smoothing of the hidden state for multi-step time-series forecasting.
arXiv Detail & Related papers (2020-04-09T17:53:49Z)
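The mechanism in the entry above is simple enough to sketch: exponentially smooth the hidden state of an off-the-shelf RNN before the readout. The placement of the smoothing and all names below are assumptions for illustration, not the paper's exact formulation.

```python
# Hedged sketch: exponential smoothing of an RNN's hidden state.
import torch
import torch.nn as nn

class SmoothedRNNForecaster(nn.Module):
    def __init__(self, input_size, hidden_size, alpha=0.5):
        super().__init__()
        self.cell = nn.RNNCell(input_size, hidden_size)
        self.alpha = alpha                       # smoothing coefficient in (0, 1]
        self.head = nn.Linear(hidden_size, input_size)

    def forward(self, x):                        # x: (batch, time, features)
        h = x.new_zeros(x.size(0), self.cell.hidden_size)
        h_smooth = torch.zeros_like(h)
        for t in range(x.size(1)):
            h = self.cell(x[:, t], h)
            # Exponential smoothing: a low-pass filter over the hidden states,
            # damping the non-stationary fluctuations the summary refers to.
            h_smooth = self.alpha * h + (1.0 - self.alpha) * h_smooth
        return self.head(h_smooth)               # forecast from the smoothed state
```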