On the Resurgence of Recurrent Models for Long Sequences -- Survey and
Research Opportunities in the Transformer Era
- URL: http://arxiv.org/abs/2402.08132v2
- Date: Wed, 14 Feb 2024 13:04:28 GMT
- Title: On the Resurgence of Recurrent Models for Long Sequences -- Survey and
Research Opportunities in the Transformer Era
- Authors: Matteo Tiezzi, Michele Casoni, Alessandro Betti, Tommaso Guidi, Marco
Gori and Stefano Melacci
- Abstract summary: This survey is aimed at providing an overview of these trends framed under the unifying umbrella of Recurrence.
It emphasizes novel research opportunities that become prominent when moving from sequences whose length is known in advance to potentially infinite-length streams.
- Score: 59.279784235147254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A longstanding challenge for the Machine Learning community is that
of developing models capable of processing and learning from very long
sequences of data. The outstanding results of Transformer-based networks
(e.g., Large Language Models) promote the idea of parallel attention as the
key to succeeding in such a challenge, obscuring the role of the classic
sequential processing of Recurrent Models. However, in the last few years,
researchers concerned about the quadratic complexity of self-attention have
proposed a novel wave of neural models that get the best of both worlds,
i.e., Transformers and Recurrent Nets. Meanwhile, Deep State-Space Models
have emerged as robust approaches to function approximation over time, thus
opening a new perspective on learning from sequential data that has been
widely adopted in the field and exploited to implement a special class of
(linear) Recurrent Neural Networks. This survey aims at providing an overview
of these trends framed under the unifying umbrella of Recurrence. Moreover,
it emphasizes novel research opportunities that become prominent when
abandoning the idea of processing long sequences whose length is known in
advance in favor of the more realistic setting of potentially infinite-length
sequences, thus intersecting the field of lifelong online learning from
streamed data.
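To make the link between state-space models and linear recurrence concrete, below is a minimal NumPy sketch of a discretized diagonal linear state-space recurrence run as an RNN; the function name `ssm_scan` and all parameter values are illustrative assumptions, not taken from the survey. Each step costs a constant amount of work, so a length-L sequence is processed in O(L) time, in contrast to the O(L^2) cost of full self-attention.

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Minimal linear state-space recurrence run as an RNN (illustrative sketch).

    x_k = A * x_{k-1} + B @ u_k   (A is diagonal, stored as a vector)
    y_k = C @ x_k

    Cost is O(L) in the sequence length L, versus O(L^2) for full self-attention.
    """
    L, _ = u.shape
    x = np.zeros(A.shape[0])       # hidden state
    ys = []
    for k in range(L):
        x = A * x + B @ u[k]       # elementwise A: diagonal state transition
        ys.append(C @ x)           # linear readout
    return np.stack(ys)

# Toy usage: 1-D input, 16-dimensional state, scalar output (assumed sizes).
rng = np.random.default_rng(0)
A = np.exp(-rng.uniform(0.01, 1.0, size=16))   # stable decay rates in (0, 1)
B = rng.normal(size=(16, 1)) * 0.1
C = rng.normal(size=(1, 16)) * 0.1
u = rng.normal(size=(100, 1))                  # input sequence of length 100
print(ssm_scan(A, B, C, u).shape)              # (100, 1)
```

Because the transition is linear (and diagonal here), the same recurrence can also be evaluated with a parallel scan or as a long convolution, which is the property deep state-space models exploit to train efficiently.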
Related papers
- Deep Generative Models in Robotics: A Survey on Learning from Multimodal Demonstrations [52.11801730860999]
In recent years, the robot learning community has shown increasing interest in using deep generative models to capture the complexity of large datasets.
We present the different types of models that the community has explored, such as energy-based models, diffusion models, action value maps, or generative adversarial networks.
We also present the different types of applications in which deep generative models have been used, from grasp generation to trajectory generation or cost learning.
arXiv Detail & Related papers (2024-08-08T11:34:31Z)
- State-Space Modeling in Long Sequence Processing: A Survey on Recurrence in the Transformer Era [59.279784235147254]
This survey provides an in-depth summary of the latest approaches that are based on recurrent models for sequential data processing.
The emerging picture suggests there is room for novel routes, based on learning algorithms that depart from standard Backpropagation Through Time.
arXiv Detail & Related papers (2024-06-13T12:51:22Z)
- On the Challenges and Opportunities in Generative AI [135.2754367149689]
We argue that current large-scale generative AI models do not sufficiently address several fundamental issues that hinder their widespread adoption across domains.
In this work, we aim to identify key unresolved challenges in modern generative AI paradigms that should be tackled to further enhance their capabilities, versatility, and reliability.
arXiv Detail & Related papers (2024-02-28T15:19:33Z)
- Two Steps Forward and One Behind: Rethinking Time Series Forecasting with Deep Learning [7.967995669387532]
The Transformer is a highly successful deep learning model that has revolutionised the world of artificial neural networks.
We investigate the effectiveness of Transformer-based models applied to the domain of time series forecasting.
We propose a set of alternative models that perform better and are significantly less complex.
arXiv Detail & Related papers (2023-04-10T12:47:42Z)
- Online Evolutionary Neural Architecture Search for Multivariate Non-Stationary Time Series Forecasting [72.89994745876086]
This work presents the Online Neuro-Evolution-based Neural Architecture Search (ONE-NAS) algorithm.
ONE-NAS is a novel neural architecture search method capable of automatically designing and dynamically training recurrent neural networks (RNNs) for online forecasting tasks.
Results demonstrate that ONE-NAS outperforms traditional statistical time series forecasting methods.
arXiv Detail & Related papers (2023-02-20T22:25:47Z)
- Continual Learning of Long Topic Sequences in Neural Information Retrieval [2.3846478553599098]
We first propose a dataset based upon the MSMarco corpus, aimed at modeling a long stream of topics.
We then analyze in depth the ability of recent neural IR models to continually learn from those streams.
arXiv Detail & Related papers (2022-01-10T14:19:09Z)
- Stochastic Recurrent Neural Network for Multistep Time Series Forecasting [0.0]
We leverage advances in deep generative models and the concept of state space models to propose an adaptation of the recurrent neural network for time series forecasting.
Our model preserves the architectural workings of a recurrent neural network for which all relevant information is encapsulated in its hidden states, and this flexibility allows our model to be easily integrated into any deep architecture for sequential modelling.
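As a rough illustration of this design, here is a minimal, hypothetical NumPy sketch of a recurrent forecaster whose hidden state summarizes the observed history and parameterizes a Gaussian over the next value, sampled step by step over the forecast horizon; the class name `StochasticRNN` and all architectural details are illustrative assumptions, not the authors' model.

```python
import numpy as np

class StochasticRNN:
    """Toy recurrent forecaster: hidden state -> Gaussian over the next value.

    A sketch of the general idea only; not the paper's architecture.
    """
    def __init__(self, input_dim=1, hidden_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_dim)
        self.W = rng.normal(0, s, (hidden_dim, hidden_dim))    # recurrent weights
        self.U = rng.normal(0, s, (hidden_dim, input_dim))     # input weights
        self.w_mu = rng.normal(0, s, (input_dim, hidden_dim))  # mean head
        self.w_sig = rng.normal(0, s, (input_dim, hidden_dim)) # log-std head
        self.rng = rng

    def step(self, h, x):
        # All relevant history is encapsulated in the hidden state h.
        return np.tanh(self.W @ h + self.U @ x)

    def forecast(self, history, horizon):
        h = np.zeros(self.W.shape[0])
        for x in history:                   # encode the observed series
            h = self.step(h, x)
        samples, x = [], history[-1]
        for _ in range(horizon):            # multistep rollout by sampling
            h = self.step(h, x)
            mu = self.w_mu @ h
            sigma = np.exp(self.w_sig @ h)  # positive std via exp
            x = mu + sigma * self.rng.normal(size=mu.shape)
            samples.append(x)
        return np.stack(samples)

series = np.sin(np.linspace(0, 6, 50)).reshape(-1, 1)
print(StochasticRNN().forecast(series, horizon=10).shape)  # (10, 1)
```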
arXiv Detail & Related papers (2021-04-26T01:43:43Z)
- Factorized Deep Generative Models for Trajectory Generation with Spatiotemporal-Validity Constraints [10.960924101404498]
Deep generative models for trajectory data can learn expressive models that explain sophisticated latent patterns.
We first propose novel deep generative models factorizing time-variant and time-invariant latent variables.
We then develop new inference strategies based on variational inference and constrained optimization to enforce spatiotemporal validity.
arXiv Detail & Related papers (2020-09-20T02:06:36Z)
- Forecasting Sequential Data using Consistent Koopman Autoencoders [52.209416711500005]
A new class of physics-based methods related to Koopman theory has been introduced, offering an alternative for processing nonlinear dynamical systems.
We propose a novel Consistent Koopman Autoencoder model which, unlike the majority of existing work, leverages the forward and backward dynamics.
Key to our approach is a new analysis which explores the interplay between consistent dynamics and their associated Koopman operators.
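The consistency idea can be sketched in a few lines, assuming a latent space advanced linearly by a forward operator F and a backward operator B that are penalized for failing to invert each other; the function below and its unweighted sum of loss terms are an illustrative assumption, not the authors' exact formulation.

```python
import numpy as np

def consistent_koopman_losses(encode, decode, F, B, x_t, x_next):
    """Loss terms for a toy consistent Koopman autoencoder (sketch only).

    encode/decode: callables mapping state <-> latent code.
    F, B: (k, k) linear operators for forward / backward latent dynamics.
    """
    z_t, z_next = encode(x_t), encode(x_next)
    recon = np.mean((decode(z_t) - x_t) ** 2)       # autoencoding
    fwd = np.mean((F @ z_t - z_next) ** 2)          # forward prediction
    bwd = np.mean((B @ z_next - z_t) ** 2)          # backward prediction
    k = F.shape[0]
    consist = np.mean((F @ B - np.eye(k)) ** 2)     # F and B should invert each other
    return recon + fwd + bwd + consist

# Toy usage with a linear encoder/decoder and random weights (assumed sizes).
rng = np.random.default_rng(0)
E, D = rng.normal(size=(4, 8)) * 0.1, rng.normal(size=(8, 4)) * 0.1
F, B = np.eye(4), np.eye(4)
x_t, x_next = rng.normal(size=8), rng.normal(size=8)
loss = consistent_koopman_losses(lambda x: E @ x, lambda z: D @ z, F, B, x_t, x_next)
print(float(loss))
```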
arXiv Detail & Related papers (2020-03-04T18:24:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.