On the Resurgence of Recurrent Models for Long Sequences -- Survey and
Research Opportunities in the Transformer Era
- URL: http://arxiv.org/abs/2402.08132v2
- Date: Wed, 14 Feb 2024 13:04:28 GMT
- Title: On the Resurgence of Recurrent Models for Long Sequences -- Survey and
Research Opportunities in the Transformer Era
- Authors: Matteo Tiezzi, Michele Casoni, Alessandro Betti, Tommaso Guidi, Marco
Gori and Stefano Melacci
- Abstract summary: This survey is aimed at providing an overview of these trends framed under the unifying umbrella of Recurrence.
It emphasizes novel research opportunities that become prominent when moving from sequences whose length is known in advance to potentially infinite-length streams.
- Score: 59.279784235147254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A longstanding challenge for the Machine Learning community is that
of developing models capable of processing and learning from very long
sequences of data. The outstanding results of Transformer-based networks
(e.g., Large Language Models) promote the idea of parallel attention as the
key to succeeding in such a challenge, obscuring the role of the classic
sequential processing of Recurrent Models. However, in the last few years,
researchers concerned about the quadratic complexity of self-attention have
proposed a novel wave of neural models that get the best of both worlds,
i.e., Transformers and Recurrent Nets. Meanwhile, Deep State-Space Models
have emerged as robust approaches to function approximation over time, thus
opening a new perspective on learning from sequential data that has been
widely adopted in the field and exploited to implement a special class of
(linear) Recurrent Neural Networks. This survey aims at providing an overview
of these trends framed under the unifying umbrella of Recurrence. Moreover,
it emphasizes novel research opportunities that become prominent when
abandoning the idea of processing long sequences whose length is known in
advance in favor of the more realistic setting of potentially infinite-length
sequences, thus intersecting the field of lifelong online learning from
streamed data.
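To make the link between state-space models and linear recurrence concrete, below is a minimal NumPy sketch of a discretized diagonal linear state-space recurrence run as an RNN; the function name `ssm_scan` and all parameter values are illustrative assumptions, not taken from the survey. Each step costs a constant amount of work, so a length-L sequence is processed in O(L) time, in contrast to the O(L^2) cost of full self-attention.

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Minimal linear state-space recurrence run as an RNN (illustrative sketch).

    x_k = A * x_{k-1} + B @ u_k   (A is diagonal, stored as a vector)
    y_k = C @ x_k

    Cost is O(L) in the sequence length L, versus O(L^2) for full self-attention.
    """
    L, _ = u.shape
    x = np.zeros(A.shape[0])       # hidden state
    ys = []
    for k in range(L):
        x = A * x + B @ u[k]       # elementwise A: diagonal state transition
        ys.append(C @ x)           # linear readout
    return np.stack(ys)

# Toy usage: 1-D input, 16-dimensional state, scalar output (assumed sizes).
rng = np.random.default_rng(0)
A = np.exp(-rng.uniform(0.01, 1.0, size=16))   # stable decay rates in (0, 1)
B = rng.normal(size=(16, 1)) * 0.1
C = rng.normal(size=(1, 16)) * 0.1
u = rng.normal(size=(100, 1))                  # input sequence of length 100
print(ssm_scan(A, B, C, u).shape)              # (100, 1)
```

Because the transition is linear (and diagonal here), the same recurrence can also be evaluated with a parallel scan or as a long convolution, which is the property deep state-space models exploit to train efficiently.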
Related papers
- Deep Generative Models in Robotics: A Survey on Learning from Multimodal Demonstrations [52.11801730860999]
In recent years, the robot learning community has shown increasing interest in using deep generative models to capture the complexity of large datasets.
We present the different types of models that the community has explored, such as energy-based models, diffusion models, action value maps, or generative adversarial networks.
We also present the different types of applications in which deep generative models have been used, from grasp generation to trajectory generation or cost learning.
arXiv Detail & Related papers (2024-08-08T11:34:31Z)
- State-Space Modeling in Long Sequence Processing: A Survey on Recurrence in the Transformer Era [59.279784235147254]
This survey provides an in-depth summary of the latest approaches that are based on recurrent models for sequential data processing.
The emerging picture suggests there is room for novel routes, based on learning algorithms that depart from standard Backpropagation Through Time.
arXiv Detail & Related papers (2024-06-13T12:51:22Z)
- On the Challenges and Opportunities in Generative AI [135.2754367149689]
We argue that current large-scale generative AI models do not sufficiently address several fundamental issues that hinder their widespread adoption across domains.
In this work, we aim to identify key unresolved challenges in modern generative AI paradigms that should be tackled to further enhance their capabilities, versatility, and reliability.
arXiv Detail & Related papers (2024-02-28T15:19:33Z)
- Two Steps Forward and One Behind: Rethinking Time Series Forecasting with Deep Learning [7.967995669387532]
The Transformer is a highly successful deep learning model that has revolutionised the world of artificial neural networks.
We investigate the effectiveness of Transformer-based models applied to the domain of time series forecasting.
We propose a set of alternative models that perform better and are significantly less complex.
arXiv Detail & Related papers (2023-04-10T12:47:42Z)
- Online Evolutionary Neural Architecture Search for Multivariate Non-Stationary Time Series Forecasting [72.89994745876086]
This work presents the Online Neuro-Evolution-based Neural Architecture Search (ONE-NAS) algorithm.
ONE-NAS is a novel neural architecture search method capable of automatically designing and dynamically training recurrent neural networks (RNNs) for online forecasting tasks.
Results demonstrate that ONE-NAS outperforms traditional statistical time series forecasting methods.
arXiv Detail & Related papers (2023-02-20T22:25:47Z)
- Continual Learning of Long Topic Sequences in Neural Information Retrieval [2.3846478553599098]
We first propose a dataset based upon the MSMarco corpus, aimed at modeling a long stream of topics.
We then analyze in depth the ability of recent neural IR models to continually learn from those streams.
arXiv Detail & Related papers (2022-01-10T14:19:09Z)
- Stochastic Recurrent Neural Network for Multistep Time Series Forecasting [0.0]
We leverage advances in deep generative models and the concept of state space models to propose an adaptation of the recurrent neural network for time series forecasting.
Our model preserves the architectural workings of a recurrent neural network for which all relevant information is encapsulated in its hidden states, and this flexibility allows our model to be easily integrated into any deep architecture for sequential modelling.
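As a rough illustration of this design, here is a minimal, hypothetical NumPy sketch of a recurrent forecaster whose hidden state summarizes the observed history and parameterizes a Gaussian over the next value, sampled step by step over the forecast horizon; the class name `StochasticRNN` and all architectural details are illustrative assumptions, not the authors' model.

```python
import numpy as np

class StochasticRNN:
    """Toy recurrent forecaster: hidden state -> Gaussian over the next value.

    A sketch of the general idea only; not the paper's architecture.
    """
    def __init__(self, input_dim=1, hidden_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_dim)
        self.W = rng.normal(0, s, (hidden_dim, hidden_dim))    # recurrent weights
        self.U = rng.normal(0, s, (hidden_dim, input_dim))     # input weights
        self.w_mu = rng.normal(0, s, (input_dim, hidden_dim))  # mean head
        self.w_sig = rng.normal(0, s, (input_dim, hidden_dim)) # log-std head
        self.rng = rng

    def step(self, h, x):
        # All relevant history is encapsulated in the hidden state h.
        return np.tanh(self.W @ h + self.U @ x)

    def forecast(self, history, horizon):
        h = np.zeros(self.W.shape[0])
        for x in history:                   # encode the observed series
            h = self.step(h, x)
        samples, x = [], history[-1]
        for _ in range(horizon):            # multistep rollout by sampling
            h = self.step(h, x)
            mu = self.w_mu @ h
            sigma = np.exp(self.w_sig @ h)  # positive std via exp
            x = mu + sigma * self.rng.normal(size=mu.shape)
            samples.append(x)
        return np.stack(samples)

series = np.sin(np.linspace(0, 6, 50)).reshape(-1, 1)
print(StochasticRNN().forecast(series, horizon=10).shape)  # (10, 1)
```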
arXiv Detail & Related papers (2021-04-26T01:43:43Z)
- Factorized Deep Generative Models for Trajectory Generation with Spatiotemporal-Validity Constraints [10.960924101404498]
Deep generative models for trajectory data can learn expressive models that explain sophisticated latent patterns.
We first propose novel deep generative models factorizing time-variant and time-invariant latent variables.
We then develop new inference strategies based on variational inference and constrained optimization to enforce spatiotemporal validity.
arXiv Detail & Related papers (2020-09-20T02:06:36Z)
- Forecasting Sequential Data using Consistent Koopman Autoencoders [52.209416711500005]
A new class of physics-based methods related to Koopman theory has been introduced, offering an alternative for processing nonlinear dynamical systems.
We propose a novel Consistent Koopman Autoencoder model which, unlike the majority of existing work, leverages the forward and backward dynamics.
Key to our approach is a new analysis which explores the interplay between consistent dynamics and their associated Koopman operators.
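The consistency idea can be sketched in a few lines, assuming a latent space advanced linearly by a forward operator F and a backward operator B that are penalized for failing to invert each other; the function below and its unweighted sum of loss terms are an illustrative assumption, not the authors' exact formulation.

```python
import numpy as np

def consistent_koopman_losses(encode, decode, F, B, x_t, x_next):
    """Loss terms for a toy consistent Koopman autoencoder (sketch only).

    encode/decode: callables mapping state <-> latent code.
    F, B: (k, k) linear operators for forward / backward latent dynamics.
    """
    z_t, z_next = encode(x_t), encode(x_next)
    recon = np.mean((decode(z_t) - x_t) ** 2)       # autoencoding
    fwd = np.mean((F @ z_t - z_next) ** 2)          # forward prediction
    bwd = np.mean((B @ z_next - z_t) ** 2)          # backward prediction
    k = F.shape[0]
    consist = np.mean((F @ B - np.eye(k)) ** 2)     # F and B should invert each other
    return recon + fwd + bwd + consist

# Toy usage with a linear encoder/decoder and random weights (assumed sizes).
rng = np.random.default_rng(0)
E, D = rng.normal(size=(4, 8)) * 0.1, rng.normal(size=(8, 4)) * 0.1
F, B = np.eye(4), np.eye(4)
x_t, x_next = rng.normal(size=8), rng.normal(size=8)
loss = consistent_koopman_losses(lambda x: E @ x, lambda z: D @ z, F, B, x_t, x_next)
print(float(loss))
```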
arXiv Detail & Related papers (2020-03-04T18:24:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.