Related papers: State-Space Modeling in Long Sequence Processing: A Survey on Recurrence in the Transformer Era

URL: http://arxiv.org/abs/2406.09062v1
Date: Thu, 13 Jun 2024 12:51:22 GMT
Title: State-Space Modeling in Long Sequence Processing: A Survey on Recurrence in the Transformer Era
Authors: Matteo Tiezzi, Michele Casoni, Alessandro Betti, Marco Gori, Stefano Melacci
Abstract summary: This survey provides an in-depth summary of the latest approaches that are based on recurrent models for sequential data processing. The emerging picture suggests that there is room for thinking of novel routes, constituted by learning algorithms which depart from the standard Backpropagation Through Time.
Score: 59.279784235147254
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Effectively learning from sequential data is a longstanding goal of Artificial Intelligence, especially in the case of long sequences. From the dawn of Machine Learning, several researchers engaged in the search for algorithms and architectures capable of processing sequences of patterns, retaining information about the past inputs while still leveraging the upcoming data, without losing precious long-term dependencies and correlations. While such an ultimate goal is inspired by the human hallmark of continuous real-time processing of sensory information, several solutions simplified the learning paradigm by artificially limiting the processed context or dealing with sequences of limited length, given in advance. These solutions were further emphasized by the ubiquity of Transformers, which initially overshadowed the role of Recurrent Neural Nets. However, recurrent networks are facing a strong recent revival due to the growing popularity of (deep) State-Space models and novel instances of large-context Transformers, which are both based on recurrent computations to go beyond several limits of currently ubiquitous technologies. In fact, the fast development of Large Language Models enhanced the interest in efficient solutions to process data over time. This survey provides an in-depth summary of the latest approaches that are based on recurrent models for sequential data processing. A complete taxonomy over the latest trends in architectural and algorithmic solutions is reported and discussed, guiding researchers in this appealing research field. The emerging picture suggests that there is room for thinking of novel routes, constituted by learning algorithms which depart from the standard Backpropagation Through Time, towards a more realistic scenario where patterns are effectively processed online, leveraging local-forward computations and opening the way to further research on this topic.

Related papers

Exploring Training and Inference Scaling Laws in Generative Retrieval [50.82554729023865]
We investigate how model size, training data scale, and inference-time compute jointly influence generative retrieval performance. Our experiments show that n-gram-based methods demonstrate strong alignment with both training and inference scaling laws. We find that LLaMA models consistently outperform T5 models, suggesting a particular advantage for larger decoder-only models in generative retrieval.
arXiv Detail & Related papers (2025-03-24T17:59:03Z)
On the Resurgence of Recurrent Models for Long Sequences -- Survey and Research Opportunities in the Transformer Era [59.279784235147254]
This survey is aimed at providing an overview of these trends framed under the unifying umbrella of Recurrence. It emphasizes novel research opportunities that become prominent when abandoning the idea of processing long sequences.
arXiv Detail & Related papers (2024-02-12T23:55:55Z)
Deep-Unfolding for Next-Generation Transceivers [49.338084953253755]
The stringent performance requirements of future wireless networks are spurring studies on defining the next-generation multiple-input multiple-output (MIMO) transceivers. For the design of advanced transceivers in wireless communications, optimization approaches, often leading to iterative algorithms, have achieved great success. Deep learning, approximating the iterative algorithms with deep neural networks (DNNs), can significantly reduce the computational time. Deep-unfolding has emerged as an approach that incorporates the benefits of both deep learning and iterative algorithms by unfolding the iterative algorithm into a layer-wise structure.
arXiv Detail & Related papers (2023-05-15T02:13:41Z)
Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature. We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
Data Augmentation techniques in time series domain: A survey and taxonomy [0.20971479389679332]
Deep neural networks used to work with time series heavily depend on the size and consistency of the datasets used in training. This work systematically reviews the current state-of-the-art in the area to provide an overview of all available algorithms. The ultimate aim of this study is to provide a summary of the evolution and performance of areas that produce better results to guide future researchers in this field.
arXiv Detail & Related papers (2022-06-25T17:09:00Z)
Deep Generative model with Hierarchical Latent Factors for Time Series Anomaly Detection [40.21502451136054]
This work presents DGHL, a new family of generative models for time series anomaly detection. A top-down Convolution Network maps a novel hierarchical latent space to time series windows, exploiting temporal dynamics to encode information efficiently. Our method outperformed current state-of-the-art models on four popular benchmark datasets.
arXiv Detail & Related papers (2022-02-15T17:19:44Z)
Continual Learning of Long Topic Sequences in Neural Information Retrieval [2.3846478553599098]
We first propose a dataset based upon the MSMarco corpus aiming at modeling a long stream of topics. We then analyze in depth the ability of recent neural IR models to continually learn those streams.
arXiv Detail & Related papers (2022-01-10T14:19:09Z)
Convolutional generative adversarial imputation networks for spatio-temporal missing data in storm surge simulations [86.5302150777089]
Generative Adversarial Imputation Nets (GAIN) and GAN-based techniques have attracted attention as unsupervised machine learning methods. We name our proposed method Convolutional Generative Adversarial Imputation Nets (Conv-GAIN).
arXiv Detail & Related papers (2021-11-03T03:50:48Z)
A Practical Survey on Faster and Lighter Transformers [0.9176056742068811]
The Transformer is a model solely based on the attention mechanism that is able to relate any two positions of the input sequence. It has improved the state-of-the-art across numerous sequence modelling tasks. However, its effectiveness comes at the expense of a quadratic computational and memory complexity with respect to the sequence length.
arXiv Detail & Related papers (2021-03-26T17:54:47Z)
Deep Cellular Recurrent Network for Efficient Analysis of Time-Series Data with Spatial Information [52.635997570873194]
This work proposes a novel deep cellular recurrent neural network (DCRNN) architecture to process complex multi-dimensional time series data with spatial information. The proposed architecture achieves state-of-the-art performance while using substantially fewer trainable parameters than comparable methods in the literature.
arXiv Detail & Related papers (2021-01-12T20:08:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
