Parallelizing non-linear sequential models over the sequence length
- URL: http://arxiv.org/abs/2309.12252v3
- Date: Tue, 16 Jan 2024 16:56:11 GMT
- Title: Parallelizing non-linear sequential models over the sequence length
- Authors: Yi Heng Lim, Qi Zhu, Joshua Selfridge, Muhammad Firmansyah Kasim
- Abstract summary: We develop a parallel algorithm that accelerates GPU evaluation of sequential models by up to 3 orders of magnitude.
Leveraging this speed-up, we demonstrate the effectiveness of the Gated Recurrent Unit on a long time-series classification problem with 17k time samples.
- Score: 7.99707131886133
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sequential models, such as Recurrent Neural Networks and Neural Ordinary
Differential Equations, have long suffered from slow training due to their
inherent sequential nature. For many years this bottleneck has persisted, as
many thought sequential models could not be parallelized. We challenge this
long-held belief with our parallel algorithm that accelerates GPU evaluation of
sequential models by up to 3 orders of magnitude without compromising
output accuracy. The algorithm does not need any special structure in the
sequential models' architecture, making it applicable to a wide range of
architectures. Using our method, training sequential models can be more than 10
times faster than the common sequential method without any meaningful
difference in the training results. Leveraging this accelerated training, we
discovered the efficacy of the Gated Recurrent Unit in a long time series
classification problem with 17k time samples. By overcoming the training
bottleneck, our work serves as the first step to unlock the potential of
non-linear sequential models for long sequence problems.
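The abstract does not spell out the algorithm, but one standard way to parallelize a nonlinear recurrence over the sequence length, consistent with the description above, is to repeatedly linearize the recurrence around a current guess and solve the resulting linear recurrence with a parallel associative scan. The sketch below is illustrative only: the zero initial guess, the fixed iteration count, and all function names are assumptions, not the authors' implementation.

```python
# Illustrative sketch: Newton-style parallel evaluation of a nonlinear
# recurrence y_t = f(y_{t-1}, x_t). Not the authors' code.
import jax
import jax.numpy as jnp

def parallel_nonlinear_eval(f, y0, xs, num_iters=20):
    """Evaluate y_t = f(y_{t-1}, x_t) for t = 1..T without a sequential loop.

    y0: (d,) initial state; xs: (T, d_in) inputs.
    Each outer iteration linearizes f around the current guess and solves the
    resulting linear recurrence y_t = A_t y_{t-1} + b_t with an associative
    scan, which parallelizes over the sequence length.
    """
    T = xs.shape[0]
    ys = jnp.zeros((T,) + y0.shape)  # initial guess for y_1..y_T

    def solve_linear(As, bs):
        # Affine maps compose as (A2, b2) o (A1, b1) = (A2 A1, A2 b1 + b2).
        def combine(c1, c2):
            A1, b1 = c1
            A2, b2 = c2
            A = jnp.einsum("...ij,...jk->...ik", A2, A1)
            b = jnp.einsum("...ij,...j->...i", A2, b1) + b2
            return A, b

        A_cum, b_cum = jax.lax.associative_scan(combine, (As, bs))
        return jnp.einsum("tij,j->ti", A_cum, y0) + b_cum

    def refine(ys, _):
        y_prev = jnp.concatenate([y0[None], ys[:-1]], axis=0)  # shifted guess
        fs = jax.vmap(f)(y_prev, xs)                           # f(y_{t-1}, x_t)
        As = jax.vmap(jax.jacfwd(f, argnums=0))(y_prev, xs)    # Jacobians A_t
        bs = fs - jnp.einsum("tij,tj->ti", As, y_prev)         # offsets b_t
        return solve_linear(As, bs), None

    ys, _ = jax.lax.scan(refine, ys, None, length=num_iters)
    return ys
```

Here f could be, for instance, a GRU cell update applied to the hidden state and the input at step t; the associative scan parallelizes across all T time steps, while the outer refinement loop runs for a small, fixed number of iterations.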
Related papers
- State Soup: In-Context Skill Learning, Retrieval and Mixing [22.485700977542127]
A new breed of gated-linear recurrent neural networks has reached state-of-the-art performance on a range of sequence modeling problems.
Here, we explore another advantage of these stateful sequence models, inspired by the success of model merging in parameter space.
Building on parallels between fine-tuning and in-context learning, we investigate whether we can treat internal states as task vectors that can be stored, retrieved, and then linearly combined.
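As an illustrative sketch of the state-as-task-vector idea (not the paper's actual procedure; the shapes and the convex mixing below are assumptions):

```python
# Treat stored recurrent states as task vectors that can be linearly mixed.
import jax.numpy as jnp

def mix_states(stored_states, weights):
    """Linearly combine stored recurrent states ("task vectors").

    stored_states: (num_tasks, state_dim), one saved state per task/skill.
    weights:       (num_tasks,) mixing coefficients.
    """
    weights = weights / weights.sum()               # normalize to a convex mix
    return jnp.einsum("ks,k->s", stored_states, weights)

# e.g. resume generation from a 50/50 blend of two skills' states:
# h_init = mix_states(jnp.stack([h_task_a, h_task_b]), jnp.array([0.5, 0.5]))
```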
arXiv Detail & Related papers (2024-06-12T17:06:07Z) - LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory [63.41820940103348]
Self-attention mechanism's computational cost limits its practicality for long sequences.
We propose a new method called LongVQ to compress the global abstraction as a length-fixed codebook.
LongVQ effectively maintains dynamic global and local patterns, which helps to address the lack of long-range dependencies.
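A generic vector-quantization step conveys the idea of compressing a length-T feature sequence against a length-fixed codebook; the codebook size and this nearest-neighbour assignment are illustrative assumptions rather than LongVQ's exact design:

```python
import jax.numpy as jnp

def vector_quantize(features, codebook):
    """Map each feature vector to its nearest codebook entry.

    features: (T, d) sequence of token features.
    codebook: (K, d) fixed-size codebook; K does not grow with T.
    Returns the quantized sequence (T, d) and the code indices (T,).
    """
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (T, K)
    idx = d2.argmin(axis=1)
    return codebook[idx], idx
```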
arXiv Detail & Related papers (2024-04-17T08:26:34Z) - SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z) - Latent Neural ODEs with Sparse Bayesian Multiple Shooting [13.104556034767025]
Training dynamic models, such as neural ODEs, on long trajectories is a hard problem that requires using various tricks, such as trajectory splitting, to make model training work in practice.
We propose a principled multiple shooting technique for neural ODEs that splits trajectories into manageable short segments, which are optimised in parallel.
We demonstrate efficient and stable training, and state-of-the-art performance on multiple large-scale benchmark datasets.
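A minimal sketch of the multiple-shooting idea, assuming a simple explicit-Euler integrator, equal-length segments, and a quadratic continuity penalty (the paper's sparse Bayesian formulation differs in detail):

```python
# Split a long trajectory into short segments with their own initial states,
# roll the segments out in parallel, and penalise mismatches at the joins.
import jax
import jax.numpy as jnp

def multiple_shooting_loss(f, seg_inits, targets, dt, penalty=1.0):
    """Multiple-shooting objective for dx/dt = f(x).

    seg_inits: (S, d) initial state of each of S segments (optimised jointly).
    targets:   (S, L, d) observed states along each segment (L steps each).
    """
    def rollout(x0):
        def step(x, _):
            x_next = x + dt * f(x)                  # explicit Euler step
            return x_next, x_next
        _, xs = jax.lax.scan(step, x0, None, length=targets.shape[1])
        return xs                                   # (L, d)

    preds = jax.vmap(rollout)(seg_inits)            # all segments in parallel
    data_loss = ((preds - targets) ** 2).mean()
    # Continuity: the end of segment i should match the start of segment i+1.
    cont_loss = ((preds[:-1, -1, :] - seg_inits[1:]) ** 2).mean()
    return data_loss + penalty * cont_loss
```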
arXiv Detail & Related papers (2022-10-07T11:36:29Z) - Grasping Core Rules of Time Series through Pure Models [6.849905754473385]
PureTS is a network with three pure linear layers that achieves state-of-the-art performance in 80% of the long sequence prediction tasks.
We discuss the potential of pure linear layers in terms of both observed phenomena and underlying principles.
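The "three pure linear layers" idea can be sketched directly; the layer widths and the mapping from a lookback window to a forecast horizon are assumptions for illustration:

```python
import jax.numpy as jnp

def pure_linear_forecast(params, history):
    """Forecast with three stacked linear layers and no nonlinearities.

    history: (lookback,) past values; returns (horizon,) predictions.
    """
    W1, W2, W3 = params     # e.g. shapes (H, lookback), (H, H), (horizon, H)
    return W3 @ (W2 @ (W1 @ history))
```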
arXiv Detail & Related papers (2022-08-15T10:22:15Z) - Learning Sequence Representations by Non-local Recurrent Neural Memory [61.65105481899744]
We propose a Non-local Recurrent Neural Memory (NRNM) for supervised sequence representation learning.
Our model captures long-range dependencies and distills latent high-level features.
Our model compares favorably against other state-of-the-art methods specifically designed for each of these sequence applications.
arXiv Detail & Related papers (2022-07-20T07:26:15Z) - Oscillatory Fourier Neural Network: A Compact and Efficient Architecture
for Sequential Processing [16.69710555668727]
We propose a novel neuron model with a cosine activation and a time-varying component for sequential processing.
The proposed neuron provides an efficient building block for projecting sequential inputs into the spectral domain.
Applying the proposed model to sentiment analysis on the IMDB dataset reaches 89.4% test accuracy within 5 epochs.
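A hedged sketch of a cosine-activated neuron with a time-varying phase term, assuming a per-neuron learned frequency and phase (the paper's exact parameterisation may differ):

```python
import jax.numpy as jnp

def oscillatory_neuron(params, x_t, t):
    """Project the input at step t into an oscillatory (spectral-like) basis.

    x_t: (d_in,) input at step t;  t: scalar time index.
    """
    W, omega, phi = params          # (d_out, d_in), (d_out,), (d_out,)
    return jnp.cos(W @ x_t + omega * t + phi)
```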
arXiv Detail & Related papers (2021-09-14T19:08:07Z) - Learning from Irregularly-Sampled Time Series: A Missing Data
Perspective [18.493394650508044]
Irregularly-sampled time series occur in many domains including healthcare.
We model irregularly-sampled time series data as a sequence of index-value pairs sampled from a continuous but unobserved function.
We propose learning methods for this framework based on variational autoencoders and generative adversarial networks.
arXiv Detail & Related papers (2020-08-17T20:01:55Z) - STEER: Simple Temporal Regularization For Neural ODEs [80.80350769936383]
We propose a new regularization technique: randomly sampling the end time of the ODE during training.
The proposed regularization is simple to implement, has negligible overhead and is effective across a wide variety of tasks.
We show through experiments on normalizing flows, time series models and image recognition that the proposed regularization can significantly decrease training time and even improve performance over baseline models.
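The regularization is simple enough to sketch: sample the ODE end time uniformly around its nominal value during training. The window half-width b and the explicit-Euler solver below are illustrative assumptions:

```python
import jax
import jax.numpy as jnp

def randomized_end_time_rollout(f, x0, t_end_nominal, key, b=0.5, n_steps=50):
    """Integrate dx/dt = f(x) from 0 to a randomly perturbed end time."""
    t_end = t_end_nominal + jax.random.uniform(key, (), minval=-b, maxval=b)
    dt = t_end / n_steps

    def step(x, _):
        return x + dt * f(x), None   # explicit Euler step

    x_final, _ = jax.lax.scan(step, x0, None, length=n_steps)
    return x_final
```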
arXiv Detail & Related papers (2020-06-18T17:44:50Z) - Liquid Time-constant Networks [117.57116214802504]
We introduce a new class of time-continuous recurrent neural network models.
Instead of declaring a learning system's dynamics by implicit nonlinearities, we construct networks of linear first-order dynamical systems.
These neural networks exhibit stable and bounded behavior and yield superior expressivity within the family of neural ordinary differential equations.
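A hedged sketch of one step of such a system: a linear first-order state decay whose dynamics are modulated by a nonlinear gate. The specific gating form and the Euler discretisation are assumptions for illustration, not the paper's exact equations or solver:

```python
# Gated first-order dynamics in the spirit of the description above:
# dx/dt = -x / tau + g(x, I) * (A - x), where g is a nonlinear gate.
import jax
import jax.numpy as jnp

def gated_first_order_step(params, x, inp, dt):
    """Advance the state x by one Euler step of the gated linear dynamics."""
    W_x, W_i, b, tau, A = params
    g = jax.nn.sigmoid(W_x @ x + W_i @ inp + b)   # nonlinear interlinked gate
    dx = -x / tau + g * (A - x)
    return x + dt * dx
```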
arXiv Detail & Related papers (2020-06-08T09:53:35Z) - Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance across a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)