The impact of memory on learning sequence-to-sequence tasks
- URL: http://arxiv.org/abs/2205.14683v2
- Date: Thu, 14 Dec 2023 15:42:52 GMT
- Title: The impact of memory on learning sequence-to-sequence tasks
- Authors: Alireza Seif, Sarah A.M. Loos, Gennaro Tucci, Édgar Roldán, Sebastian Goldt
- Abstract summary: Recent success of neural networks in natural language processing has drawn renewed attention to learning sequence-to-sequence (seq2seq) tasks.
We propose a model for a seq2seq task that has the advantage of providing explicit control over the degree of memory, or non-Markovianity, in the sequences.
- Score: 6.603326895384289
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent success of neural networks in natural language processing has
drawn renewed attention to learning sequence-to-sequence (seq2seq) tasks. While
there exists a rich literature that studies classification and regression tasks
using solvable models of neural networks, seq2seq tasks have not yet been
studied from this perspective. Here, we propose a simple model for a seq2seq
task that has the advantage of providing explicit control over the degree of
memory, or non-Markovianity, in the sequences -- the stochastic
switching-Ornstein-Uhlenbeck (SSOU) model. We introduce a measure of
non-Markovianity to quantify the amount of memory in the sequences. For a
minimal auto-regressive (AR) learning model trained on this task, we identify
two learning regimes corresponding to distinct phases in the stationary state
of the SSOU process. These phases emerge from the interplay between two
different time scales that govern the sequence statistics. Moreover, we observe
that while increasing the integration window of the AR model always improves
performance, albeit with diminishing returns, increasing the non-Markovianity
of the input sequences can improve or degrade its performance. Finally, we
perform experiments with recurrent and convolutional neural networks that show
that our observations carry over to more complicated neural network
architectures.
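To make the setup concrete, the following is a minimal Python sketch of how such sequences could be generated and fit. The flip rate, relaxation time, noise level, and Euler-Maruyama discretization are illustrative assumptions for a switching Ornstein-Uhlenbeck process, not the paper's exact construction, and the least-squares AR(k) fit merely stands in for the minimal AR learner studied there.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ssou(n_steps, dt=0.01, tau_ou=0.1, tau_switch=1.0, sigma=0.5):
    """Sketch of a switching Ornstein-Uhlenbeck sequence: x relaxes toward
    a mean m that flips between +1 and -1 at Poisson times. tau_ou and
    tau_switch are the two competing time scales."""
    x = np.zeros(n_steps)
    m = 1.0
    for t in range(1, n_steps):
        if rng.random() < dt / tau_switch:  # telegraph-process flip
            m = -m
        # Euler-Maruyama step for dx = -(x - m)/tau_ou dt + sigma dW
        x[t] = (x[t-1] - (x[t-1] - m) * dt / tau_ou
                + sigma * np.sqrt(dt) * rng.standard_normal())
    return x

def fit_ar(x, k):
    """Least-squares AR(k): predict x[t] from the previous k values."""
    X = np.stack([x[i:len(x) - k + i] for i in range(k)], axis=1)
    y = x[k:]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w, np.mean((X @ w - y) ** 2)

x = simulate_ssou(50_000)
for k in (1, 2, 5, 10):  # growing integration window
    _, mse = fit_ar(x, k)
    print(f"window k={k}: training MSE {mse:.5f}")
```

Sweeping k mirrors the abstract's observation that enlarging the AR integration window improves performance with diminishing returns, while varying tau_switch relative to tau_ou changes the degree of memory in the sequences.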
Related papers
- State Soup: In-Context Skill Learning, Retrieval and Mixing [22.485700977542127]
A new breed of gated-linear recurrent neural networks has reached state-of-the-art performance on a range of sequence modeling problems.
Here, we explore another advantage of these stateful sequence models, inspired by the success of model merging through parameter interpolation.
Building on parallels between fine-tuning and in-context learning, we investigate whether we can treat internal states as task vectors that can be stored, retrieved, and then linearly combined.
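As a hedged illustration of treating internal states as task vectors, the toy below stores the final hidden states of a small tanh RNN after two different contexts, linearly combines them, and resumes computation from the mixed state. The cell, dimensions, and mixing weight are invented for the sketch; the paper concerns gated-linear recurrent models rather than this toy.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_h = 4, 16
Wx = rng.normal(scale=0.3, size=(d_h, d_in))
Wh = rng.normal(scale=0.3, size=(d_h, d_h))

def run(seq, h=None):
    """Roll a toy tanh RNN over seq and return the final hidden state."""
    h = np.zeros(d_h) if h is None else h
    for x in seq:
        h = np.tanh(Wx @ x + Wh @ h)
    return h

# "Store": final states produced by two different in-context demonstrations.
state_a = run(rng.normal(size=(20, d_in)))  # stand-in for task A context
state_b = run(rng.normal(size=(20, d_in)))  # stand-in for task B context

# "Mix": linearly combine the stored states as if they were task vectors,
# then "retrieve" by continuing the computation from the mixed state.
alpha = 0.5
mixed = alpha * state_a + (1 - alpha) * state_b
print(run(rng.normal(size=(5, d_in)), h=mixed)[:4])
```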
arXiv Detail & Related papers (2024-06-12T17:06:07Z)
- P-SpikeSSM: Harnessing Probabilistic Spiking State Space Models for Long-Range Dependency Tasks [1.9775291915550175]
Spiking neural networks (SNNs) are posited as a computationally efficient and biologically plausible alternative to conventional neural architectures.
We develop a scalable probabilistic spiking learning framework for long-range dependency tasks.
Our models attain state-of-the-art performance among SNN models across diverse long-range dependency tasks.
arXiv Detail & Related papers (2024-06-05T04:23:11Z)
- Time Elastic Neural Networks [2.1756081703276]
We introduce and detail an atypical neural network architecture, called the time elastic neural network (teNN).
Its novelty compared to classical neural network architectures is that it explicitly incorporates a time-warping ability.
We demonstrate that, during the training process, the teNN succeeds in reducing the number of neurons required within each cell.
arXiv Detail & Related papers (2024-05-27T09:01:30Z)
- Sparse Modular Activation for Efficient Sequence Modeling [94.11125833685583]
Recent models combining Linear State Space Models with self-attention mechanisms have demonstrated impressive results across a range of sequence modeling tasks.
Current approaches apply attention modules statically and uniformly to all elements in the input sequences, leading to sub-optimal quality-efficiency trade-offs.
We introduce Sparse Modular Activation (SMA), a general mechanism enabling neural networks to sparsely activate sub-modules for sequence elements in a differentiable manner.
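One differentiable way to realize such sparse activation is Switch-style routing, sketched below: a learned router picks one sub-module per token, and scaling the module output by the routing probability keeps the selection differentiable. The router, module shapes, and top-1 selection are assumptions for illustration, not necessarily the SMA mechanism itself.

```python
import torch
import torch.nn as nn

class SparseModularLayer(nn.Module):
    """Toy sparse modular activation: each token activates one sub-module."""
    def __init__(self, d_model=32, n_modules=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_modules)
        # Named modules_ to avoid clashing with nn.Module.modules().
        self.modules_ = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(n_modules))

    def forward(self, x):                       # x: (seq_len, d_model)
        probs = self.router(x).softmax(dim=-1)  # per-token routing weights
        top_p, top_i = probs.max(dim=-1)        # one module per token
        out = torch.zeros_like(x)
        for m, mod in enumerate(self.modules_):
            mask = top_i == m
            if mask.any():  # run each module only where it was selected
                out[mask] = top_p[mask].unsqueeze(-1) * mod(x[mask])
        return out

layer = SparseModularLayer()
print(layer(torch.randn(10, 32)).shape)  # torch.Size([10, 32])
```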
arXiv Detail & Related papers (2023-06-19T23:10:02Z)
- Learning Sequence Representations by Non-local Recurrent Neural Memory [61.65105481899744]
We propose a Non-local Recurrent Neural Memory (NRNM) for supervised sequence representation learning.
Our model is able to capture long-range dependencies and to distill latent high-level features from sequences.
Our model compares favorably against other state-of-the-art methods specifically designed for each of the evaluated sequence applications.
arXiv Detail & Related papers (2022-07-20T07:26:15Z)
- Oscillatory Fourier Neural Network: A Compact and Efficient Architecture for Sequential Processing [16.69710555668727]
We propose a novel neuron model that has a cosine activation with a time-varying component for sequential processing.
The proposed neuron provides an efficient building block for projecting sequential inputs into the spectral domain.
Applying the proposed model to sentiment analysis on the IMDB dataset reaches 89.4% test accuracy within 5 epochs.
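One plausible reading of such a neuron is y_t = cos(W x_t + ω t + φ): a learned projection composed with per-neuron frequencies and phases, so a sequence is mapped onto learned oscillations. The parameterization below is an assumption for illustration; the paper's exact formulation may differ.

```python
import numpy as np

def oscillatory_layer(x_seq, W, omega, phi):
    """Cosine activation with a time-varying phase per neuron:
    y[t] = cos(W @ x[t] + omega * t + phi)."""
    t = np.arange(len(x_seq))[:, None]  # time index, shape (T, 1)
    return np.cos(x_seq @ W.T + omega[None, :] * t + phi[None, :])

rng = np.random.default_rng(2)
d_in, d_out, T = 8, 16, 50
x = rng.normal(size=(T, d_in))
W = rng.normal(scale=0.1, size=(d_out, d_in))
omega = rng.uniform(0, np.pi, size=d_out)    # per-neuron frequency
phi = rng.uniform(0, 2 * np.pi, size=d_out)  # per-neuron phase
print(oscillatory_layer(x, W, omega, phi).shape)  # (50, 16)
```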
arXiv Detail & Related papers (2021-09-14T19:08:07Z)
- Mitigating Performance Saturation in Neural Marked Point Processes: Architectures and Loss Functions [50.674773358075015]
We propose a simple graph-based network structure called GCHP, which utilizes only graph convolutional layers.
We show that GCHP can significantly reduce training time, and that a likelihood-ratio loss with interarrival-time probability assumptions can greatly improve model performance.
arXiv Detail & Related papers (2021-07-07T16:59:14Z)
- Gradient Projection Memory for Continual Learning [5.43185002439223]
The ability to learn continually without forgetting the past tasks is a desired attribute for artificial learning systems.
We propose a novel approach where a neural network learns new tasks by taking gradient steps orthogonal to the gradient subspaces deemed important for the past tasks.
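The core operation can be sketched directly: given an orthonormal basis B of directions deemed important for past tasks, the new task's gradient is replaced by its component orthogonal to span(B). How B is obtained (the paper derives it from learned representations) is abstracted away in this sketch.

```python
import numpy as np

def project_orthogonal(grad, basis):
    """Remove the component of grad lying in span(basis); columns of
    basis are assumed orthonormal, so the projection is B @ (B.T @ g)."""
    return grad - basis @ (basis.T @ grad)

rng = np.random.default_rng(3)
d = 10
# Hypothetical memory: an orthonormal basis for directions important
# to past tasks (e.g. obtained via an SVD of stored activations).
B, _ = np.linalg.qr(rng.normal(size=(d, 3)))
g = rng.normal(size=d)            # gradient computed on the new task
g_new = project_orthogonal(g, B)
print(np.abs(B.T @ g_new).max())  # ~0: no update along past-task directions
```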
arXiv Detail & Related papers (2021-03-17T16:31:29Z)
- Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting [54.03356526990088]
We propose Sequence Span Rewriting (SSR) as a self-supervised sequence-to-sequence (seq2seq) pre-training objective.
SSR provides more fine-grained learning signals for text representations by supervising the model to rewrite imperfect spans to ground truth.
Our experiments with T5 models on various seq2seq tasks show that SSR can substantially improve seq2seq pre-training.
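A hedged sketch of how such training pairs might be constructed: corrupt a span of a ground-truth sentence into an "imperfect" version and supervise the model to rewrite it back. The <rw> markers and the trivial corruption function are placeholders; the paper produces imperfect spans with a model rather than this stand-in.

```python
import random

random.seed(0)

def make_ssr_example(tokens, span_len=3,
                     corrupt=lambda s: ["<imperfect>"] * len(s)):
    """Build one (source, target) pair: the source contains a marked
    imperfect span, the target is the original span to rewrite to."""
    i = random.randrange(len(tokens) - span_len)
    span = tokens[i:i + span_len]
    source = (tokens[:i] + ["<rw>"] + corrupt(span) + ["</rw>"]
              + tokens[i + span_len:])
    return " ".join(source), " ".join(span)

src, tgt = make_ssr_example(
    "the model rewrites imperfect spans to the ground truth".split())
print(src)  # sentence with the span replaced by an imperfect version
print(tgt)  # the original span the model must reconstruct
```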
arXiv Detail & Related papers (2021-01-02T10:27:11Z)
- Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture that explicitly targets multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
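A clockwork-style sketch of the idea, under stated assumptions: the hidden state is a list of modules, module m updates only every 2^m steps so later modules integrate progressively longer histories, and new slower modules could be appended as training proceeds. The update rule and schedule are illustrative, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(4)
d_in, d_mod, n_mod = 4, 8, 3

# One weight set per module; module m updates every 2**m steps.
Wx = [rng.normal(scale=0.3, size=(d_mod, d_in)) for _ in range(n_mod)]
Wh = [rng.normal(scale=0.3, size=(d_mod, d_mod)) for _ in range(n_mod)]

def step(h_modules, x, t):
    """Update each module on its own schedule; others keep their state."""
    return [np.tanh(Wx[m] @ x + Wh[m] @ h) if t % 2 ** m == 0 else h
            for m, h in enumerate(h_modules)]

h = [np.zeros(d_mod) for _ in range(n_mod)]
for t, x in enumerate(rng.normal(size=(16, d_in))):
    h = step(h, x, t)
# The full hidden state is the concatenation of all module states; a new,
# slower module would simply be appended to the lists during training.
print(np.concatenate(h).shape)  # (24,)
```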
arXiv Detail & Related papers (2020-06-29T08:35:49Z)
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.