ChordMixer: A Scalable Neural Attention Model for Sequences with
Different Lengths
- URL: http://arxiv.org/abs/2206.05852v2
- Date: Fri, 5 May 2023 21:58:41 GMT
- Title: ChordMixer: A Scalable Neural Attention Model for Sequences with
Different Lengths
- Authors: Ruslan Khalitov, Tong Yu, Lei Cheng, Zhirong Yang
- Abstract summary: We propose a simple neural network building block called ChordMixer which can model the attention for long sequences with variable lengths.
Repeatedly applying such blocks forms an effective network backbone that mixes the input signals towards the learning targets.
- Score: 9.205331586765613
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sequential data naturally have different lengths in many domains, with some
very long sequences. As an important modeling tool, neural attention should
capture long-range interaction in such sequences. However, most existing neural
attention models admit only short sequences, or they have to employ chunking or
padding to enforce a constant input length. Here we propose a simple neural
network building block called ChordMixer which can model the attention for long
sequences with variable lengths. Each ChordMixer block consists of a
position-wise rotation layer without learnable parameters and an element-wise
MLP layer. Repeatedly applying such blocks forms an effective network backbone
that mixes the input signals towards the learning targets. We have tested
ChordMixer on the synthetic adding problem, long document classification, and
DNA sequence-based taxonomy classification. The experiment results show that
our method substantially outperforms other neural attention models.
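Below is a minimal PyTorch-style sketch of the block described in the abstract, assuming the channels are split into tracks that are rolled along the sequence axis by growing offsets (the parameter-free rotation), followed by a position-wise MLP with a residual connection. The class name, shift schedule, and hyperparameters are illustrative, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class ChordMixerBlock(nn.Module):
    """Sketch of a ChordMixer-style block: a parameter-free rotation of
    channel tracks along the sequence axis, followed by a position-wise MLP."""

    def __init__(self, dim: int, n_tracks: int, hidden_mult: int = 2):
        super().__init__()
        assert dim % n_tracks == 0, "channels must split evenly into tracks"
        self.n_tracks = n_tracks
        # Element-wise MLP: applied independently at every sequence position.
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden_mult * dim),
            nn.GELU(),
            nn.Linear(hidden_mult * dim, dim),
        )

    def rotate(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim). Track 0 stays in place; track i is rolled
        # by 2**(i-1) positions, so offsets grow as 1, 2, 4, ... (assumed schedule).
        tracks = torch.chunk(x, self.n_tracks, dim=-1)
        shifts = [0] + [2 ** (i - 1) for i in range(1, self.n_tracks)]
        rolled = [torch.roll(t, s, dims=1) for t, s in zip(tracks, shifts)]
        return torch.cat(rolled, dim=-1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.mlp(self.rotate(x))  # residual connection (assumed)


# Usage: stacking blocks lets the rotations reach across the whole sequence;
# roughly log2(seq_len) blocks suffice for full mixing.
x = torch.randn(2, 1024, 64)                    # (batch, length, channels)
backbone = nn.Sequential(*[ChordMixerBlock(64, n_tracks=8) for _ in range(11)])
y = backbone(x)                                  # -> torch.Size([2, 1024, 64])
```

Because the rotation has no parameters and the MLP acts per position, the block accepts any sequence length without chunking or padding, which is the property the abstract emphasizes.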
Related papers
- MAMI: Multi-Attentional Mutual-Information for Long Sequence Neuron
Captioning [1.7243216387069678]
Neuron labeling is an approach to visualizing the behaviour and response of a neuron to a pattern that activates it.
Previous work, namely MILAN, visualizes neuron behaviour using a modified Show, Attend, and Tell (SAT) model in the encoder and an LSTM with Bahdanau attention in the decoder.
In this work, we improve the performance of MILAN further by utilizing different kinds of attention mechanisms and combining several attention results into one.
arXiv Detail & Related papers (2024-01-05T10:41:55Z) - Multiscale Residual Learning of Graph Convolutional Sequence Chunks for
Human Motion Prediction [23.212848643552395]
A new method is proposed for human motion prediction by learning temporal and spatial dependencies.
Our proposed method effectively models sequence information for motion prediction and outperforms other techniques, setting a new state-of-the-art.
arXiv Detail & Related papers (2023-08-31T15:23:33Z) - SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z) - Toeplitz Neural Network for Sequence Modeling [46.04964190407727]
We show that a Toeplitz matrix-vector product trick can reduce the space-time complexity of sequence modeling to log-linear.
A lightweight sub-network called relative position encoder is proposed to generate relative position coefficients with a fixed budget of parameters.
Despite being trained on 512-token sequences, our model can extrapolate to input sequences of up to 14K tokens at inference with consistent performance.
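For context, the log-linear complexity rests on the standard circulant-embedding trick for Toeplitz matrix-vector products. The NumPy sketch below is a hypothetical helper illustrating that trick, not code from the paper, which builds on the idea with a learned relative position encoder.

```python
import numpy as np


def toeplitz_matvec(c, r, x):
    """Multiply the n x n Toeplitz matrix with first column c and first row r
    by x in O(n log n): embed it in a 2n circulant and convolve via the FFT."""
    n = len(x)
    # First column of the circulant embedding: c, a zero, then r reversed
    # (excluding r[0]).
    col = np.concatenate([c, [0.0], r[:0:-1]])
    x_pad = np.concatenate([x, np.zeros(n)])
    y = np.fft.ifft(np.fft.fft(col) * np.fft.fft(x_pad)).real
    return y[:n]


# Sanity check against the dense product.
n = 8
c = np.random.randn(n)                                  # first column
r = np.concatenate([[c[0]], np.random.randn(n - 1)])    # first row, r[0] == c[0]
T = np.array([[c[i - j] if i >= j else r[j - i] for j in range(n)] for i in range(n)])
x = np.random.randn(n)
assert np.allclose(T @ x, toeplitz_matvec(c, r, x))
```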
arXiv Detail & Related papers (2023-05-08T14:49:01Z) - Learning Sequence Representations by Non-local Recurrent Neural Memory [61.65105481899744]
We propose a Non-local Recurrent Neural Memory (NRNM) for supervised sequence representation learning.
Our model captures long-range dependencies and distils latent high-level features.
Our model compares favorably against other state-of-the-art methods specifically designed for each of these sequence applications.
arXiv Detail & Related papers (2022-07-20T07:26:15Z) - Spatio-Temporal Inception Graph Convolutional Networks for
Skeleton-Based Action Recognition [126.51241919472356]
We design a simple and highly modularized graph convolutional network architecture for skeleton-based action recognition.
Our network is constructed by repeating a building block that aggregates multi-granularity information from both the spatial and temporal paths.
arXiv Detail & Related papers (2020-11-26T14:43:04Z) - Point process models for sequence detection in high-dimensional neural
spike trains [29.073129195368235]
We develop a point process model that characterizes fine-scale sequences at the level of individual spikes.
This ultra-sparse representation of sequence events opens new possibilities for spike train modeling.
arXiv Detail & Related papers (2020-10-10T02:21:44Z) - Incremental Training of a Recurrent Neural Network Exploiting a
Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture targeting explicitly multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z) - Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance across a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z) - Hard Non-Monotonic Attention for Character-Level Transduction [65.17388794270694]
We introduce an exact, exponential-time algorithm for marginalizing over a number of non-monotonic alignments between two strings.
We compare soft and hard non-monotonic attention experimentally and find that the exact algorithm significantly improves performance over the approximation and outperforms soft attention.
arXiv Detail & Related papers (2018-08-29T20:00:20Z)