Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective
- URL: http://arxiv.org/abs/2405.17383v1
- Date: Mon, 27 May 2024 17:38:55 GMT
- Title: Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective
- Authors: Zhen Qin, Xuyang Shen, Weigao Sun, Dong Li, Stan Birchfield, Richard Hartley, Yiran Zhong
- Abstract summary: The Linear Complexity Sequence Model (LCSM) unites various sequence modeling techniques with linear complexity.
We segment the modeling processes of these models into three distinct stages: Expand, Oscillation, and Shrink.
We perform experiments to analyze the impact of different stage settings on language modeling and retrieval tasks.
- Score: 26.479602180023125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present the Linear Complexity Sequence Model (LCSM), a comprehensive solution that unites various sequence modeling techniques with linear complexity, including linear attention, state space model, long convolution, and linear RNN, within a single framework. The goal is to enhance comprehension of these models by analyzing the impact of each component from a cohesive and streamlined viewpoint. Specifically, we segment the modeling processes of these models into three distinct stages: Expand, Oscillation, and Shrink (EOS), with each model having its own specific settings. The Expand stage involves projecting the input signal onto a high-dimensional memory state. This is followed by recursive operations performed on the memory state in the Oscillation stage. Finally, the memory state is projected back to a low-dimensional space in the Shrink stage. We perform comprehensive experiments to analyze the impact of different stage settings on language modeling and retrieval tasks. Our results show that data-driven methods are crucial for the effectiveness of the three stages in language modeling, whereas hand-crafted methods yield better performance in retrieval tasks.
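The three stages described in the abstract can be illustrated with a small sketch. The abstract does not state the exact update rule, so the form below is an assumption made for illustration: the Expand stage adds an outer product of an expand vector and the input to a k x d memory, the Oscillation stage applies a per-row decay to the previous memory, and the Shrink stage contracts the memory with a shrink vector. The names `eos_scan`, `causal_linear_attention`, and `random_sequence` are hypothetical. Under these assumptions, setting the decay to 1 and using keys, values, and queries as the expand, input, and shrink signals should reproduce causal linear attention without softmax.

```python
import random


def eos_scan(inputs, expand, oscillate, shrink):
    """Run an assumed Expand-Oscillation-Shrink recurrence over a sequence.

    inputs[t]    : d floats, the input signal at step t
    expand[t]    : k floats, lifts the input into a k x d memory (Expand)
    oscillate[t] : k floats, per-row decay of the memory (assumed diagonal form)
    shrink[t]    : k floats, projects the memory back to d values (Shrink)
    """
    k, d = len(expand[0]), len(inputs[0])
    memory = [[0.0] * d for _ in range(k)]
    outputs = []
    for x, e, o, s in zip(inputs, expand, oscillate, shrink):
        # Oscillation decays the old memory; Expand adds the outer product e x^T.
        memory = [[o[i] * memory[i][j] + e[i] * x[j] for j in range(d)]
                  for i in range(k)]
        # Shrink: y = memory^T s, a vector of length d.
        outputs.append([sum(s[i] * memory[i][j] for i in range(k))
                        for j in range(d)])
    return outputs


def causal_linear_attention(queries, keys, values):
    """Quadratic-time reference: y_t = sum over r <= t of (q_t . k_r) v_r."""
    d = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        y = [0.0] * d
        for r in range(t + 1):
            score = sum(a * b for a, b in zip(q, keys[r]))
            for j in range(d):
                y[j] += score * values[r][j]
        outputs.append(y)
    return outputs


def random_sequence(rng, length, width):
    """Build a length x width list of floats in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(width)]
            for _ in range(length)]


rng = random.Random(0)
n, k, d = 6, 4, 3  # sequence length, memory rows, input width
q = random_sequence(rng, n, k)
key = random_sequence(rng, n, k)
v = random_sequence(rng, n, d)
no_decay = [[1.0] * k for _ in range(n)]  # decay of 1 keeps the full history

y_eos = eos_scan(v, key, no_decay, q)
y_ref = causal_linear_attention(q, key, v)
max_gap = max(abs(a - b)
              for row_a, row_b in zip(y_eos, y_ref)
              for a, b in zip(row_a, row_b))
print("largest difference from the quadratic reference:", max_gap)
```

The recurrent form touches each step once with a fixed-size memory, which is where the linear complexity comes from. Replacing `no_decay` with values below 1 would give a decaying memory, one possible hand-crafted Oscillation setting; a data-driven setting would compute those values from the input instead.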
Related papers
- Towards Efficient Modelling of String Dynamics: A Comparison of State Space and Koopman based Deep Learning Methods [8.654571696634825]
We compare State Space Models (SSM) and Koopman-based deep learning methods for modelling the dynamics of both linear and non-linear stiff strings.
Our findings indicate that our proposed Koopman-based model performs as well as or better than other existing approaches in non-linear cases for long-sequence modelling.
This research contributes insights into the physical modelling of dynamical systems by offering a comparative overview of these and previous methods and introducing innovative strategies for model improvement.
arXiv Detail & Related papers (2024-08-29T15:55:27Z) - Three Mechanisms of Feature Learning in the Exact Solution of a Latent Variable Model [0.34530027457862006]
We identify and exactly solve the learning dynamics of a one-hidden-layer linear model at any finite width.
Our solution identifies three novel prototype mechanisms of feature learning: (1) learning by alignment, (2) learning by disalignment, and (3) learning by rescaling.
In sharp contrast, none of these mechanisms is present in the kernel regime of the model.
arXiv Detail & Related papers (2024-01-13T14:21:46Z) - Language-free Compositional Action Generation via Decoupling Refinement [67.50452446686725]
We introduce a novel framework to generate compositional actions without reliance on language auxiliaries.
Our approach consists of three main components: Action Coupling, Conditional Action Generation, and Decoupling Refinement.
arXiv Detail & Related papers (2023-07-07T12:00:38Z) - Diffusion Action Segmentation [63.061058214427085]
We propose a novel framework via denoising diffusion models, which shares the same inherent spirit of such iterative refinement.
In this framework, action predictions are iteratively generated from random noise with input video features as conditions.
arXiv Detail & Related papers (2023-03-31T10:53:24Z) - An end-to-end multi-scale network for action prediction in videos [31.967024536359908]
We develop an efficient multi-scale network to predict action classes in partial videos in an end-to-end manner.
Our E2EMSNet is evaluated on three challenging datasets: BIT, HMDB51, and UCF101.
arXiv Detail & Related papers (2022-12-31T06:58:41Z) - Unifying Flow, Stereo and Depth Estimation [121.54066319299261]
We present a unified formulation and model for three motion and 3D perception tasks.
We formulate all three tasks as a unified dense correspondence matching problem.
Our model naturally enables cross-task transfer since the model architecture and parameters are shared across tasks.
arXiv Detail & Related papers (2022-11-10T18:59:54Z) - Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z) - Dynamical Deep Generative Latent Modeling of 3D Skeletal Motion [15.359134407309726]
Our model decomposes highly correlated skeleton data into a small set of spatial bases of switching temporal processes.
This results in a dynamical deep generative latent model that parses the meaningful intrinsic states in the dynamics of 3D pose data.
arXiv Detail & Related papers (2021-06-18T23:58:49Z) - Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency [114.02182755620784]
We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.
Our framework is shown to outperform the state-of-the-art depth and motion estimation methods.
arXiv Detail & Related papers (2021-02-04T14:26:42Z) - Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.