Convolutional Tensor-Train LSTM for Spatio-temporal Learning
- URL: http://arxiv.org/abs/2002.09131v5
- Date: Sun, 4 Oct 2020 23:14:31 GMT
- Title: Convolutional Tensor-Train LSTM for Spatio-temporal Learning
- Authors: Jiahao Su, Wonmin Byeon, Jean Kossaifi, Furong Huang, Jan Kautz,
Animashree Anandkumar
- Abstract summary: We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor-train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
- Score: 116.24172387469994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning from spatio-temporal data has numerous applications such as
human-behavior analysis, object tracking, video compression, and physics
simulation. However, existing methods still perform poorly on challenging video
tasks such as long-term forecasting. This is because these kinds of challenging
tasks require learning long-term spatio-temporal correlations in the video
sequence. In this paper, we propose a higher-order convolutional LSTM model
that can efficiently learn these correlations, along with a succinct
representation of the history. This is accomplished through a novel tensor-train
module that performs prediction by combining convolutional features
across time. To make this feasible in terms of computation and memory
requirements, we propose a novel convolutional tensor-train decomposition of
the higher-order model. This decomposition reduces the model complexity by
jointly approximating a sequence of convolutional kernels as a low-rank
tensor-train factorization. As a result, our model outperforms existing
approaches while using only a fraction of the parameters of the baseline
models. Our results achieve state-of-the-art performance in a wide range of
applications and datasets, including multi-step video prediction on the
Moving-MNIST-2 and KTH action datasets as well as early activity recognition on
the Something-Something V2 dataset.
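
The convolutional tensor-train decomposition described above replaces a single large higher-order transition kernel with a chain of small per-step convolutions, so the parameter count grows with a chosen rank rather than with the full product of channel dimensions across the time window. Below is a minimal PyTorch sketch of that idea; the class name `ConvTensorTrain`, the `rank`/`steps` hyperparameters, and the exact chaining scheme are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class ConvTensorTrain(nn.Module):
    """Minimal sketch of a convolutional tensor-train module (assumption:
    not the paper's exact factorization). A naive higher-order ConvLSTM
    would apply one Conv2d(steps * channels, channels) to the concatenated
    window of past hidden states; here a chain of small rank-`rank` cores
    approximates it with far fewer parameters when rank << channels."""

    def __init__(self, channels, rank, steps, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.cores = nn.ModuleList()
        for t in range(steps):
            # The first core lifts the oldest state into the rank-dimensional
            # space; later cores mix the running summary with the next state.
            in_ch = channels if t == 0 else channels + rank
            self.cores.append(nn.Conv2d(in_ch, rank, kernel_size, padding=pad))
        self.proj = nn.Conv2d(rank, channels, kernel_size, padding=pad)

    def forward(self, hidden_states):
        # hidden_states: list of `steps` tensors (B, C, H, W), oldest first.
        summary = self.cores[0](hidden_states[0])
        for core, h in zip(self.cores[1:], hidden_states[1:]):
            summary = core(torch.cat([summary, h], dim=1))
        return self.proj(summary)

# Usage: combine a window of 3 past hidden states into one prediction map.
module = ConvTensorTrain(channels=32, rank=8, steps=3)
window = [torch.randn(2, 32, 16, 16) for _ in range(3)]
out = module(window)  # shape: (2, 32, 16, 16)
```

With channels=32, rank=8, and steps=3 this uses roughly `steps * rank * (channels + rank) * k^2` weights instead of the naive `steps * channels^2 * k^2`, which is the kind of saving the abstract attributes to the low-rank factorization.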
Related papers
- TSLANet: Rethinking Transformers for Time Series Representation Learning [19.795353886621715]
Time series data is characterized by its intrinsic long- and short-range dependencies.
We introduce a novel Time Series Lightweight Network (TSLANet) as a universal convolutional model for diverse time series tasks.
Our experiments demonstrate that TSLANet outperforms state-of-the-art models in various tasks spanning classification, forecasting, and anomaly detection.
arXiv Detail & Related papers (2024-04-12T13:41:29Z)
- Neural Dynamical Operator: Continuous Spatial-Temporal Model with Gradient-Based and Derivative-Free Optimization Methods [0.0]
We present a data-driven modeling framework called neural dynamical operator that is continuous in both space and time.
A key feature of the neural dynamical operator is the resolution-invariance with respect to both spatial and temporal discretizations.
We show that the proposed model can better predict long-term statistics via the hybrid optimization scheme.
arXiv Detail & Related papers (2023-11-20T14:31:18Z)
- Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks [55.36987468073152]
This paper proposes a novel Dual-Guided Spatial-Channel-Temporal (DG-SCT) attention mechanism.
The DG-SCT module incorporates trainable cross-modal interaction layers into pre-trained audio-visual encoders.
Our proposed model achieves state-of-the-art results across multiple downstream tasks, including AVE, AVVP, AVS, and AVQA.
arXiv Detail & Related papers (2023-11-09T05:24:20Z)
- TACTiS-2: Better, Faster, Simpler Attentional Copulas for Multivariate Time Series [57.4208255711412]
Building on copula theory, we propose a simplified objective for the recently introduced transformer-based attentional copulas (TACTiS).
We show that the resulting model has significantly better training dynamics and achieves state-of-the-art performance across diverse real-world forecasting tasks.
arXiv Detail & Related papers (2023-10-02T16:45:19Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Contextually Enhanced ES-dRNN with Dynamic Attention for Short-Term Load Forecasting [1.1602089225841632]
The proposed model is composed of two simultaneously trained tracks: the context track and the main track.
The RNN architecture consists of multiple recurrent layers stacked with hierarchical dilations and equipped with recently proposed attentive recurrent cells.
The model produces both point forecasts and predictive intervals.
arXiv Detail & Related papers (2022-12-18T07:42:48Z)
- Deep Generative model with Hierarchical Latent Factors for Time Series Anomaly Detection [40.21502451136054]
This work presents DGHL, a new family of generative models for time series anomaly detection.
A top-down Convolution Network maps a novel hierarchical latent space to time series windows, exploiting temporal dynamics to encode information efficiently.
Our method outperformed current state-of-the-art models on four popular benchmark datasets.
arXiv Detail & Related papers (2022-02-15T17:19:44Z)
- STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model can achieve comparable performance while utilizing much less trainable parameters and achieve high speed in training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z)
- Closed-form Continuous-Depth Models [99.40335716948101]
Continuous-depth neural models rely on advanced numerical differential equation solvers.
We present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster.
arXiv Detail & Related papers (2021-06-25T22:08:51Z)
- Sparse and Low-Rank High-Order Tensor Regression via Parallel Proximal Method [6.381138694845438]
We propose the Sparse and Low-rank Regression model for large-scale data with high-order structures.
Our model enforces sparsity and low-rankness of the tensor coefficient, and its predictions admit meaningful interpretations on the video dataset (see the sketch after this list).
arXiv Detail & Related papers (2019-11-29T06:25:36Z)
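
As a point of reference for the sparse-and-low-rank entry above: a parallel proximal method for such penalties typically alternates two standard proximal operators, soft-thresholding for the l1 (sparsity) term and singular-value thresholding for the nuclear-norm (low-rank) term applied to a matricized tensor coefficient. The sketch below shows only these two standard operators in plain NumPy; the function names are illustrative, and the paper's actual parallel scheme is more involved.

```python
import numpy as np

def prox_l1(x, lam):
    """Soft-thresholding: prox of lam * ||x||_1; zeroes small entries (sparsity)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def prox_nuclear(mat, lam):
    """Singular-value thresholding: prox of lam * ||mat||_* (low-rankness)."""
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    return (u * np.maximum(s - lam, 0.0)) @ vt

# Example: shrink a random 4x6 coefficient matrix with both operators.
w = np.random.randn(4, 6)
print(np.linalg.matrix_rank(prox_nuclear(w, 1.0)), np.count_nonzero(prox_l1(w, 1.0)))
```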