TCTN: A 3D-Temporal Convolutional Transformer Network for Spatiotemporal
Predictive Learning
- URL: http://arxiv.org/abs/2112.01085v1
- Date: Thu, 2 Dec 2021 10:05:01 GMT
- Title: TCTN: A 3D-Temporal Convolutional Transformer Network for Spatiotemporal
Predictive Learning
- Authors: Ziao Yang, Xiangrui Yang and Qifeng Lin
- Abstract summary: We propose an algorithm named 3D-temporal convolutional transformer (TCTN), where a transformer-based encoder with temporal convolutional layers is employed to capture short-term and long-term dependencies.
The proposed algorithm is easy to implement and trains much faster than RNN-based methods thanks to the parallel mechanism of the Transformer.
- Score: 1.952097552284465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spatiotemporal predictive learning aims to generate future frames
given a sequence of historical frames. Conventional algorithms are mostly based
on recurrent neural networks (RNNs). However, RNNs suffer from a heavy
computational burden, including long training times and lengthy
back-propagation through time, due to the seriality of the recurrent structure.
Recently, Transformer-based methods have also been investigated, in the form of
either an encoder-decoder or a plain encoder, but the encoder-decoder form
requires overly deep networks and the plain encoder fails to capture short-term
dependencies. To tackle these problems, we propose an algorithm named
3D-temporal convolutional transformer (TCTN), where a transformer-based encoder
with temporal convolutional layers is employed to capture both short-term and
long-term dependencies. The proposed algorithm is easy to implement and trains
much faster than RNN-based methods thanks to the parallel mechanism of the
Transformer. To validate the algorithm, we conduct experiments on the
MovingMNIST and KTH datasets, and show that TCTN outperforms
state-of-the-art (SOTA) methods in both performance and training speed.
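The paper's code is not reproduced here, but the abstract's core idea can be sketched: a Transformer-style encoder block where a causal 3D temporal convolution supplies short-term context and masked self-attention across time supplies long-term context. This is a minimal, hedged sketch; all shapes, hyperparameters, and masking details below are our illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a TCTN-style encoder block (not the authors' code):
# a causal 3D temporal convolution supplies short-term context, masked
# self-attention across time supplies long-term context. All shapes,
# hyperparameters, and masking details are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalConvTransformerBlock(nn.Module):
    def __init__(self, channels=64, num_heads=4, kernel_t=3):
        super().__init__()
        self.pad_t = kernel_t - 1  # pad only the past side: causal in time
        self.temporal_conv = nn.Conv3d(channels, channels,
                                       kernel_size=(kernel_t, 3, 3),
                                       padding=(0, 1, 1))
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(channels)
        self.norm2 = nn.LayerNorm(channels)
        self.ffn = nn.Sequential(nn.Linear(channels, 4 * channels), nn.GELU(),
                                 nn.Linear(4 * channels, channels))

    def forward(self, x):                      # x: (B, C, T, H, W)
        b, c, t, h, w = x.shape
        # Short-term dependencies: causal temporal convolution (residual).
        x = x + self.temporal_conv(F.pad(x, (0, 0, 0, 0, self.pad_t, 0)))
        # Long-term dependencies: masked self-attention over the T steps,
        # applied per spatial location (flattened for brevity).
        seq = x.permute(0, 3, 4, 2, 1).reshape(b * h * w, t, c)
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool,
                                     device=x.device), diagonal=1)
        n = self.norm1(seq)
        attn_out, _ = self.attn(n, n, n, attn_mask=mask)
        seq = seq + attn_out
        seq = seq + self.ffn(self.norm2(seq))
        return seq.reshape(b, h, w, t, c).permute(0, 4, 3, 1, 2)
```

Because no step of the block recurs over time, all T frames are processed in parallel during training, which is where the claimed speed-up over RNN-based models comes from.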
Related papers
- Accelerating Error Correction Code Transformers [56.75773430667148]
We introduce a novel acceleration method for transformer-based decoders.
We achieve a 90% compression ratio and reduce arithmetic operation energy consumption by a factor of at least 224 on modern hardware.
arXiv Detail & Related papers (2024-10-08T11:07:55Z)
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form (a CWT sketch appears after this list).
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
- Transformer-based Video Saliency Prediction with High Temporal Dimension Decoding [12.595019348741042]
We propose a transformer-based video saliency prediction approach with high temporal dimension network decoding (THTDNet).
This architecture yields comparable performance to multi-branch and over-complicated models on common benchmarks such as DHF1K, UCF-sports and Hollywood-2.
arXiv Detail & Related papers (2024-01-15T20:09:56Z)
- Deep-Unfolding for Next-Generation Transceivers [49.338084953253755]
The stringent performance requirements of future wireless networks are spurring studies on defining the next-generation multiple-input multiple-output (MIMO) transceivers.
For the design of advanced transceivers in wireless communications, optimization approaches, which often lead to iterative algorithms, have achieved great success.
Deep learning, which approximates the iterative algorithms with deep neural networks (DNNs), can significantly reduce computation time.
Deep-unfolding has emerged to combine the benefits of both deep learning and iterative algorithms by unfolding the iterative algorithm into a layer-wise structure (a sketch appears after this list).
arXiv Detail & Related papers (2023-05-15T02:13:41Z)
- NAF: Neural Attenuation Fields for Sparse-View CBCT Reconstruction [79.13750275141139]
This paper proposes a novel and fast self-supervised solution for sparse-view CBCT reconstruction.
The desired attenuation coefficients are represented as a continuous function of 3D spatial coordinates, parameterized by a fully-connected deep neural network.
A learning-based encoder with hash encoding is adopted to help the network capture high-frequency details (a simplified sketch appears after this list).
arXiv Detail & Related papers (2022-09-29T04:06:00Z)
- Two-Timescale End-to-End Learning for Channel Acquisition and Hybrid Precoding [94.40747235081466]
We propose an end-to-end deep learning-based joint transceiver design algorithm for millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems.
We develop a DNN architecture that maps the received pilots into feedback bits at the receiver, and then further maps the feedback bits into the hybrid precoder at the transmitter.
arXiv Detail & Related papers (2021-10-22T20:49:02Z)
- Spike-inspired Rank Coding for Fast and Accurate Recurrent Neural Networks [5.986408771459261]
Biological spiking neural networks (SNNs) can temporally encode information in their outputs, whereas artificial neural networks (ANNs) conventionally do not.
Here we show that temporal coding such as rank coding (RC) inspired by SNNs can also be applied to conventional ANNs such as LSTMs.
RC-training also significantly reduces time-to-insight during inference, with a minimal decrease in accuracy.
We demonstrate this in two toy sequence-classification problems, and in a temporally-encoded MNIST dataset where our RC model achieves 99.19% accuracy after the first input time-step (an early-exit sketch appears after this list).
arXiv Detail & Related papers (2021-10-06T15:51:38Z)
- Multi-Temporal Convolutions for Human Action Recognition in Videos [83.43682368129072]
We present a novel multi-temporal convolution block capable of extracting features at multiple temporal resolutions.
The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture (a sketch appears after this list).
arXiv Detail & Related papers (2020-11-08T10:40:26Z)
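Below are brief, hedged sketches of several mechanisms mentioned in the list above; names, sizes, and hyperparameters in them are illustrative assumptions, not the papers' released code. First, the "TC" stream of TCCT-Net converts a 1D behavioral signal into a 2D time-frequency tensor with a Continuous Wavelet Transform; a minimal version using PyWavelets (wavelet choice and scale range assumed):

```python
# CWT front end in the spirit of TCCT-Net's "TC" stream: a 1D signal
# becomes a (scales x time) 2D tensor a CNN/transformer can consume.
# The Morlet wavelet and 64 scales are assumptions, not the paper's.
import numpy as np
import pywt

def cwt_tensor(signal, scales=np.arange(1, 65), wavelet="morl"):
    coeffs, _freqs = pywt.cwt(signal, scales, wavelet)
    return np.abs(coeffs).astype(np.float32)   # (64, len(signal))

# Usage: a synthetic two-tone signal sampled at 128 Hz for 4 seconds.
t = np.linspace(0, 4, 512)
sig = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)
image = cwt_tensor(sig)                        # shape (64, 512)
```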
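The Deep-Unfolding entry describes turning a fixed number of iterations of an optimization algorithm into trainable network layers. A minimal sketch that unfolds gradient descent for a least-squares stand-in problem; the learnable per-layer step sizes are the essence of the technique, while the objective itself is our illustrative choice:

```python
# Minimal sketch of deep-unfolding: K iterations of gradient descent for
# min_x ||Ax - y||^2 become K "layers", each with its own learnable step
# size. The least-squares objective is an illustrative stand-in for the
# transceiver-design problems discussed in the paper.
import torch
import torch.nn as nn

class UnfoldedGD(nn.Module):
    def __init__(self, num_layers=10):
        super().__init__()
        # One trainable step size per unfolded iteration.
        self.steps = nn.Parameter(0.1 * torch.ones(num_layers))

    def forward(self, A, y):
        x = torch.zeros(A.shape[1])
        for alpha in self.steps:               # each loop pass is a "layer"
            grad = A.T @ (A @ x - y)
            x = x - alpha * grad
        return x
```

Training fits the step sizes end-to-end on example problems, typically reaching a target accuracy in far fewer iterations than a hand-tuned classical solver.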
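NAF represents attenuation as a continuous function of 3D position, with a hash-based encoder feeding a compact MLP. The sketch below is a deliberately simplified single-resolution hash encoding (the actual method uses a multiresolution scheme with interpolation); table size, grid resolution, and layer widths are assumptions:

```python
# Simplified single-level hash encoding + MLP in the spirit of NAF:
# attenuation mu(x, y, z) as a small network over hash-looked-up features.
import torch
import torch.nn as nn

class HashAttenuationField(nn.Module):
    def __init__(self, table_size=2**16, feat_dim=8, grid_res=128):
        super().__init__()
        self.table = nn.Embedding(table_size, feat_dim)  # learnable features
        self.grid_res = grid_res
        # Large primes for spatial hashing (a common, assumed choice).
        self.register_buffer("primes",
                             torch.tensor([1, 2654435761, 805459861]))
        self.mlp = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))       # -> attenuation

    def forward(self, xyz):                  # xyz in [0, 1]^3, shape (N, 3)
        idx = (xyz * self.grid_res).long()   # nearest grid vertex (no interp)
        h = (idx * self.primes).sum(-1) % self.table.num_embeddings
        return self.mlp(self.table(h))       # (N, 1) attenuation values
```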
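The rank-coding entry reports reduced time-to-insight at inference; one way to read that is an early-exit loop that stops as soon as the per-step class confidence clears a threshold. A hedged sketch (the threshold, model sizes, and the exit rule itself are our assumptions about the mechanism):

```python
# Sketch of rank-coding-style early exit for an LSTM classifier: read the
# input one time step at a time and stop as soon as the softmax confidence
# clears a threshold. The 0.99 threshold and model sizes are assumptions.
import torch
import torch.nn as nn

class EarlyExitLSTM(nn.Module):
    def __init__(self, in_dim=1, hidden=64, classes=10):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, classes)

    @torch.no_grad()
    def classify_early(self, x, threshold=0.99):   # x: (1, T, in_dim)
        state = None
        for t in range(x.shape[1]):
            out, state = self.lstm(x[:, t:t + 1, :], state)
            probs = self.head(out[:, -1]).softmax(-1)
            conf, label = probs.max(-1)
            if conf.item() >= threshold:           # confident: exit early
                return label.item(), t + 1         # prediction, steps used
        return label.item(), x.shape[1]            # fell through: all steps
```

On the training side, a per-time-step loss that rewards early correct outputs would push confidence forward in time; this, too, is our reading rather than the paper's exact recipe.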
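Finally, the multi-temporal convolution block can be read as parallel 3D convolutions with different temporal kernel sizes merged on the channel axis, so one block sees several temporal resolutions at once. A hedged sketch (kernel sizes and the concatenation merge are our choices):

```python
# Multi-temporal convolution sketch: parallel 3D convolutions with
# different temporal extents, concatenated channel-wise. Kernel sizes
# are illustrative; the paper's exact block design may differ.
import torch
import torch.nn as nn

class MultiTemporalConv(nn.Module):
    def __init__(self, in_ch=64, out_ch=48, t_kernels=(1, 3, 5)):
        super().__init__()
        branch_ch = out_ch // len(t_kernels)
        self.branches = nn.ModuleList(
            nn.Conv3d(in_ch, branch_ch, kernel_size=(k, 3, 3),
                      padding=(k // 2, 1, 1))   # "same"-style padding
            for k in t_kernels)

    def forward(self, x):                       # x: (B, C, T, H, W)
        return torch.cat([branch(x) for branch in self.branches], dim=1)

# Drop-in usage inside a 3D CNN; spatial/temporal shape is preserved.
x = torch.randn(2, 64, 16, 32, 32)
y = MultiTemporalConv()(x)                      # (2, 48, 16, 32, 32)
```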
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.