Temporal Interlacing Network
- URL: http://arxiv.org/abs/2001.06499v1
- Date: Fri, 17 Jan 2020 19:06:05 GMT
- Title: Temporal Interlacing Network
- Authors: Hao Shao, Shengju Qian, Yu Liu
- Abstract summary: The temporal interlacing network (TIN) is a simple yet powerful operator for learning temporal features.
TIN fuses spatial and temporal information by interlacing spatial representations from the past to the future.
An ensemble of TIN models won $1^{st}$ place in the ICCV19 - Multi Moments in Time challenge.
- Score: 8.876132549551738
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For a long time, the vision community has tried to learn
spatio-temporal representations by combining convolutional neural networks
with various temporal models, such as Markov chains, optical flow, RNNs, and
temporal convolutions. However, these pipelines consume enormous computing
resources because spatial and temporal information must be learned in
alternation. A natural question is whether the temporal information can be
embedded into the spatial information, so that the two domains are learned
jointly in a single pass. In this work, we answer this question by presenting
a simple yet powerful operator -- the temporal interlacing network (TIN).
Instead of learning temporal features separately, TIN fuses the two kinds of
information by interlacing spatial representations from the past to the
future, and vice versa. A differentiable interlacing target can be learned to
control the interlacing process. In this way, a heavy temporal model is
replaced by a simple interlacing operator. We theoretically prove that with a
learnable interlacing target, TIN performs equivalently to a regularized
temporal convolution network (r-TCN), yet gains 4% more accuracy with 6x lower
latency on 6 challenging benchmarks. These results push the state of the art
in video understanding by a considerable margin. Not surprisingly, an ensemble
of the proposed TIN won $1^{st}$ place in the ICCV19 - Multi Moments in Time
challenge. Code is made available to facilitate further research at
https://github.com/deepcs233/TIN
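To make the mechanism concrete, the sketch below shows a minimal, hypothetical version of a differentiable interlacing operator in PyTorch: channel groups are shifted along the time axis by a learnable fractional offset, with linear interpolation between neighbouring frames so the offset receives gradients. The shapes, the circular `torch.roll` shift, and the per-group `nn.Parameter` are simplifying assumptions, not the authors' implementation (which predicts offsets from the input; see the repository above).

```python
import torch
import torch.nn as nn

class TemporalInterlace(nn.Module):
    """Toy stand-in for TIN's interlacing operator.

    The integer part of each group's offset rolls features across frames;
    the fractional part blends adjacent frames so the offset stays
    differentiable. torch.roll wraps around instead of zero-padding,
    which is a simplification.
    """

    def __init__(self, groups: int = 4):
        super().__init__()
        self.groups = groups
        self.offset = nn.Parameter(torch.zeros(groups))  # fractional shifts

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, T, C, H, W) -- a batch of frame-level feature maps
        out = []
        for g, feat in enumerate(torch.chunk(x, self.groups, dim=2)):
            shift = self.offset[g]
            base = int(torch.floor(shift).item())
            frac = shift - torch.floor(shift)               # in [0, 1)
            lo = torch.roll(feat, shifts=base, dims=1)      # frames t - base
            hi = torch.roll(feat, shifts=base + 1, dims=1)  # one frame further
            out.append((1 - frac) * lo + frac * hi)         # linear interp
        return torch.cat(out, dim=2)

x = torch.randn(2, 8, 64, 14, 14)       # 2 clips, 8 frames, 64 channels
y = TemporalInterlace(groups=4)(x)      # same shape, temporally interlaced
```

Because the fractional blend is the only path carrying gradients, the offsets can be trained end-to-end with the rest of the network, which is the sense in which the interlacing target is "differentiable".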
Related papers
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer module.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
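As a rough illustration of the "TC" idea, the sketch below converts a 1D signal into a 2D time-frequency tensor with a CWT. It assumes the PyWavelets library and a Morlet wavelet; the wavelet, scales, and signal are illustrative choices, not TCCT-Net's actual configuration.

```python
import numpy as np
import pywt

fs = 128                                   # assumed sampling rate (Hz)
t = np.arange(0, 4, 1 / fs)                # 4 seconds of signal
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.random.randn(t.size)

scales = np.arange(1, 64)                  # wavelet scales to evaluate
coeffs, freqs = pywt.cwt(signal, scales, "morl", sampling_period=1 / fs)

# coeffs is a (n_scales, n_samples) 2D tensor a 2D CNN can consume
print(coeffs.shape)  # (63, 512)
```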
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
- Intensity Profile Projection: A Framework for Continuous-Time Representation Learning for Dynamic Networks [50.2033914945157]
We present a representation learning framework, Intensity Profile Projection, for continuous-time dynamic network data.
The framework proceeds in three stages, including estimating pairwise intensity functions and learning a projection that minimises a notion of intensity reconstruction error.
Moreover, we develop estimation theory providing tight control on the error of any estimated trajectory, indicating that the representations could be used even in quite noise-sensitive follow-on analyses.
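For intuition on the first stage, a pairwise intensity function can be approximated from observed interaction times with a simple kernel smoother. The sketch below is a generic Gaussian-kernel estimator on a time grid, offered only as a stand-in for the paper's estimator.

```python
import numpy as np

def kernel_intensity(event_times, grid, bandwidth=0.1):
    """Gaussian-kernel estimate of an event intensity on a time grid."""
    diffs = grid[:, None] - np.asarray(event_times)[None, :]
    kernel = np.exp(-0.5 * (diffs / bandwidth) ** 2)
    kernel /= bandwidth * np.sqrt(2 * np.pi)
    return kernel.sum(axis=1)            # smoothed event rate at each grid point

events = np.array([0.12, 0.15, 0.4, 0.42, 0.43, 0.9])  # toy interaction times
grid = np.linspace(0.0, 1.0, 200)
lam = kernel_intensity(events, grid)     # estimated intensity over [0, 1]
```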
arXiv Detail & Related papers (2023-06-09T15:38:25Z)
- Temporal Aggregation and Propagation Graph Neural Networks for Dynamic Representation [67.26422477327179]
Temporal graphs exhibit dynamic interactions between nodes over continuous time.
We propose a novel method of temporal graph convolution over the whole neighborhood.
Our proposed TAP-GNN outperforms existing temporal graph methods by a large margin in terms of both predictive performance and online inference latency.
arXiv Detail & Related papers (2023-04-15T08:17:18Z)
- FuTH-Net: Fusing Temporal Relations and Holistic Features for Aerial Video Classification [49.06447472006251]
We propose a novel deep neural network, termed FuTH-Net, to model not only holistic features, but also temporal relations for aerial video classification.
Our model is evaluated on two aerial video classification datasets, ERA and Drone-Action, and achieves the state-of-the-art results.
arXiv Detail & Related papers (2022-09-22T21:15:58Z)
- HyperTime: Implicit Neural Representation for Time Series [131.57172578210256]
Implicit neural representations (INRs) have recently emerged as a powerful tool that provides an accurate and resolution-independent encoding of data.
In this paper, we analyze the representation of time series using INRs, comparing different activation functions in terms of reconstruction accuracy and training convergence speed.
We propose a hypernetwork architecture that leverages INRs to learn a compressed latent representation of an entire time series dataset.
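A minimal picture of an INR for a time series is a small MLP that maps a timestamp to a value and is fitted by regression on the observed points. The sine (SIREN-style) activation below is one common choice; this sketch does not reproduce the paper's activation comparison or its hypernetwork.

```python
import torch
import torch.nn as nn

class SineINR(nn.Module):
    """Tiny implicit neural representation: timestamp t -> value."""

    def __init__(self, hidden: int = 64, w0: float = 30.0):
        super().__init__()
        self.w0 = w0
        self.l1 = nn.Linear(1, hidden)
        self.l2 = nn.Linear(hidden, hidden)
        self.l3 = nn.Linear(hidden, 1)

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        h = torch.sin(self.w0 * self.l1(t))   # sine activations (SIREN-style)
        h = torch.sin(self.w0 * self.l2(h))
        return self.l3(h)

# Fit the INR to one toy series by regressing values at observed timestamps.
t = torch.linspace(0, 1, 256).unsqueeze(-1)
y = torch.sin(2 * torch.pi * 4 * t)
model = SineINR()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = ((model(t) - y) ** 2).mean()      # reconstruction error
    loss.backward()
    opt.step()
```

Because the fitted network can be evaluated at any t, the encoding is resolution-independent, which is the property the blurb above highlights.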
arXiv Detail & Related papers (2022-08-11T14:05:51Z)
- Multi-scale temporal network for continuous sign language recognition [10.920363368754721]
Continuous Sign Language Recognition is a challenging research task due to the lack of accurate annotation of the temporal sequence in sign language data.
This paper proposes a multi-scale temporal network (MSTNet) to extract more accurate temporal features.
Experimental results on two publicly available datasets demonstrate that our method can effectively extract sign language features in an end-to-end manner without any prior knowledge.
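As a generic illustration of multi-scale temporal feature extraction, the block below runs parallel 1D convolutions with different dilations over the time axis and fuses them. This is a common pattern, not MSTNet's exact design.

```python
import torch
import torch.nn as nn

class MultiScaleTemporal(nn.Module):
    """Mix short- and long-range temporal context via parallel dilations."""

    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv1d(channels, channels, kernel_size=3,
                      padding=d, dilation=d)        # length-preserving
            for d in dilations
        ])
        self.proj = nn.Conv1d(channels * len(dilations), channels, 1)

    def forward(self, x):                # x: (N, C, T) frame features
        out = torch.cat([b(x) for b in self.branches], dim=1)
        return self.proj(out)            # fuse scales back to C channels

feats = torch.randn(4, 256, 100)         # 4 videos, 256-d features, 100 frames
fused = MultiScaleTemporal(256)(feats)   # (4, 256, 100)
```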
arXiv Detail & Related papers (2022-04-08T06:14:22Z)
- SITHCon: A neural network robust to variations in input scaling on the time dimension [0.0]
In machine learning, convolutional neural networks (CNNs) have been extremely influential in both computer vision and in recognizing patterns extended over time.
This paper introduces a Scale-Invariant Temporal History Convolution network (SITHCon) that uses a logarithmically-distributed temporal memory.
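The core idea of a logarithmically-distributed temporal memory can be sketched with geometrically spaced timescales feeding a bank of leaky integrators: recent history is kept at fine resolution while distant history is compressed. The constants below are illustrative assumptions, not SITHCon's.

```python
import numpy as np

# Log-spaced memory timescales: gaps between successive taus grow
# geometrically from tau_min to tau_max.
n_taus, tau_min, tau_max = 16, 1.0, 100.0
taus = tau_min * (tau_max / tau_min) ** (np.arange(n_taus) / (n_taus - 1))

def filter_bank(signal, taus):
    """Bank of exponential moving averages, one per timescale."""
    mem = np.zeros((len(taus), len(signal)))
    state = np.zeros(len(taus))
    for i, x in enumerate(signal):
        state += (x - state) / taus      # leaky integration per timescale
        mem[:, i] = state
    return mem

signal = np.random.randn(500)
memory = filter_bank(signal, taus)       # (16, 500) multi-scale history
```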
arXiv Detail & Related papers (2021-07-09T18:11:50Z)
- Group-based Bi-Directional Recurrent Wavelet Neural Networks for Video Super-Resolution [4.9136996406481135]
Video super-resolution (VSR) aims to estimate a high-resolution (HR) frame from low-resolution (LR) frames.
The key challenge for VSR lies in the effective exploitation of spatial correlation within a frame and temporal dependency between consecutive frames.
arXiv Detail & Related papers (2021-06-14T06:36:13Z)
- Multi-Temporal Convolutions for Human Action Recognition in Videos [83.43682368129072]
We present a novel multi-temporal convolution block that is capable of extracting features at multiple temporal resolutions.
The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture.
arXiv Detail & Related papers (2020-11-08T10:40:26Z)
- Multivariate Time Series Classification Using Spiking Neural Networks [7.273181759304122]
Spiking neural networks have drawn attention because they enable low power consumption.
We present an encoding scheme to convert time series into sparse spatio-temporal spike patterns.
A training algorithm to classify such spatio-temporal patterns is also proposed.
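One simple way to turn a real-valued series into sparse spatio-temporal spikes is delta modulation: emit an ON or OFF spike whenever the signal moves a threshold away from the last spiking level. The sketch below is this generic scheme, not the paper's encoding.

```python
import numpy as np

def delta_spike_encode(series, threshold=0.1):
    """Encode a 1D series as sparse ON/OFF spike trains (delta modulation)."""
    spikes = np.zeros((2, len(series)))  # row 0: ON spikes, row 1: OFF spikes
    level = series[0]                    # last level at which a spike fired
    for i, x in enumerate(series):
        while x - level > threshold:     # signal rose past the threshold
            spikes[0, i] = 1
            level += threshold
        while level - x > threshold:     # signal fell past the threshold
            spikes[1, i] = 1
            level -= threshold
    return spikes

sig = np.sin(np.linspace(0, 4 * np.pi, 200))
spike_train = delta_spike_encode(sig)    # sparse (2, 200) spike pattern
```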
arXiv Detail & Related papers (2020-07-07T15:24:01Z)