TDN: Temporal Difference Networks for Efficient Action Recognition
- URL: http://arxiv.org/abs/2012.10071v2
- Date: Thu, 1 Apr 2021 01:51:10 GMT
- Title: TDN: Temporal Difference Networks for Efficient Action Recognition
- Authors: Limin Wang, Zhan Tong, Bin Ji, Gangshan Wu
- Abstract summary: This paper presents a new video architecture, termed Temporal Difference Network (TDN).
The core of our TDN is to devise an efficient temporal difference module (TDM) by explicitly leveraging a temporal difference operator.
Our TDN presents a new state of the art on the Something-Something V1 & V2 datasets and is on par with the best performance on the Kinetics-400 dataset.
- Score: 31.922001043405924
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Temporal modeling still remains challenging for action recognition in videos.
To mitigate this issue, this paper presents a new video architecture, termed
Temporal Difference Network (TDN), with a focus on capturing multi-scale
temporal information for efficient action recognition. The core of our TDN is
to devise an efficient temporal difference module (TDM) by explicitly leveraging a
temporal difference operator, and to systematically assess its effect on
short-term and long-term motion modeling. To fully capture temporal information
over the entire video, our TDN is established with a two-level difference
modeling paradigm. Specifically, for local motion modeling, temporal difference
over consecutive frames is used to supply 2D CNNs with finer motion patterns,
while for global motion modeling, temporal difference across segments is
incorporated to capture long-range structure for motion feature excitation. TDN
provides a simple and principled temporal modeling framework and can be
instantiated with existing CNNs at a small extra computational cost. Our
TDN presents a new state of the art on the Something-Something V1 & V2 datasets
and is on par with the best performance on the Kinetics-400 dataset. In
addition, we conduct in-depth ablation studies and present visualization
results of our TDN, which we hope provide insightful analysis of temporal
difference modeling. We release the code at https://github.com/MCG-NJU/TDN.
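The two-level difference modeling described above can be sketched in a few lines of PyTorch. The module names (ShortTermTDM, LongTermTDM), tensor shapes, and the fusion and excitation details below are illustrative assumptions, not the paper's exact design; the released code at https://github.com/MCG-NJU/TDN is the authoritative implementation.
```python
# Minimal sketch of the two-level temporal difference idea, assuming:
# - short-term: RGB differences of consecutive frames are embedded by a light
#   2D branch and added to the sampled frame's appearance feature;
# - long-term: feature differences across segments drive a sigmoid attention
#   that excites motion-sensitive channels (residual excitation).
import torch
import torch.nn as nn


class ShortTermTDM(nn.Module):
    """Local motion modeling from consecutive-frame differences (hypothetical name)."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.diff_branch = nn.Sequential(
            nn.Conv2d(3, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, clip: torch.Tensor, frame_feat: torch.Tensor) -> torch.Tensor:
        # clip: (B, T, 3, H, W) frames sampled around the reference frame
        # frame_feat: (B, C, H, W) 2D-CNN feature of the reference frame
        diffs = clip[:, 1:] - clip[:, :-1]            # consecutive-frame differences
        motion = self.diff_branch(diffs.mean(dim=1))  # aggregate and embed: (B, C, H, W)
        return frame_feat + motion                    # supply finer motion patterns


class LongTermTDM(nn.Module):
    """Global motion modeling from cross-segment differences (hypothetical name)."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.excite = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, seg_feats: torch.Tensor) -> torch.Tensor:
        # seg_feats: (B, S, C, H, W) features of S segments spanning the video
        b, s, c, h, w = seg_feats.shape
        neighbour = torch.roll(seg_feats, shifts=1, dims=1)  # adjacent segment (wraps at index 0)
        diff = (seg_feats - neighbour).reshape(b * s, c, h, w)
        attn = self.excite(diff).reshape(b, s, c, h, w)      # cross-segment excitation map
        return seg_feats + seg_feats * attn                  # excite long-range motion cues


# Example shapes: clip (2, 5, 3, 56, 56), frame_feat (2, 64, 56, 56),
# seg_feats (2, 8, 64, 56, 56); both modules add only a small extra cost.
```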
Related papers
- DyFADet: Dynamic Feature Aggregation for Temporal Action Detection [70.37707797523723]
We build a novel dynamic feature aggregation (DFA) module that can adapt kernel weights and receptive fields at different timestamps.
Using DFA helps to develop a Dynamic TAD head (DyHead), which adaptively aggregates the multi-scale features with adjusted parameters.
DyFADet, a new dynamic TAD model, achieves promising performance on a series of challenging TAD benchmarks.
arXiv Detail & Related papers (2024-07-03T15:29:10Z)
- Slow-Fast Visual Tempo Learning for Video-based Action Recognition [78.3820439082979]
Action visual tempo characterizes the dynamics and the temporal scale of an action.
Previous methods capture the visual tempo either by sampling raw videos with multiple rates, or by hierarchically sampling backbone features.
We propose a Temporal Correlation Module (TCM) that extracts action visual tempo from low-level backbone features at a single layer.
arXiv Detail & Related papers (2022-02-24T14:20:04Z)
- Exploring Motion and Appearance Information for Temporal Sentence Grounding [52.01687915910648]
We propose a Motion-Appearance Reasoning Network (MARN) to solve temporal sentence grounding.
We develop separate motion and appearance branches to learn motion-guided and appearance-guided object relations.
Our proposed MARN outperforms previous state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2022-01-03T02:44:18Z)
- Temporal Transformer Networks with Self-Supervision for Action Recognition [13.00827959393591]
We introduce a novel Temporal Transformer Network with Self-supervision (TTSN).
TTSN consists of a temporal transformer module and a temporal sequence self-supervision module.
Our proposed TTSN is promising as it successfully achieves state-of-the-art performance for action recognition.
arXiv Detail & Related papers (2021-12-14T12:53:53Z)
- TSI: Temporal Saliency Integration for Video Action Recognition [32.18535820790586]
We propose a Temporal Saliency Integration (TSI) block, which mainly contains a Salient Motion Excitation (SME) module and a Cross-scale Temporal Integration (CTI) module.
SME aims to highlight motion-sensitive areas through local-global motion modeling.
CTI is designed to perform multi-scale temporal modeling through a group of separate 1D convolutions (see the sketch after this list).
arXiv Detail & Related papers (2021-06-02T11:43:49Z)
- Temporal Graph Modeling for Skeleton-based Action Recognition [25.788239844759246]
We propose a Temporal Enhanced Graph Convolutional Network (TE-GCN) to capture complex temporal dynamics.
The constructed temporal relation graph explicitly builds connections between semantically related temporal features.
Experiments are performed on two widely used large-scale datasets.
arXiv Detail & Related papers (2020-12-16T09:02:47Z)
- MVFNet: Multi-View Fusion Network for Efficient Video Recognition [79.92736306354576]
We introduce a multi-view fusion (MVF) module to exploit video complexity using separable convolution for efficiency.
MVFNet can be thought of as a generalized video modeling framework.
arXiv Detail & Related papers (2020-12-13T06:34:18Z)
- Approximated Bilinear Modules for Temporal Modeling [116.6506871576514]
Two-layer subnets in CNNs can be converted to temporal bilinear modules by adding an auxiliary-branch sampling.
Our models can outperform most state-of-the-art methods on the Something-Something V1 and V2 datasets without pretraining.
arXiv Detail & Related papers (2020-07-25T09:07:35Z)
- TAM: Temporal Adaptive Module for Video Recognition [60.83208364110288]
The temporal adaptive module (TAM) generates video-specific temporal kernels based on its own feature map.
Experiments on Kinetics-400 and Something-Something datasets demonstrate that our TAM outperforms other temporal modeling methods consistently.
arXiv Detail & Related papers (2020-05-14T08:22:45Z)
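For the multi-scale temporal modeling mentioned in the TSI summary above, the sketch below shows one generic way to realize a group of separate 1D temporal convolutions. The depthwise grouping, kernel sizes, and averaging fusion are assumptions for illustration and do not reproduce the authors' CTI module.
```python
# Generic multi-scale temporal modeling with parallel 1D convolutions
# (an illustrative pattern, not the CTI implementation from the TSI paper).
import torch
import torch.nn as nn


class MultiScaleTemporalConv(nn.Module):
    def __init__(self, channels: int, kernel_sizes=(1, 3, 5)):
        super().__init__()
        # one depthwise 1D temporal convolution per scale
        self.branches = nn.ModuleList(
            nn.Conv1d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T) channel features over T frames or segments
        return x + sum(branch(x) for branch in self.branches) / len(self.branches)


# Usage: fuse temporal context at several scales for an 8-step feature sequence.
feat = torch.randn(2, 64, 8)
out = MultiScaleTemporalConv(64)(feat)  # shape preserved: (2, 64, 8)
```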