AdaFuse: Adaptive Temporal Fusion Network for Efficient Action
Recognition
- URL: http://arxiv.org/abs/2102.05775v1
- Date: Wed, 10 Feb 2021 23:31:02 GMT
- Title: AdaFuse: Adaptive Temporal Fusion Network for Efficient Action
Recognition
- Authors: Yue Meng, Rameswar Panda, Chung-Ching Lin, Prasanna Sattigeri, Leonid
Karlinsky, Kate Saenko, Aude Oliva, Rogerio Feris
- Abstract summary: Temporal modelling is the key to efficient video action recognition.
We introduce an adaptive temporal fusion network, called AdaFuse, that fuses channels from current and past feature maps.
Our approach can achieve about 40% computation savings with comparable accuracy to state-of-the-art methods.
- Score: 68.70214388982545
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Temporal modelling is the key to efficient video action recognition. While
understanding temporal information can improve recognition accuracy for dynamic
actions, removing temporal redundancy and reusing past features can
significantly save computation, leading to efficient action recognition. In this
paper, we introduce an adaptive temporal fusion network, called AdaFuse, that
dynamically fuses channels from current and past feature maps for strong
temporal modelling. Specifically, the necessary information from the historical
convolution feature maps is fused with current pruned feature maps with the
goal of improving both recognition accuracy and efficiency. In addition, we use
a skipping operation to further reduce the computation cost of action
recognition. Extensive experiments on Something-Something V1 & V2, Jester and
Mini-Kinetics show that our approach can achieve about 40% computation savings
with comparable accuracy to state-of-the-art methods. The project page can be
found at https://mengyuest.github.io/AdaFuse/
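To make the fusion idea concrete, below is a minimal PyTorch sketch of per-channel adaptive temporal fusion with a skip option, in the spirit of the abstract above. The module, the policy head, and the Gumbel-softmax gating are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveTemporalFusion(nn.Module):
    """Sketch: per-channel choice between the current feature map,
    the previous frame's feature map, and skipping the channel."""

    def __init__(self, channels: int, hidden: int = 64):
        super().__init__()
        # Lightweight policy head: pooled current + past features
        # -> 3 logits per channel (keep current / reuse history / skip).
        self.policy = nn.Sequential(
            nn.Linear(2 * channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels * 3),
        )

    def forward(self, curr: torch.Tensor, prev: torch.Tensor) -> torch.Tensor:
        # curr, prev: (N, C, H, W) feature maps from the current and past frames.
        n, c, _, _ = curr.shape
        pooled = torch.cat([curr.mean(dim=(2, 3)), prev.mean(dim=(2, 3))], dim=1)
        logits = self.policy(pooled).view(n, c, 3)
        # Gumbel-softmax gives a differentiable, near-discrete per-channel
        # decision during training; hard one-hot choices at inference mean
        # "reuse" and "skip" channels need not be recomputed at all.
        gate = F.gumbel_softmax(logits, tau=1.0, hard=True)  # (N, C, 3)
        keep = gate[..., 0].view(n, c, 1, 1)   # pass the freshly computed channel
        reuse = gate[..., 1].view(n, c, 1, 1)  # copy the channel from history
        # The third option ("skip") zeroes the channel: neither gate fires.
        return keep * curr + reuse * prev

if __name__ == "__main__":
    fuse = AdaptiveTemporalFusion(channels=32)
    prev = torch.randn(2, 32, 14, 14)
    curr = torch.randn(2, 32, 14, 14)
    print(fuse(curr, prev).shape)  # torch.Size([2, 32, 14, 14])
```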
Related papers
- TDS-CLIP: Temporal Difference Side Network for Image-to-Video Transfer Learning [6.329214318116305]
We propose a memory-efficient Temporal Difference Side Network (TDS-CLIP) to balance knowledge transfer and temporal modeling.
Specifically, we introduce a Temporal Difference Adapter (TD-Adapter), which can effectively capture local temporal differences in motion features.
We also design a Side Motion Enhancement Adapter (SME-Adapter) to guide the proposed side network in efficiently learning the rich motion information in videos.
arXiv Detail & Related papers (2024-08-20T09:40:08Z) - CAST: Cross-Attention in Space and Time for Video Action Recognition [8.785207228156098]
We propose a novel two-stream architecture called Cross-Attention in Space and Time (CAST)
CAST achieves a balanced spatio-temporal understanding of videos using only RGB input.
Our proposed mechanism enables spatial and temporal expert models to exchange information and make synergistic predictions.
arXiv Detail & Related papers (2023-11-30T18:58:51Z) - On the Importance of Spatial Relations for Few-shot Action Recognition [109.2312001355221]
In this paper, we investigate the importance of spatial relations and propose a more accurate few-shot action recognition method.
A novel Spatial Alignment Cross Transformer (SA-CT) learns to re-adjust the spatial relations and incorporates the temporal information.
Experiments reveal that, even without using any temporal information, the performance of SA-CT is comparable to temporal-based methods on 3/4 benchmarks.
arXiv Detail & Related papers (2023-08-14T12:58:02Z) - TempNet: Temporal Attention Towards the Detection of Animal Behaviour in
Videos [63.85815474157357]
We propose an efficient computer vision- and deep learning-based method for the detection of biological behaviours in videos.
TempNet uses an encoder bridge and residual blocks to maintain model performance with a two-stage, spatial-then-temporal encoder.
We demonstrate its application to the detection of sablefish (Anoplopoma fimbria) startle events.
arXiv Detail & Related papers (2022-11-17T23:55:12Z) - Learning from Temporal Gradient for Semi-supervised Action Recognition [15.45239134477737]
We introduce temporal gradient, i.e. pixel-wise differences between adjacent frames (see the sketch after this list), as an additional modality for more attentive feature extraction.
Our method achieves the state-of-the-art performance on three video action recognition benchmarks.
arXiv Detail & Related papers (2021-11-25T20:30:30Z) - Efficient Modelling Across Time of Human Actions and Interactions [92.39082696657874]
We argue that current fixed-sized temporal kernels in 3D convolutional neural networks (CNNs) can be improved to better deal with temporal variations in the input.
We study how we can better handle variations between classes of actions by enhancing their feature differences over different layers of the architecture.
The proposed approaches are evaluated on several benchmark action recognition datasets and show competitive results.
arXiv Detail & Related papers (2021-10-05T15:39:11Z) - Selective Feature Compression for Efficient Activity Recognition
Inference [26.43512549990624]
Selective Feature Compression (SFC) is an action recognition inference strategy that greatly increases model inference efficiency without any accuracy compromise.
Our experiments on Kinetics-400, UCF101 and ActivityNet show that SFC is able to speed up inference by 6-7x and reduce memory usage by 5-6x compared with the commonly used 30-crop dense sampling procedure.
arXiv Detail & Related papers (2021-04-01T00:54:51Z) - Temporal Memory Relation Network for Workflow Recognition from Surgical
Video [53.20825496640025]
We propose a novel end-to-end temporal memory relation network (TMRNet) for relating long-range and multi-scale temporal patterns.
We have extensively validated our approach on two benchmark surgical video datasets.
arXiv Detail & Related papers (2021-03-30T13:20:26Z) - Learn to cycle: Time-consistent feature discovery for action recognition [83.43682368129072]
Generalizing over temporal variations is a prerequisite for effective action recognition in videos.
We introduce Squeeze and Recursion Temporal Gates (SRTG), an approach that favors temporal activations with potential variations.
We show consistent improvement when using SRTG blocks, with only a minimal increase in the number of GFLOPs.
arXiv Detail & Related papers (2020-06-15T09:36:28Z)
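On the temporal-gradient modality referenced above: it is commonly computed as pixel-wise differences between adjacent frames, a cheap motion cue that can feed a second stream alongside RGB. A minimal sketch under that assumption (the function name is illustrative, not from the paper's code):

```python
import torch

def temporal_gradient(clip: torch.Tensor) -> torch.Tensor:
    """Compute the temporal-gradient modality of a video clip.

    clip: (T, C, H, W) stack of frames. Returns the (T-1, C, H, W)
    frame-to-frame differences.
    """
    return clip[1:] - clip[:-1]

if __name__ == "__main__":
    clip = torch.rand(8, 3, 112, 112)  # 8 RGB frames
    tg = temporal_gradient(clip)
    print(tg.shape)  # torch.Size([7, 3, 112, 112])
```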
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.