Learn to cycle: Time-consistent feature discovery for action recognition
- URL: http://arxiv.org/abs/2006.08247v2
- Date: Tue, 23 Jun 2020 14:06:36 GMT
- Title: Learn to cycle: Time-consistent feature discovery for action recognition
- Authors: Alexandros Stergiou and Ronald Poppe
- Abstract summary: Generalizing over temporal variations is a prerequisite for effective action recognition in videos.
We introduce Squeeze and Recursion Temporal Gates (SRTG), an approach that favors inputs with similar activations while allowing for potential temporal variations.
We show consistent improvement when using SRTG blocks, with only a minimal increase in the number of GFLOPs.
- Score: 83.43682368129072
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalizing over temporal variations is a prerequisite for effective action
recognition in videos. Despite significant advances in deep neural networks, it
remains a challenge to focus on short-term discriminative motions in relation
to the overall performance of an action. We address this challenge by allowing
some flexibility in discovering relevant spatio-temporal features. We introduce
Squeeze and Recursion Temporal Gates (SRTG), an approach that favors inputs
with similar activations with potential temporal variations. We implement this
idea with a novel CNN block that uses an LSTM to encapsulate feature dynamics,
in conjunction with a temporal gate that is responsible for evaluating the
consistency of the discovered dynamics and the modeled features. We show
consistent improvement when using SRTG blocks, with only a minimal increase in
the number of GFLOPs. On Kinetics-700, we perform on par with current
state-of-the-art models, and outperform these on HACS, Moments in Time, UCF-101
and HMDB-51.
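The abstract outlines the SRTG mechanism at a high level: spatially squeezed features are passed through an LSTM that models their temporal dynamics, and a gate applies the resulting recalibration only when those dynamics are consistent with the original activations. Below is a minimal sketch of how such a squeeze-recursion-gate pattern could be wired in PyTorch; the class name `SRTGStyleBlock`, the cosine-similarity test, and the 0.5 threshold are illustrative assumptions, not the authors' released implementation.

```python
# A minimal, illustrative sketch of an SRTG-style block in PyTorch.
# Module names, the cosine-similarity gate, and all sizes are assumptions
# for illustration; they are not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SRTGStyleBlock(nn.Module):
    def __init__(self, channels: int, gate_threshold: float = 0.5):
        super().__init__()
        # LSTM over the temporal axis of globally pooled features,
        # meant to capture how channel activations evolve over time.
        self.lstm = nn.LSTM(input_size=channels, hidden_size=channels,
                            batch_first=True)
        self.gate_threshold = gate_threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, height, width) from a 3D CNN stage.
        b, c, t, h, w = x.shape
        # Squeeze spatial dimensions -> per-frame channel descriptors.
        squeezed = x.mean(dim=[3, 4]).permute(0, 2, 1)      # (b, t, c)
        # Recursion: model the temporal dynamics of the descriptors.
        dynamics, _ = self.lstm(squeezed)                    # (b, t, c)
        # Temporal gate: only re-weight the features when the modeled
        # dynamics are consistent with the original activations.
        similarity = F.cosine_similarity(dynamics, squeezed, dim=-1).mean(dim=1)
        gate = (similarity > self.gate_threshold).float().view(b, 1, 1, 1, 1)
        weights = torch.sigmoid(dynamics).permute(0, 2, 1).view(b, c, t, 1, 1)
        return gate * (x * weights) + (1.0 - gate) * x


if __name__ == "__main__":
    block = SRTGStyleBlock(channels=64)
    clip_features = torch.randn(2, 64, 16, 14, 14)
    print(block(clip_features).shape)  # torch.Size([2, 64, 16, 14, 14])
```

The recurrent path operates only on pooled per-frame descriptors and the gate falls back to the unmodified features, which is consistent with the abstract's claim that the block adds only a minimal number of GFLOPs.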
Related papers
- ARN-LSTM: A Multi-Stream Attention-Based Model for Action Recognition with Temporal Dynamics [6.6713480895907855]
ARN-LSTM is a novel action recognition model designed to address the challenge of simultaneously capturing spatial motion and temporal dynamics in action sequences.
Our proposed model integrates joint, motion, and temporal information through a multi-stream fusion architecture.
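As a rough illustration of the multi-stream fusion idea summarized above, the sketch below encodes joint, motion, and temporal streams separately and fuses them by concatenation; the encoder choice (plain LSTMs), the feature sizes, and the class count are hypothetical placeholders, not the ARN-LSTM architecture or its attention mechanism.

```python
# A bare-bones illustration of multi-stream fusion over joint, motion, and
# temporal streams; stream encoders, feature sizes, and concatenation-based
# fusion are assumptions for illustration, not the ARN-LSTM model.
import torch
import torch.nn as nn


class MultiStreamFusion(nn.Module):
    def __init__(self, in_dims=(75, 75, 75), hidden: int = 128, classes: int = 60):
        super().__init__()
        # One recurrent encoder per input stream (joint / motion / temporal).
        self.encoders = nn.ModuleList(
            nn.LSTM(d, hidden, batch_first=True) for d in in_dims
        )
        self.classifier = nn.Linear(hidden * len(in_dims), classes)

    def forward(self, streams):
        # streams: list of tensors, each (batch, time, features).
        pooled = []
        for stream, encoder in zip(streams, self.encoders):
            out, _ = encoder(stream)
            pooled.append(out.mean(dim=1))          # temporal average pooling
        return self.classifier(torch.cat(pooled, dim=-1))


if __name__ == "__main__":
    model = MultiStreamFusion()
    joints = torch.randn(8, 30, 75)
    motion = torch.randn(8, 30, 75)
    temporal = torch.randn(8, 30, 75)
    print(model([joints, motion, temporal]).shape)  # torch.Size([8, 60])
```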
arXiv Detail & Related papers (2024-11-04T03:29:51Z) - DyFADet: Dynamic Feature Aggregation for Temporal Action Detection [70.37707797523723]
We build a novel dynamic feature aggregation (DFA) module that can adapt kernel weights and receptive fields at different timestamps.
Using DFA helps to develop a Dynamic TAD head (DyHead), which adaptively aggregates the multi-scale features with adjusted parameters.
DyFADet, a new dynamic TAD model, achieves promising performance on a series of challenging TAD benchmarks.
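The summary above hinges on generating kernel weights dynamically per timestamp. The following is a generic, assumption-laden sketch of that idea for a 1D temporal feature sequence; `DynamicTemporalConv`, the softmax-normalized tap weights, and the tiny weight generator are illustrative, not the DFA module or DyHead from the paper.

```python
# A generic illustration of per-timestep dynamic kernel weighting for a
# 1D temporal convolution; an assumption-laden sketch, not the DFA module.
import torch
import torch.nn as nn


class DynamicTemporalConv(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.kernel_size = kernel_size
        # Small network that predicts a weight per kernel tap and timestep.
        self.weight_gen = nn.Conv1d(channels, kernel_size, kernel_size=1)
        self.pad = kernel_size // 2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) feature sequence.
        b, c, t = x.shape
        # Per-timestep weights over the kernel taps, normalized with softmax.
        tap_weights = self.weight_gen(x).softmax(dim=1)          # (b, k, t)
        # Gather a temporal neighborhood around every timestep.
        padded = nn.functional.pad(x, (self.pad, self.pad))
        windows = padded.unfold(dimension=2, size=self.kernel_size, step=1)
        # windows: (b, c, t, k); weight each tap and sum over the window.
        return (windows * tap_weights.permute(0, 2, 1).unsqueeze(1)).sum(dim=-1)


if __name__ == "__main__":
    layer = DynamicTemporalConv(channels=256)
    feats = torch.randn(4, 256, 64)
    print(layer(feats).shape)  # torch.Size([4, 256, 64])
```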
arXiv Detail & Related papers (2024-07-03T15:29:10Z) - Ultra-low Latency Spiking Neural Networks with Spatio-Temporal
Compression and Synaptic Convolutional Block [4.081968050250324]
Spiking neural networks (SNNs) offer spatio-temporal information processing capability, low power consumption, and high biological plausibility.
Neuromorphic datasets such as N-MNIST, CIFAR10-DVS, and DVS128 Gesture require aggregating individual events into frames with a higher temporal resolution for event stream classification.
We propose a spatio-temporal compression method to aggregate individual events into a few time steps of synaptic current to reduce the training and inference latency.
arXiv Detail & Related papers (2022-03-18T15:14:13Z) - Slow-Fast Visual Tempo Learning for Video-based Action Recognition [78.3820439082979]
Action visual tempo characterizes the dynamics and the temporal scale of an action.
Previous methods capture the visual tempo either by sampling raw videos with multiple rates, or by hierarchically sampling backbone features.
We propose a Temporal Correlation Module (TCM) to extract action visual tempo from low-level, single-layer backbone features.
arXiv Detail & Related papers (2022-02-24T14:20:04Z) - Efficient Modelling Across Time of Human Actions and Interactions [92.39082696657874]
We argue that current fixed-sized spatio-temporal kernels in 3D convolutional neural networks (CNNs) can be improved to better deal with temporal variations in the input.
We study how we can better handle variations between classes of actions by enhancing their feature differences over different layers of the architecture.
The proposed approaches are evaluated on several benchmark action recognition datasets and show competitive results.
arXiv Detail & Related papers (2021-10-05T15:39:11Z) - TSI: Temporal Saliency Integration for Video Action Recognition [32.18535820790586]
We propose a Temporal Saliency Integration (TSI) block, which mainly contains a Salient Motion Excitation (SME) module and a Cross-scale Temporal Integration (CTI) module.
SME aims to highlight the motion-sensitive area through local-global motion modeling.
CTI is designed to perform multi-scale temporal modeling through a group of separate 1D convolutions.
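To make the CTI description concrete, the sketch below applies a group of separate, dilated 1D convolutions as parallel temporal scales and sums their responses; the depthwise branches, dilation choices, and summation-based fusion are assumptions for illustration rather than the TSI implementation.

```python
# A rough sketch of multi-scale temporal modeling with a group of separate
# 1D convolutions; branch design and fusion are assumptions, not TSI's code.
import torch
import torch.nn as nn


class MultiScaleTemporal1D(nn.Module):
    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        # One depthwise 1D convolution per temporal scale (dilation).
        self.branches = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=3, padding=d,
                      dilation=d, groups=channels)
            for d in dilations
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time); sum the per-scale responses.
        return torch.stack([branch(x) for branch in self.branches]).sum(dim=0)


if __name__ == "__main__":
    layer = MultiScaleTemporal1D(channels=128)
    print(layer(torch.randn(2, 128, 32)).shape)  # torch.Size([2, 128, 32])
```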
arXiv Detail & Related papers (2021-06-02T11:43:49Z) - Multi-Temporal Convolutions for Human Action Recognition in Videos [83.43682368129072]
We present a novel temporal convolution block that is capable of extracting spatio-temporal features at multiple temporal resolutions.
The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture.
arXiv Detail & Related papers (2020-11-08T10:40:26Z) - A Prospective Study on Sequence-Driven Temporal Sampling and Ego-Motion
Compensation for Action Recognition in the EPIC-Kitchens Dataset [68.8204255655161]
Action recognition is one of the most challenging research fields in computer vision.
Sequences recorded with ego-motion have become particularly relevant.
The proposed method aims to cope with this by estimating the ego-motion, or camera motion, in the recorded sequences.
arXiv Detail & Related papers (2020-08-26T14:44:45Z)