Spatiotemporal Deformable Models for Long-Term Complex Activity Detection
- URL: http://arxiv.org/abs/2104.08194v1
- Date: Fri, 16 Apr 2021 16:05:34 GMT
- Title: Spatiotemporal Deformable Models for Long-Term Complex Activity Detection
- Authors: Salman Khan and Fabio Cuzzolin
- Abstract summary: Long-term complex activity recognition can be crucial for autonomous systems such as smart cars and surgical robots.
Most current methods are designed to merely localise short-term actions/activities or combinations of atomic actions that only last for a few frames or seconds.
Our framework consists of three main building blocks: (i) action tube detection, (ii) the modelling of the deformable geometry of parts, and (iii) a sparsity mechanism.
- Score: 23.880673582575856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long-term complex activity recognition and localisation can be crucial for
the decision-making process of several autonomous systems, such as smart cars
and surgical robots. Nonetheless, most current methods are designed to merely
localise short-term actions/activities or combinations of atomic actions that
only last for a few frames or seconds. In this paper, we address the problem of
long-term complex activity detection via a novel deformable, spatiotemporal
parts-based model. Our framework consists of three main building blocks: (i)
action tube detection, (ii) the modelling of the deformable geometry of parts,
and (iii) a sparsity mechanism. Firstly, action tubes are detected in a series
of snippets using an action tube detector. Next, a new 3D deformable RoI
pooling layer is designed for learning the flexible, deformable geometry of the
constellation of parts. Finally, a sparsity strategy differentiates between
activated and deactivated features. We also provide temporal complex activity
annotation for the recently released ROAD autonomous driving dataset and the
SARAS-ESAD surgical action dataset, to validate our method and show the
adaptability of our framework to different domains. As they both contain long
videos portraying long-term activities, they can be used as benchmarks for
future work in this area.
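To make the building blocks concrete, below is a minimal PyTorch sketch of blocks (ii) and (iii): a 3D deformable RoI-pooling-style layer that learns per-part offsets over a detected tube's feature volume, followed by a sparsity gate that suppresses deactivated part features. All module names, shapes and the offset parameterisation are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch of blocks (ii) and (iii) from the abstract; hypothetical design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Deformable3DRoIPool(nn.Module):
    """Pools a T x H x W grid of 'parts' from a tube's feature volume, letting
    the network learn per-part 3D offsets (the deformable part geometry)."""
    def __init__(self, channels, grid=(4, 3, 3)):
        super().__init__()
        self.grid = grid  # parts along (time, height, width)
        n_parts = grid[0] * grid[1] * grid[2]
        # Predict one 3D offset per part from globally pooled features.
        self.offset_head = nn.Linear(channels, n_parts * 3)

    def forward(self, feats):
        # feats: (N, C, T, H, W) features of one detected action tube.
        n = feats.shape[0]
        t, h, w = self.grid
        # Regular base grid in normalized [-1, 1] coordinates, order (z, y, x).
        zs, ys, xs = (torch.linspace(-1, 1, s) for s in (t, h, w))
        base = torch.stack(torch.meshgrid(zs, ys, xs, indexing="ij"), dim=-1)
        base = base[None].expand(n, -1, -1, -1, -1)          # (N, t, h, w, 3)
        # Learned, bounded offsets deform the part constellation.
        pooled = F.adaptive_avg_pool3d(feats, 1).flatten(1)  # (N, C)
        off = self.offset_head(pooled).view(n, t, h, w, 3).tanh() * 0.25
        grid = (base + off).clamp(-1, 1)
        # grid_sample expects coordinates ordered (x, y, z): flip last dim.
        parts = F.grid_sample(feats, grid.flip(-1), align_corners=True)
        return parts  # (N, C, t, h, w): one feature vector per part

class SparsityGate(nn.Module):
    """Soft gate that keeps activated part features and damps the rest."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv3d(channels, 1, kernel_size=1)

    def forward(self, parts):
        gate = torch.sigmoid(self.score(parts))  # (N, 1, t, h, w)
        return parts * gate, gate

# Toy usage with a feature volume from a (hypothetical) tube detector.
feats = torch.randn(2, 256, 16, 7, 7)
parts = Deformable3DRoIPool(256)(feats)
gated, gate = SparsityGate(256)(parts)
print(gated.shape)  # torch.Size([2, 256, 4, 3, 3])
```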
Related papers
- Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream.
At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank.
To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
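As a rough sketch of the memory-bank mechanism summarised above (names and sizes are assumptions, not the paper's code): only the current frame is encoded, fused with stored historical features via cross-attention, and the memory is then updated FIFO-style.

```python
# Hedged sketch of a memory-bank tracker; a hypothetical simplification.
import torch
import torch.nn as nn

class MemoryBankTracker(nn.Module):
    def __init__(self, dim=128, mem_len=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.register_buffer("memory", torch.zeros(1, mem_len, dim))

    def forward(self, cur):
        # cur: (1, 1, dim) feature of the current frame only.
        fused, _ = self.attn(query=cur, key=self.memory, value=self.memory)
        # FIFO update: drop the oldest entry, append the current feature.
        self.memory = torch.cat([self.memory[:, 1:], cur.detach()], dim=1)
        return fused  # history-aware feature used downstream for tracking

tracker = MemoryBankTracker()
for _ in range(5):                    # stream frames one at a time
    out = tracker(torch.randn(1, 1, 128))
print(out.shape)  # torch.Size([1, 1, 128])
```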
arXiv Detail & Related papers (2023-03-14T02:58:27Z)
- A Spatio-Temporal Multilayer Perceptron for Gesture Recognition [70.34489104710366]
We propose a multilayer state-weighted perceptron for gesture recognition in the context of autonomous vehicles.
An evaluation on the TCG and Drive&Act datasets is provided to showcase the promising performance of our approach.
We deploy our model to our autonomous vehicle to show its real-time capability and stable execution.
arXiv Detail & Related papers (2022-04-25T08:42:47Z)
- MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection [37.25262046781015]
Action detection is an essential and challenging task, especially for densely labelled datasets of untrimmed videos.
We propose a novel ConvTransformer network for action detection that efficiently captures both short-term and long-term temporal information.
Our network outperforms the state-of-the-art methods on all three datasets.
arXiv Detail & Related papers (2021-12-07T18:57:37Z)
- Sequence-to-Sequence Modeling for Action Identification at High Temporal Resolution [9.902223920743872]
We introduce a new action-recognition benchmark that includes subtle short-duration actions labeled at a high temporal resolution.
We show that current state-of-the-art models based on segmentation produce noisy predictions when applied to these data.
We propose a novel approach for high-resolution action identification, inspired by speech-recognition techniques.
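Purely as an illustration of the speech-recognition flavour mentioned above: the snippet below computes a CTC loss over framewise logits, CTC being a standard speech-recognition objective. That the paper uses CTC specifically is an assumption, and all shapes are toy values.

```python
# Illustrative only: CTC-style sequence labelling over video frames.
import torch
import torch.nn as nn

T, N, C = 50, 2, 6                    # frames, batch, classes (0 = blank)
log_probs = torch.randn(T, N, C).log_softmax(-1)   # framewise log-probs
targets = torch.tensor([[1, 3, 2], [4, 4, 1]])     # per-clip action sequences
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 3, dtype=torch.long)
loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```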
arXiv Detail & Related papers (2021-11-03T21:06:36Z)
- Spatio-Temporal Representation Factorization for Video-based Person Re-Identification [55.01276167336187]
We propose a Spatio-Temporal Representation Factorization (STRF) module for re-ID.
STRF is a flexible new computational unit that can be used in conjunction with most existing 3D convolutional neural network architectures for re-ID.
We empirically show that STRF improves performance of various existing baseline architectures while demonstrating new state-of-the-art results.
arXiv Detail & Related papers (2021-07-25T19:29:37Z)
- Efficient Spatialtemporal Context Modeling for Action Recognition [42.30158166919919]
We propose a recurrent 3D criss-cross attention (RCCA-3D) module to model dense long-range contextual information in videos for action recognition.
We model the relationships between points on the same line along the horizontal, vertical and depth directions at each time step, which forms a 3D criss-cross structure.
Compared with the non-local method, the proposed RCCA-3D module reduces the number of parameters and FLOPs by 25% and 11%, respectively, for video context modeling.
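A minimal sketch of the criss-cross idea described above, in which every position attends only to positions on its horizontal, vertical and depth lines; this is a simplified axial-attention rendering, not the RCCA-3D implementation.

```python
# Hedged sketch: attention restricted to the three axes through each point.
import torch
import torch.nn as nn

class CrissCross3D(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv3d(channels, channels // 8, 1)
        self.k = nn.Conv3d(channels, channels // 8, 1)
        self.v = nn.Conv3d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # residual scale

    @staticmethod
    def _line_attention(q, k, v, axis):
        # Move the chosen axis last, attend along that line only.
        q, k, v = (t.movedim(axis, -1) for t in (q, k, v))
        att = torch.einsum("ncabl,ncabm->nablm", q, k).softmax(dim=-1)
        out = torch.einsum("nablm,ncabm->ncabl", att, v)
        return out.movedim(-1, axis)

    def forward(self, x):  # x: (N, C, T, H, W)
        q, k, v = self.q(x), self.k(x), self.v(x)
        out = sum(self._line_attention(q, k, v, axis) for axis in (2, 3, 4))
        return x + self.gamma * out

x = torch.randn(1, 64, 4, 8, 8)
print(CrissCross3D(64)(x).shape)  # torch.Size([1, 64, 4, 8, 8])
```

Consistent with the recurrent design named in the summary, applying such a unit twice would let information propagate from each line to the whole volume.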
arXiv Detail & Related papers (2021-03-20T14:48:12Z)
- Multi-Temporal Convolutions for Human Action Recognition in Videos [83.43682368129072]
We present a novel multi-temporal convolution block capable of extracting features at multiple temporal resolutions.
The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture.
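The general idea of temporal convolutions at several resolutions can be sketched as parallel depthwise temporal kernels of different sizes fused by a pointwise convolution; the kernel choices and layer names below are assumptions, not the paper's exact block.

```python
# Hedged sketch of a multi-resolution temporal convolution block.
import torch
import torch.nn as nn

class MultiTemporalConv(nn.Module):
    """Parallel temporal-only depthwise 3D convolutions with different kernel
    sizes, summed and fused, so the block sees several temporal resolutions."""
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv3d(channels, channels, kernel_size=(k, 1, 1),
                      padding=(k // 2, 0, 0), groups=channels)  # depthwise: cheap
            for k in kernel_sizes
        )
        self.fuse = nn.Conv3d(channels, channels, kernel_size=1)

    def forward(self, x):                # x: (N, C, T, H, W)
        y = sum(branch(x) for branch in self.branches)
        return x + self.fuse(y)          # residual keeps it drop-in for 3D-CNNs

x = torch.randn(2, 64, 16, 7, 7)
print(MultiTemporalConv(64)(x).shape)    # torch.Size([2, 64, 16, 7, 7])
```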
arXiv Detail & Related papers (2020-11-08T10:40:26Z)
- Two-Stream AMTnet for Action Detection [12.581710073789848]
We propose a new deep neural network architecture for online action detection, termed Two-Stream AMTnet, which adds an optical-flow stream to the original appearance one in AMTnet.
Two-Stream AMTnet exhibits superior action detection performance over state-of-the-art approaches on the standard action detection benchmarks.
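A generic two-stream sketch under stated assumptions (toy encoders and mean late fusion, not AMTnet's actual architecture): an appearance stream over RGB frames and a motion stream over stacked optical-flow fields produce scores that are averaged.

```python
# Hedged sketch of generic two-stream late fusion.
import torch
import torch.nn as nn

class TwoStreamScorer(nn.Module):
    def __init__(self, num_classes=24):
        super().__init__()
        def stream(in_ch):               # toy per-stream encoder + classifier
            return nn.Sequential(
                nn.Conv3d(in_ch, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool3d(1), nn.Flatten(),
                nn.Linear(32, num_classes))
        self.rgb = stream(3)             # appearance stream: RGB frames
        self.flow = stream(2)            # motion stream: (dx, dy) flow fields

    def forward(self, rgb, flow):
        return 0.5 * (self.rgb(rgb) + self.flow(flow))  # mean late fusion

rgb = torch.randn(1, 3, 8, 112, 112)     # (N, C, T, H, W) RGB snippet
flow = torch.randn(1, 2, 8, 112, 112)    # stacked optical-flow fields
print(TwoStreamScorer()(rgb, flow).shape)  # torch.Size([1, 24])
```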
arXiv Detail & Related papers (2020-04-03T12:16:45Z)
- Spatio-Temporal Action Detection with Multi-Object Interaction [127.85524354900494]
In this paper, we study the spatio-temporal action detection problem with multi-object interaction.
We introduce a new dataset that is spatially annotated with action tubes containing multi-object interactions.
We propose an end-to-end spatio-temporal action detection model that performs both spatial and temporal regression simultaneously.
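Schematically, joint spatial and temporal regression can be rendered as two heads over a shared proposal feature; the shapes and head design below are assumptions for illustration, not the paper's network.

```python
# Hedged sketch of simultaneous spatial and temporal regression heads.
import torch
import torch.nn as nn

class JointSTHeads(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.spatial = nn.Linear(dim, 4)   # per-frame box: (cx, cy, w, h)
        self.temporal = nn.Linear(dim, 2)  # action extent: (t_start, t_end)

    def forward(self, feat):               # feat: (N, dim) per action proposal
        return self.spatial(feat), self.temporal(feat)

boxes, extent = JointSTHeads()(torch.randn(8, 256))
print(boxes.shape, extent.shape)  # torch.Size([8, 4]) torch.Size([8, 2])
```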
arXiv Detail & Related papers (2020-04-01T00:54:56Z)
- A Comprehensive Study on Temporal Modeling for Online Action Detection [50.558313106389335]
Online action detection (OAD) is a practical yet challenging task, which has attracted increasing attention in recent years.
This paper aims to provide a comprehensive study on temporal modeling for OAD including four meta types of temporal modeling methods.
We present several hybrid temporal modeling methods, which outperform the recent state-of-the-art methods with sizable margins on THUMOS-14 and TVSeries.
arXiv Detail & Related papers (2020-01-21T13:12:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences arising from its use.