Online Spatiotemporal Action Detection and Prediction via Causal
Representations
- URL: http://arxiv.org/abs/2008.13759v1
- Date: Mon, 31 Aug 2020 17:28:51 GMT
- Title: Online Spatiotemporal Action Detection and Prediction via Causal
Representations
- Authors: Gurkirt Singh
- Abstract summary: We start with the conversion of the traditional offline action detection pipeline into an online action tube detection system.
We explore the future prediction capabilities of such detection methods by extending an existing action tube into the future by regression.
Later, we seek to establish that online/causal representations can achieve similar performance to that of offline three dimensional (3D) convolutional neural networks (CNNs) on various tasks.
- Score: 1.9798034349981157
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this thesis, we focus on video action understanding problems from an
online and real-time processing point of view. We start with the conversion of
the traditional offline spatiotemporal action detection pipeline into an online
spatiotemporal action tube detection system. An action tube is a set of
bounding boxes connected over time, which bounds an action instance in space and
time. Next, we explore the future prediction capabilities of such detection
methods by extending an existing action tube into the future by regression.
Later, we seek to establish that online/causal representations can achieve
similar performance to that of offline three dimensional (3D) convolutional
neural networks (CNNs) on various tasks, including action recognition, temporal
action segmentation and early prediction.
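The abstract describes an action tube as a sequence of bounding boxes linked over time, extended into the future by regression. A minimal sketch of that idea follows; the class and method names are hypothetical illustrations, not the thesis's actual implementation, and simple per-coordinate linear regression stands in for the learned regressor.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class ActionTube:
    """An action tube: one (x1, y1, x2, y2) box per observed frame."""
    label: str
    boxes: list = field(default_factory=list)  # list of 4-element arrays

    def extend(self, n_future: int) -> list:
        """Extrapolate the tube n_future frames into the future by
        fitting a linear regression to each box coordinate over time."""
        t = np.arange(len(self.boxes))
        coords = np.asarray(self.boxes, dtype=float)  # shape (T, 4)
        future = []
        for step in range(1, n_future + 1):
            pred = []
            for c in range(4):
                slope, intercept = np.polyfit(t, coords[:, c], deg=1)
                pred.append(slope * (len(t) - 1 + step) + intercept)
            future.append(np.array(pred))
        return future
```

With a box drifting one pixel per frame, the extrapolation continues that drift for the requested number of future frames.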
Related papers
- Harnessing Temporal Causality for Advanced Temporal Action Detection [53.654457142657236]
We introduce CausalTAD, which combines causal attention and causal Mamba to achieve state-of-the-art performance on benchmarks.
We ranked 1st in the Action Recognition, Action Detection, and Audio-Based Interaction Detection tracks at the EPIC-Kitchens Challenge 2024, and 1st in the Moment Queries track at the Ego4D Challenge 2024.
arXiv Detail & Related papers (2024-07-25T06:03:02Z) - A Circular Window-based Cascade Transformer for Online Action Detection [27.880350187125778]
We advocate a novel and efficient principle for online action detection.
It merely updates the latest and oldest historical representations in each window and reuses the intermediate ones, which have already been computed.
Based on this principle, we introduce a window-based cascade Transformer with a circular historical queue, where it conducts multi-stage attentions and cascade refinement on each window.
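The reuse principle above can be sketched with a fixed-length circular queue of per-frame features: each new frame triggers exactly one feature computation, the oldest entry is evicted, and everything in between is reused. This is an illustrative sketch only; the names and the stand-in feature extractor are hypothetical, not from the paper.

```python
from collections import deque

import numpy as np


def make_feature(frame):
    """Stand-in per-frame feature extractor (hypothetical)."""
    return np.asarray(frame, dtype=float)


class CircularFeatureQueue:
    """Fixed-length history of per-frame features. On each new frame,
    only the newest feature is computed; the oldest is dropped and all
    intermediate features are reused as-is."""

    def __init__(self, window_size: int):
        self.window = deque(maxlen=window_size)

    def step(self, frame):
        self.window.append(make_feature(frame))  # evicts oldest automatically
        return list(self.window)  # current window, oldest -> newest
```

The cost per step is one feature computation regardless of window size, which is the efficiency the paper's principle targets.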
arXiv Detail & Related papers (2022-08-30T12:37:23Z) - Temporally smooth online action detection using cycle-consistent future
anticipation [26.150144140790943]
We present a novel solution for online action detection using a simple yet effective RNN-based network called FATSnet.
FATSnet consists of a module for anticipating the future that can be trained in an unsupervised manner.
We also propose a solution to relieve the performance loss when running RNN-based models on very long sequences.
arXiv Detail & Related papers (2021-04-16T11:00:19Z) - A Prospective Study on Sequence-Driven Temporal Sampling and Ego-Motion
Compensation for Action Recognition in the EPIC-Kitchens Dataset [68.8204255655161]
Action recognition is among the most challenging research fields in computer vision.
Sequences recorded under ego-motion have become particularly relevant.
The proposed method copes with this challenge by estimating the ego-motion, or camera motion, of the recording device.
arXiv Detail & Related papers (2020-08-26T14:44:45Z) - TENet: Triple Excitation Network for Video Salient Object Detection [57.72696926903698]
We propose a simple yet effective approach, named Triple Excitation Network, to reinforce the training of video salient object detection (VSOD).
These excitation mechanisms are designed in the spirit of curriculum learning and aim to ease learning at the beginning of training.
Our semi-curriculum learning design enables the first online strategy for VSOD, which allows exciting and boosting saliency responses during testing without re-training.
arXiv Detail & Related papers (2020-07-20T08:45:41Z) - Gabriella: An Online System for Real-Time Activity Detection in
Untrimmed Security Videos [72.50607929306058]
We propose a real-time online system to perform activity detection on untrimmed security videos.
The proposed method consists of three stages: tubelet extraction, activity classification and online tubelet merging.
We demonstrate the effectiveness of the proposed approach in terms of speed (100 fps) and performance with state-of-the-art results.
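The three-stage pipeline above ends with online tubelet merging. A minimal sketch of one plausible merging rule follows: greedily attach each incoming tubelet to the existing tube whose last box overlaps it most. The function names and the greedy IoU rule are illustrative assumptions, not Gabriella's actual merging algorithm.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)


def merge_tubelets(tubes, tubelets, thresh=0.5):
    """Greedily attach each new tubelet to the existing tube whose last
    box overlaps it most; otherwise start a new tube."""
    for tl in tubelets:
        best, best_iou = None, thresh
        for tube in tubes:
            overlap = iou(tube[-1], tl[0])
            if overlap > best_iou:
                best, best_iou = tube, overlap
        if best is not None:
            best.extend(tl)
        else:
            tubes.append(list(tl))
    return tubes
```

Because merging only inspects each tube's most recent box, it can run online as tubelets arrive, which is what makes real-time operation plausible.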
arXiv Detail & Related papers (2020-04-23T22:20:10Z) - Two-Stream AMTnet for Action Detection [12.581710073789848]
We propose a new deep neural network architecture for online action detection, termed Two-Stream AMTnet, which adds an optical-flow stream to the original appearance stream of AMTnet.
Two-Stream AMTnet exhibits superior action detection performance over state-of-the-art approaches on the standard action detection benchmarks.
arXiv Detail & Related papers (2020-04-03T12:16:45Z) - Spatio-Temporal Action Detection with Multi-Object Interaction [127.85524354900494]
In this paper, we study the spatio-temporal action detection problem with multi-object interaction.
We introduce a new dataset that is spatially annotated with action tubes containing multi-object interactions.
We propose an end-to-end spatio-temporal action detection model that performs both spatial and temporal regression simultaneously.
arXiv Detail & Related papers (2020-04-01T00:54:56Z) - A Novel Online Action Detection Framework from Untrimmed Video Streams [19.895434487276578]
We propose a novel online action detection framework that considers actions as a set of temporally ordered subclasses.
We augment our data by varying the lengths of videos to allow the proposed method to learn about the high intra-class variation in human actions.
arXiv Detail & Related papers (2020-03-17T14:11:24Z) - Dynamic Inference: A New Approach Toward Efficient Video Action
Recognition [69.9658249941149]
Action recognition in videos has achieved great success recently, but it remains a challenging task due to the massive computational cost.
We propose a general dynamic inference idea to improve inference efficiency by leveraging the variation in the distinguishability of different videos.
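One simple way to exploit the varying distinguishability of videos, as described above, is an early-exit scheme: accumulate per-frame evidence and stop as soon as the prediction is confident. This is a hedged sketch under that assumption; the function and threshold are illustrative, not the paper's actual mechanism.

```python
import numpy as np


def dynamic_inference(frame_logits, confidence_thresh=0.9):
    """Early-exit video classification: average per-frame logits and
    stop as soon as the running softmax confidence crosses a threshold.
    Highly distinguishable videos exit after only a few frames."""
    running = np.zeros_like(frame_logits[0], dtype=float)
    for t, logits in enumerate(frame_logits, start=1):
        running += logits
        probs = np.exp(running / t)
        probs /= probs.sum()
        if probs.max() >= confidence_thresh:
            return int(probs.argmax()), t  # prediction, frames used
    return int(probs.argmax()), t  # fell through: used all frames
```

An "easy" clip with strongly separated logits exits after one frame, while an ambiguous clip consumes the full sequence, which is where the inference savings come from.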
arXiv Detail & Related papers (2020-02-09T11:09:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.