A Novel Online Action Detection Framework from Untrimmed Video Streams
- URL: http://arxiv.org/abs/2003.07734v1
- Date: Tue, 17 Mar 2020 14:11:24 GMT
- Title: A Novel Online Action Detection Framework from Untrimmed Video Streams
- Authors: Da-Hye Yoon, Nam-Gyu Cho, Seong-Whan Lee
- Abstract summary: We propose a novel online action detection framework that considers actions as a set of temporally ordered subclasses.
We augment our data by varying the lengths of videos to allow the proposed method to learn about the high intra-class variation in human actions.
- Score: 19.895434487276578
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online temporal action localization from an untrimmed video stream is a
challenging problem in computer vision. It is challenging because (i) in an
untrimmed video stream, more than one action instance may appear, along with
background scenes, and (ii) in an online setting, only past and current
information is available. Therefore, temporal priors, such as the average
action duration in the training data, which have been exploited by previous action
detection methods, are not suitable for this task because of the high
intra-class variation in human actions. We propose a novel online action
detection framework that considers actions as a set of temporally ordered
subclasses and leverages a future frame generation network to cope with the
limited information issue associated with the problem outlined above.
Additionally, we augment our data by varying the lengths of videos to allow the
proposed method to learn about the high intra-class variation in human actions.
We evaluate our method using two benchmark datasets, THUMOS'14 and ActivityNet,
for an online temporal action localization scenario, and demonstrate that its
performance is comparable to that of state-of-the-art methods proposed for
offline settings.
Related papers
- ActPrompt: In-Domain Feature Adaptation via Action Cues for Video Temporal Grounding [40.60371529725805]
We propose an efficient preliminary in-domain fine-tuning paradigm for feature adaptation.
We introduce Action-Cue-Injected Temporal Prompt Learning (ActPrompt), which injects action cues into the image encoder of VLM for better discovering action-sensitive patterns.
arXiv Detail & Related papers (2024-08-13T04:18:32Z) - ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos [35.371453530275666]
ActionSwitch is the first class-agnostic On-TAL framework capable of detecting overlapping actions.
By obviating the reliance on class information, ActionSwitch provides wider applicability to various situations.
arXiv Detail & Related papers (2024-07-17T20:07:05Z) - Open-Vocabulary Spatio-Temporal Action Detection [59.91046192096296]
Open-vocabulary spatio-temporal action detection (OV-STAD) is an important fine-grained video understanding task.
OV-STAD requires training a model on a limited set of base classes with box and label supervision.
To better adapt the holistic VLM for the fine-grained action detection task, we carefully fine-tune it on the localized video region-text pairs.
arXiv Detail & Related papers (2024-05-17T14:52:47Z) - SimOn: A Simple Framework for Online Temporal Action Localization [51.27476730635852]
We propose a framework, termed SimOn, that learns to predict action instances using the popular Transformer architecture.
Experimental results on the THUMOS14 and ActivityNet1.3 datasets show that our model markedly outperforms previous methods.
arXiv Detail & Related papers (2022-11-08T04:50:54Z) - A Circular Window-based Cascade Transformer for Online Action Detection [27.880350187125778]
We advocate a novel and efficient principle for online action detection.
It updates only the latest and oldest historical representations in each window and reuses the intermediate ones, which have already been computed.
Based on this principle, we introduce a window-based cascade Transformer with a circular historical queue, which conducts multi-stage attention and cascade refinement on each window; a minimal sketch of this reuse idea appears after this list.
arXiv Detail & Related papers (2022-08-30T12:37:23Z) - AntPivot: Livestream Highlight Detection via Hierarchical Attention Mechanism [64.70568612993416]
We formulate a new task, Livestream Highlight Detection, analyze its difficulties, and propose a novel architecture, AntPivot, to solve this problem.
We construct a fully-annotated dataset AntHighlight to instantiate this task and evaluate the performance of our model.
arXiv Detail & Related papers (2022-06-10T05:58:11Z) - Deep Learning-based Action Detection in Untrimmed Videos: A Survey [20.11911785578534]
Most real-world videos are lengthy and untrimmed with sparse segments of interest.
The task of temporal activity detection in untrimmed videos aims to localize the temporal boundaries of actions.
This paper provides an overview of deep learning-based algorithms to tackle temporal action detection in untrimmed videos.
arXiv Detail & Related papers (2021-09-30T22:42:25Z) - WOAD: Weakly Supervised Online Action Detection in Untrimmed Videos [124.72839555467944]
We propose a weakly supervised framework that can be trained using only video-class labels.
We show that our method substantially outperforms weakly-supervised baselines.
When strongly supervised, our method obtains the state-of-the-art results in the tasks of both online per-frame action recognition and online detection of action start.
arXiv Detail & Related papers (2020-06-05T23:08:41Z) - Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos [72.50607929306058]
We propose a real-time online system to perform activity detection on untrimmed security videos.
The proposed method consists of three stages: tubelet extraction, activity classification and online tubelet merging.
We demonstrate the effectiveness of the proposed approach in terms of both speed (100 fps) and performance, achieving state-of-the-art results.
arXiv Detail & Related papers (2020-04-23T22:20:10Z) - ZSTAD: Zero-Shot Temporal Activity Detection [107.63759089583382]
We propose a novel task setting called zero-shot temporal activity detection (ZSTAD), where activities that have never been seen in training can still be detected.
We design an end-to-end deep network based on R-C3D as the architecture for this solution.
Experiments on both the THUMOS14 and the Charades datasets show promising performance in terms of detecting unseen activities.
arXiv Detail & Related papers (2020-03-12T02:40:36Z)
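As promised in the circular-window entry above, the representation-reuse principle can be sketched with a small circular queue of cached per-frame features. The `encode` and `detect` functions, dimensions, and scores below are placeholders rather than the paper's cascade Transformer; only the caching pattern is the point.
```python
# Toy sketch of the circular-queue reuse idea: when a new frame arrives,
# only its representation is computed and the oldest cached one is
# evicted; the intermediate representations are reused unchanged.
# encode()/detect() are stand-ins, not the paper's Transformer stages.
from collections import deque
import numpy as np

WINDOW, FRAME_DIM, FEAT_DIM = 4, 16, 8
rng = np.random.default_rng(0)
PROJ = rng.standard_normal((FRAME_DIM, FEAT_DIM))

def encode(frame: np.ndarray) -> np.ndarray:
    """Stand-in for the expensive per-frame encoding stage."""
    return np.tanh(frame @ PROJ)

def detect(window_feats: np.ndarray) -> float:
    """Stand-in for the detection head applied to the current window."""
    return float(window_feats.mean())

cache: deque = deque(maxlen=WINDOW)    # circular queue of cached representations

for t in range(10):                    # simulate a live stream
    frame = rng.standard_normal(FRAME_DIM)
    cache.append(encode(frame))        # compute ONLY the newest entry;
                                       # deque(maxlen=...) drops the oldest
    if len(cache) == WINDOW:
        score = detect(np.stack(cache))
        print(f"t={t} action score {score:+.3f}")
```
Using `deque(maxlen=WINDOW)` makes the eviction implicit: each step costs one `encode` call regardless of window size, which is the efficiency argument behind the circular historical queue.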