Online Action Detection in Streaming Videos with Time Buffers
- URL: http://arxiv.org/abs/2010.03016v1
- Date: Tue, 6 Oct 2020 20:43:50 GMT
- Title: Online Action Detection in Streaming Videos with Time Buffers
- Authors: Bowen Zhang, Hao Chen, Meng Wang, Yuanjun Xiong
- Abstract summary: We formulate the problem of online temporal action detection in live streaming videos.
The standard setting of the online action detection task requires immediate prediction after a new frame is captured.
We propose to adopt the problem setting that allows models to make use of the small `buffer time' incurred by the delay.
- Score: 28.82710230196424
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We formulate the problem of online temporal action detection in live streaming videos, acknowledging one important property of live streaming videos: there is normally a broadcast delay between the latest captured frame and the actual frame viewed by the audience. The standard setting of the online action detection task requires immediate prediction after a new frame is captured. We show that its lack of consideration of the delay imposes unnecessary constraints on the models and is thus not suitable for this problem. We propose to adopt the problem setting that allows models to make use of the small `buffer time' incurred by the delay in live streaming videos. We design an action start and end detection framework for this online-with-buffer setting with two major components: flattened I3D and window-based suppression. Experiments on three standard temporal action detection benchmarks under the proposed setting demonstrate the effectiveness of the proposed framework. We show that with a suitable problem setting for this widely applicable problem, we can achieve much better detection accuracy than off-the-shelf online action detection models.
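The abstract leaves the suppression rule unspecified, so the following is a minimal, hypothetical sketch of how a buffer of future frames could be exploited for action-start detection: with a broadcast delay, frame t can be scored while frames up to t + buffer_len are already visible, enabling a simple temporal non-maximum suppression over each local window. The function, scores, and threshold below are illustrative assumptions, not the authors' flattened-I3D framework.

```python
import numpy as np

def detect_starts_with_buffer(start_scores, buffer_len, threshold=0.5):
    """Hypothetical window-based suppression under the online-with-buffer setting.

    start_scores: per-frame action-start probabilities (from any frame-level
                  classifier; an illustrative stand-in for the paper's features).
    buffer_len:   number of future frames visible thanks to the broadcast delay.

    A start is emitted at frame t only if its score clears the threshold and is
    the maximum within the window [t - buffer_len, t + buffer_len]: past frames
    are already known, and the buffer lets us peek at the near future.
    """
    starts = []
    for t in range(len(start_scores) - buffer_len):
        window = start_scores[max(0, t - buffer_len) : t + buffer_len + 1]
        if start_scores[t] >= threshold and start_scores[t] == window.max():
            starts.append(t)
    return starts

# Toy usage: a noisy score bump around frame 5 yields a single detection.
scores = np.array([0.1, 0.2, 0.3, 0.6, 0.7, 0.9, 0.8, 0.4, 0.2, 0.1])
print(detect_starts_with_buffer(scores, buffer_len=3))  # -> [5]
```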
Related papers
- Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts [57.01985221057047]
This paper introduces a novel method that learns temporal prompt embeddings for weakly supervised video anomaly detection and localization (WSVADL) based on pre-trained vision-language models (VLMs).
Our method achieves state-of-the-art performance on three public benchmarks for the WSVADL task.
arXiv Detail & Related papers (2024-08-12T03:31:29Z)
- You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos [56.676761067861236]
Given an untrimmed video, temporal sentence grounding (TSG) aims to locate a target moment that semantically matches a sentence query.
Previous works have achieved decent success, but they focus only on high-level visual features extracted from decoded frames.
We propose a new setting, compressed-domain TSG, which directly utilizes compressed videos rather than fully-decompressed frames as the visual input.
arXiv Detail & Related papers (2023-03-14T12:53:27Z)
- Look at Adjacent Frames: Video Anomaly Detection without Offline Training [21.334952965297667]
We propose a solution to detect anomalous events in videos without the need to train a model offline.
Specifically, our solution is based on a randomly-initialized multilayer perceptron that is optimized online to reconstruct video frames, pixel-by-pixel, from their frequency information.
An incremental learner is used to update the parameters of the multilayer perceptron after observing each frame, allowing anomalous events to be detected along the video stream.
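As a rough illustration of this online scheme (not the authors' code: the frequency transform, network size, learning rate, and threshold are all assumptions), one could take a single gradient step on a small reconstruction MLP per incoming frame and flag frames whose reconstruction error spikes:

```python
import torch
import torch.nn as nn

class ReconMLP(nn.Module):
    """Randomly-initialized MLP that reconstructs a frame from its spectrum."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, x):
        return self.net(x)

def stream_anomaly_flags(frames, threshold=0.1):
    """frames: iterable of flattened grayscale frames as 1-D float tensors.
    Yields True for frames whose online reconstruction error exceeds the
    (assumed) threshold; the model is updated incrementally after each frame."""
    model, opt = None, None
    for frame in frames:
        if model is None:  # lazy init once the frame size is known
            model = ReconMLP(frame.numel())
            opt = torch.optim.SGD(model.parameters(), lr=1e-3)
        freq = torch.fft.fft(frame).abs()      # frequency-domain input
        recon = model(freq)                    # pixel-by-pixel reconstruction
        loss = nn.functional.mse_loss(recon, frame)
        yield loss.item() > threshold          # anomalous if poorly reconstructed
        opt.zero_grad()
        loss.backward()                        # one incremental update per frame
        opt.step()
```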
arXiv Detail & Related papers (2022-07-27T21:18:58Z)
- Real-time Object Detection for Streaming Perception [84.2559631820007]
Streaming perception is proposed to jointly evaluate latency and accuracy with a single metric for online video perception.
We build a simple and effective framework for streaming perception.
Our method achieves competitive performance on the Argoverse-HD dataset and improves AP by 4.9% over a strong baseline.
arXiv Detail & Related papers (2022-03-23T11:33:27Z)
- FrameHopper: Selective Processing of Video Frames in Detection-driven Real-Time Video Analytics [2.5119455331413376]
Detection-driven real-time video analytics require continuous detection of objects contained in the video frames.
Running these detectors on every frame on resource-constrained edge devices is computationally intensive.
We propose an offline Reinforcement Learning (RL)-based algorithm to determine the skip-lengths, i.e., how many consecutive frames can be skipped between detector runs.
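The summary does not detail the RL formulation itself, but the inference-time behavior it implies can be sketched as follows; the `detector` and `policy` callables and the reuse-last-result strategy are hypothetical placeholders, not FrameHopper's implementation:

```python
# Hypothetical deployment loop for a learned frame-skipping policy.
# `policy` maps the latest detection result to a skip-length; in the paper
# it is trained offline with RL, which is not reproduced here.
def analyze_stream(frames, detector, policy):
    results, skip = [], 0
    last_detections = None
    for frame in frames:
        if skip > 0:
            skip -= 1
            results.append(last_detections)  # reuse detections for skipped frames
            continue
        last_detections = detector(frame)    # run the expensive detector
        skip = policy(last_detections)       # how many upcoming frames to skip
        results.append(last_detections)
    return results
```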
arXiv Detail & Related papers (2022-03-22T07:05:57Z)
- Fast Online Video Super-Resolution with Deformable Attention Pyramid [172.16491820970646]
Video super-resolution (VSR) has many applications that pose strict causal, real-time, and latency constraints, including video streaming and TV.
We propose a recurrent VSR architecture based on a deformable attention pyramid (DAP).
arXiv Detail & Related papers (2022-02-03T17:49:04Z)
- Temporally smooth online action detection using cycle-consistent future anticipation [26.150144140790943]
We present a novel solution for online action detection using a simple yet effective RNN-based network called FATSnet.
FATSnet consists of a module for anticipating the future that can be trained in an unsupervised manner.
We also propose a solution to relieve the performance loss when running RNN-based models on very long sequences.
arXiv Detail & Related papers (2021-04-16T11:00:19Z)
- Motion-blurred Video Interpolation and Extrapolation [72.3254384191509]
We present a novel framework for deblurring, interpolating and extrapolating sharp frames from a motion-blurred video in an end-to-end manner.
To ensure temporal coherence across predicted frames and address potential temporal ambiguity, we propose a simple, yet effective flow-based rule.
arXiv Detail & Related papers (2021-03-04T12:18:25Z)
- Towards Streaming Perception [70.68520310095155]
We present an approach that coherently integrates latency and accuracy into a single metric for real-time online perception.
The key insight behind this metric is to jointly evaluate the output of the entire perception stack at every time instant.
We focus on the illustrative tasks of object detection and instance segmentation in urban video streams, and contribute a novel dataset with high-quality and temporally-dense annotations.
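The idea of jointly evaluating the output at every time instant can be illustrated with a small matching routine: at each query time the system is judged on the latest prediction it had finished by then, so latency directly costs accuracy. This is a simplified sketch of the concept, not the benchmark's actual evaluation code:

```python
# Simplified sketch of streaming evaluation: at every query time, the system
# is scored against the newest prediction it had *completed* by that time, so
# slow-but-accurate models are penalized just like fast-but-sloppy ones.
def match_streaming_outputs(query_times, outputs):
    """outputs: list of (finish_time, prediction), sorted by finish_time.
    Returns the prediction in effect at each query time (None if nothing yet)."""
    matched, i, current = [], 0, None
    for t in sorted(query_times):
        while i < len(outputs) and outputs[i][0] <= t:
            current = outputs[i][1]   # newest prediction completed by time t
            i += 1
        matched.append(current)
    return matched

# Toy usage: predictions finishing at t=0.2 and 0.5, queried at t=0.1, 0.3, 0.6.
print(match_streaming_outputs([0.1, 0.3, 0.6], [(0.2, "det_A"), (0.5, "det_B")]))
# -> [None, 'det_A', 'det_B']
```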
arXiv Detail & Related papers (2020-05-21T01:51:35Z)
- Two-Stream AMTnet for Action Detection [12.581710073789848]
We propose a new deep neural network architecture for online action detection, termed Two-Stream AMTnet, which adds an optical flow stream to the original appearance one in AMTnet.
Two-Stream AMTnet exhibits superior action detection performance over state-of-the-art approaches on the standard action detection benchmarks.
arXiv Detail & Related papers (2020-04-03T12:16:45Z)
- A Novel Online Action Detection Framework from Untrimmed Video Streams [19.895434487276578]
We propose a novel online action detection framework that considers actions as a set of temporally ordered subclasses.
We augment our data by varying the lengths of videos to allow the proposed method to learn about the high intra-class variation in human actions.
arXiv Detail & Related papers (2020-03-17T14:11:24Z)