Temporally smooth online action detection using cycle-consistent future anticipation
- URL: http://arxiv.org/abs/2104.08030v1
- Date: Fri, 16 Apr 2021 11:00:19 GMT
- Title: Temporally smooth online action detection using cycle-consistent future anticipation
- Authors: Young Hwi Kim, Seonghyeon Nam, and Seon Joo Kim
- Abstract summary: We present a novel solution for online action detection by using a simple yet effective RNN-based network called FATSnet.
FATSnet consists of a module for anticipating the future that can be trained in an unsupervised manner.
We also propose a solution to relieve the performance loss when running RNN-based models on very long sequences.
- Score: 26.150144140790943
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Many video understanding tasks work in the offline setting by assuming that
the input video is given from the start to the end. However, many real-world
problems require the online setting, in which a decision must be made
immediately using only the current and past frames of a video, as in
autonomous driving and surveillance systems. In this paper, we present a novel
solution for online action detection using a simple yet effective RNN-based
network called the Future Anticipation and Temporally Smoothing network
(FATSnet). The proposed network consists of a module for anticipating the
future, which can be trained in an unsupervised manner with a
cycle-consistency loss, and another component for aggregating the past and the
future for temporally smooth frame-by-frame predictions. We also propose a
solution to relieve the performance loss when running RNN-based models on very
long sequences. Evaluations on TVSeries, THUMOS14, and BBDB show that our
method achieves state-of-the-art performance compared to previous works on
online action detection.
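
The abstract gives no implementation details, so the following is a minimal PyTorch sketch of the cycle-consistency idea as stated: a forward anticipator predicts future features from the past, a backward anticipator reconstructs the past from the predicted future, and the reconstruction error supervises both without any future labels. The GRU architecture, module names, and dimensions are assumptions for illustration, not FATSnet's actual design.

```python
# Hedged sketch of cycle-consistent future anticipation (assumed GRU
# anticipators over pre-extracted per-frame features; names are illustrative).
import torch
import torch.nn as nn

class Anticipator(nn.Module):
    """Summarizes a feature sequence with a GRU, then autoregressively
    decodes the next `steps` feature vectors."""
    def __init__(self, feat_dim=512, hidden=512, steps=8):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.decode = nn.Linear(hidden, feat_dim)
        self.steps = steps

    def forward(self, x):                       # x: (B, T, feat_dim)
        _, h = self.gru(x)                      # summarize the observed sequence
        outs, inp = [], x[:, -1:]               # seed with the last observed feature
        for _ in range(self.steps):
            o, h = self.gru(inp, h)             # one autoregressive step
            inp = self.decode(o)                # map hidden state back to feature space
            outs.append(inp)
        return torch.cat(outs, dim=1)           # (B, steps, feat_dim)

def cycle_consistency_loss(fwd, bwd, past):
    """Anticipate the future from the past, then reconstruct the past from the
    time-reversed predicted future; no ground-truth future is needed."""
    future_hat = fwd(past)                             # past -> predicted future
    past_hat = bwd(torch.flip(future_hat, dims=[1]))   # future -> reconstructed past
    target = torch.flip(past[:, -past_hat.size(1):], dims=[1])
    return nn.functional.mse_loss(past_hat, target)
```

Per the abstract, the anticipated features also feed a component that aggregates past and future for temporally smooth frame-by-frame predictions; the sketch above covers only the unsupervised anticipation loss.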
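The abstract also mentions, without specifics, a solution for the performance loss of RNN-based models on very long sequences. One common remedy, shown below purely as a hedged illustration and not necessarily the paper's mechanism, is to run the RNN over overlapping windows and discard the warm-up outputs of each window so that hidden-state drift stays bounded.

```python
# Assumed remedy (not confirmed by the abstract): overlapping windows with a
# warm-up region whose outputs are discarded.
import torch

def windowed_rnn_inference(model, feats, window=512, overlap=64):
    """feats: (T, D) feature sequence; model maps (1, L, D) -> (1, L, C)."""
    T = feats.size(0)
    outputs, start = [], 0
    while start < T:
        end = min(start + window, T)
        chunk = feats[max(0, start - overlap):end].unsqueeze(0)
        with torch.no_grad():
            scores = model(chunk)[0]              # (L, C) per-frame scores
        outputs.append(scores[-(end - start):])   # keep only post-warm-up outputs
        start = end
    return torch.cat(outputs, dim=0)              # (T, C)
```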
Related papers
- Temporal Sentence Grounding in Streaming Videos [60.67022943824329]
This paper aims to tackle a novel task: Temporal Sentence Grounding in Streaming Videos (TSGSV).
The goal of TSGSV is to evaluate the relevance between a video stream and a given sentence query.
We propose two novel methods: (1) a TwinNet structure that enables the model to learn about upcoming events; and (2) a language-guided feature compressor that eliminates redundant visual frames.
arXiv Detail & Related papers (2023-08-14T12:30:58Z)
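The TSGSV entry above names a language-guided feature compressor that eliminates redundant visual frames, but does not describe it. Below is a hypothetical sketch of how such a compressor might work; the class name, the dot-product relevance score, and the top-k rule are all assumptions, not the paper's design.

```python
# Hypothetical language-guided frame compressor: the sentence embedding scores
# frame features and only the top-k most relevant frames are kept.
import torch
import torch.nn as nn

class LanguageGuidedCompressor(nn.Module):
    def __init__(self, dim=512, keep=16):
        super().__init__()
        self.q = nn.Linear(dim, dim)   # projects the sentence embedding
        self.k = nn.Linear(dim, dim)   # projects per-frame features
        self.keep = keep

    def forward(self, frames, query):            # frames: (T, D), query: (D,)
        scores = self.k(frames) @ self.q(query)  # (T,) query-frame relevance
        idx = scores.topk(min(self.keep, frames.size(0))).indices
        return frames[idx.sort().values]         # keep top-k frames, in time order
```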
- SimOn: A Simple Framework for Online Temporal Action Localization [51.27476730635852]
We propose a framework, termed SimOn, that learns to predict action instances using the popular Transformer architecture.
Experimental results on the THUMOS14 and ActivityNet1.3 datasets show that our model substantially outperforms previous methods.
arXiv Detail & Related papers (2022-11-08T04:50:54Z)
- Interference Cancellation GAN Framework for Dynamic Channels [74.22393885274728]
We introduce an online training framework that can adapt to any changes in the channel.
Our framework significantly outperforms recent neural network models on highly dynamic channels.
arXiv Detail & Related papers (2022-08-17T02:01:18Z)
- Unidirectional Video Denoising by Mimicking Backward Recurrent Modules with Look-ahead Forward Ones [72.68740880786312]
Bidirectional recurrent networks (BiRNNs) have exhibited appealing performance in several video restoration tasks.
However, BiRNNs are intrinsically offline because they use backward recurrent modules to propagate from the last frame to the current one.
We present a novel recurrent network consisting of forward and look-ahead recurrent modules for unidirectional video denoising.
arXiv Detail & Related papers (2022-04-12T05:33:15Z)
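The denoising entry above replaces the backward recurrent pass of a BiRNN with look-ahead forward modules, keeping the pipeline unidirectional. Below is a hedged sketch of this idea, where frame t is restored from the causal hidden state plus a short forward pass over a few buffered future frames; the module sizes and fusion by concatenation are assumptions, not the paper's exact architecture.

```python
# Hedged sketch: a causal GRU state fused with a small look-ahead GRU that
# mimics backward propagation using only a short future buffer.
import torch
import torch.nn as nn

class LookAheadDenoiser(nn.Module):
    def __init__(self, dim=64, lookahead=3):
        super().__init__()
        self.forward_rnn = nn.GRUCell(dim, dim)
        self.lookahead_rnn = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(2 * dim, dim)
        self.lookahead = lookahead

    def forward(self, feats):                       # feats: (B, T, dim)
        B, T, D = feats.shape
        h = feats.new_zeros(B, D)
        outs = []
        for t in range(T):
            h = self.forward_rnn(feats[:, t], h)    # causal forward state
            ahead = feats[:, t + 1:t + 1 + self.lookahead]  # small future buffer
            if ahead.size(1) > 0:
                _, ha = self.lookahead_rnn(ahead)   # summarize the look-ahead frames
                ha = ha[0]
            else:
                ha = feats.new_zeros(B, D)          # no future left at sequence end
            outs.append(self.out(torch.cat([h, ha], dim=-1)))
        return torch.stack(outs, dim=1)             # (B, T, dim)
```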
- Real-time Object Detection for Streaming Perception [84.2559631820007]
Streaming perception is proposed to jointly evaluate latency and accuracy with a single metric for online video perception.
We build a simple and effective framework for streaming perception.
Our method achieves competitive performance on the Argoverse-HD dataset and improves the AP by 4.9% compared to a strong baseline.
arXiv Detail & Related papers (2022-03-23T11:33:27Z)
- Event and Activity Recognition in Video Surveillance for Cyber-Physical Systems [0.0]
We show that long-term motion patterns alone play a pivotal role in the task of recognizing an event.
Only the temporal features are exploited, using a hybrid Convolutional Neural Network (CNN) + Recurrent Neural Network (RNN) architecture.
arXiv Detail & Related papers (2021-11-03T08:30:38Z)
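The entry above exploits only temporal features with a hybrid CNN + RNN architecture. Below is a minimal sketch of that standard pattern, assuming a ResNet-18 backbone and an LSTM; the specific backbone, hidden size, and head are illustrative, not the paper's configuration.

```python
# Hedged sketch of the hybrid CNN + RNN pattern: a CNN extracts per-frame
# features, an LSTM models the long-term motion pattern.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class CnnRnnClassifier(nn.Module):
    def __init__(self, num_classes, hidden=256):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()             # expose 512-d pooled features
        self.cnn = backbone
        self.rnn = nn.LSTM(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clip):                    # clip: (B, T, 3, H, W)
        B, T = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(B, T, -1)  # per-frame features
        _, (h, _) = self.rnn(feats)             # temporal modeling only
        return self.head(h[-1])                 # event/activity logits
```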
- Online Action Detection in Streaming Videos with Time Buffers [28.82710230196424]
We formulate the problem of online temporal action detection in live streaming videos.
The standard setting of the online action detection task requires immediate prediction after a new frame is captured.
We propose to adopt a problem setting that allows models to make use of the small 'buffer time' incurred by the delay.
arXiv Detail & Related papers (2020-10-06T20:43:50Z)
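The time-buffer entry above relaxes the strict online setting by letting the model use a small buffer of frames that arrive after the frame being classified. Below is a hedged sketch of that problem setting as a streaming loop; the detector interface is assumed for illustration.

```python
# Hedged sketch of the buffer-time setting: the prediction for frame t is
# emitted only after `buffer` additional frames have arrived.
from collections import deque

def detect_with_buffer(frame_stream, detector, buffer=8):
    """Yields (frame_index, scores); scores for frame t use frames t..t+buffer."""
    window, t = deque(maxlen=buffer + 1), 0
    for frame in frame_stream:
        window.append(frame)
        if len(window) == buffer + 1:           # buffer filled: decide for the oldest
            yield t, detector(list(window))
            t += 1
    for _ in range(len(window) - 1):            # flush the tail with what remains
        window.popleft()
        yield t, detector(list(window))
        t += 1
```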
- A Novel Online Action Detection Framework from Untrimmed Video Streams [19.895434487276578]
We propose a novel online action detection framework that considers actions as a set of temporally ordered subclasses.
We augment our data by varying the lengths of videos to allow the proposed method to learn about the high intra-class variation in human actions.
arXiv Detail & Related papers (2020-03-17T14:11:24Z)
- Dynamic Inference: A New Approach Toward Efficient Video Action Recognition [69.9658249941149]
Action recognition in videos has achieved great success recently, but it remains a challenging task due to the massive computational cost.
We propose a general dynamic inference idea to improve inference efficiency by leveraging the variation in the distinguishability of different videos.
arXiv Detail & Related papers (2020-02-09T11:09:56Z)
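The dynamic inference entry above leverages the variation in how distinguishable different videos are. One common instantiation, sketched below as an assumption rather than the paper's exact policy, is a cascade of increasingly expensive classifiers with an early exit for easy videos.

```python
# Hedged sketch of early-exit dynamic inference: cheap models handle easy
# videos, expensive models run only when confidence stays low.
import torch

def dynamic_inference(stages, video, threshold=0.9):
    """stages: cheap-to-expensive models, each mapping video -> (C,) logits."""
    probs = None
    for stage in stages:
        probs = torch.softmax(stage(video), dim=-1)
        if probs.max() >= threshold:            # easy video: stop early
            break                               # hard video: run the next stage
    return probs.argmax().item()
```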