Information Elevation Network for Fast Online Action Detection
- URL: http://arxiv.org/abs/2109.13572v1
- Date: Tue, 28 Sep 2021 09:02:15 GMT
- Title: Information Elevation Network for Fast Online Action Detection
- Authors: Sunah Min and Jinyoung Moon
- Abstract summary: Online action detection (OAD) is a task that receives video segments within a streaming video as inputs and identifies ongoing actions within them.
We introduce a novel information elevation unit (IEU) that lifts up and accumulates the past information relevant to the current action.
We design an efficient and effective OAD network using IEUs, called an information elevation network (IEN).
- Score: 4.203274985072923
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Online action detection (OAD) is a task that receives video segments within a
streaming video as inputs and identifies ongoing actions within them. It is
important to retain past information associated with the current action. However,
long short-term memory (LSTM), a popular recurrent unit for modeling temporal
information from videos, accumulates past information from the previous hidden
and cell states and the extracted visual features at each timestep without
considering the relationships between the past and current information.
Consequently, the forget gate of the original LSTM can lose the accumulated
information relevant to the current action because it determines which
information to forget without considering the current action. We introduce a
novel information elevation unit (IEU) that lifts up and accumulates the past
information that is especially relevant to the current action. Through ablation
studies, we design an efficient and effective OAD network using IEUs, called an
information elevation network (IEN). To the best of our knowledge, IEN is the
first OAD approach to consider the computational overhead required for its
practical use. Our IEN uses visual features extracted by a fast action
recognition network taking only RGB frames because extracting optical flows
requires heavy computation overhead. On two OAD benchmark datasets, THUMOS-14
and TVSeries, our IEN outperforms state-of-the-art OAD methods using only RGB
frames. Furthermore, on the THUMOS-14 dataset, our IEN outperforms the
state-of-the-art OAD methods using two-stream features based on RGB frames and
optical flows.
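The IEU equations are not reproduced in this abstract, so the following is only a rough illustration of the "lift up and accumulate relevant past information" idea: a minimal PyTorch sketch of an LSTM-style cell whose forget pathway additionally re-weights the past cell state by its relevance to the current input. The RelevanceGatedCell name, the relevance gate, and the feature dimensions are hypothetical assumptions and do not reproduce the paper's published IEU.

```python
import torch
import torch.nn as nn

class RelevanceGatedCell(nn.Module):
    """Illustrative LSTM-style cell (hypothetical, not the published IEU).

    A standard LSTM forgets via sigmoid(f) * c, where the forget gate f is
    computed from the previous hidden state and the input without explicitly
    scoring the accumulated state against the current action. Here, an extra
    relevance gate re-weights the past cell state by its relevance to the
    current input before it is mixed into the new state.
    """

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        # Standard LSTM gates: input, forget, candidate, output.
        self.gates = nn.Linear(input_dim + hidden_dim, 4 * hidden_dim)
        # Hypothetical relevance gate: scores the past cell state against
        # the current visual feature.
        self.relevance = nn.Linear(input_dim + hidden_dim, hidden_dim)

    def forward(self, x, state):
        h, c = state
        i, f, g, o = self.gates(torch.cat([x, h], dim=-1)).chunk(4, dim=-1)
        # Re-weight accumulated information by relevance to the current input.
        r = torch.sigmoid(self.relevance(torch.cat([x, c], dim=-1)))
        c_new = torch.sigmoid(f) * (r * c) + torch.sigmoid(i) * torch.tanh(g)
        h_new = torch.sigmoid(o) * torch.tanh(c_new)
        return h_new, (h_new, c_new)

# Example: per-timestep RGB features from a fast recognition backbone
# (the 2048-d feature size is an assumption, e.g. a ResNet-style feature).
cell = RelevanceGatedCell(input_dim=2048, hidden_dim=512)
x = torch.randn(8, 2048)            # batch of current-segment features
h = c = torch.zeros(8, 512)
out, (h, c) = cell(x, (h, c))       # roll forward one timestep
```

In a streaming setting such a cell is applied once per incoming segment, so per-step cost stays constant; this matches the abstract's emphasis on low computational overhead, but it is only a design sketch.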
Related papers
- Harnessing Temporal Causality for Advanced Temporal Action Detection [53.654457142657236]
We introduce CausalTAD, which combines causal attention and causal Mamba to achieve state-of-the-art performance on benchmarks.
We ranked 1st in the Action Recognition, Action Detection, and Audio-Based Interaction Detection tracks at the EPIC-Kitchens Challenge 2024, and 1st in the Moment Queries track at the Ego4D Challenge 2024.
arXiv Detail & Related papers (2024-07-25T06:03:02Z) - On the Importance of Spatial Relations for Few-shot Action Recognition [109.2312001355221]
In this paper, we investigate the importance of spatial relations and propose a more accurate few-shot action recognition method.
A novel Spatial Alignment Cross Transformer (SA-CT) learns to re-adjust the spatial relations and incorporates the temporal information.
Experiments reveal that, even without using any temporal information, the performance of SA-CT is comparable to temporal-based methods on 3/4 benchmarks.
arXiv Detail & Related papers (2023-08-14T12:58:02Z) - DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z) - Motion-aware Memory Network for Fast Video Salient Object Detection [15.967509480432266]
We design a space-time memory (STM)-based network, which extracts useful temporal information of the current frame from adjacent frames as the temporal branch of VSOD.
In the encoding stage, we generate high-level temporal features by using high-level features from the current and its adjacent frames.
In the decoding stage, we propose an effective fusion strategy for spatial and temporal branches.
The proposed model does not require optical flow or other preprocessing, and can reach a speed of nearly 100 FPS during inference.
arXiv Detail & Related papers (2022-08-01T15:56:19Z) - Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation [79.1669476932147]
Vision-and-Language Navigation (VLN) is a task in which an agent is required to follow a language instruction to navigate to the goal position.
Recent Transformer-based VLN methods have made great progress benefiting from the direct connections between visual observations and the language instruction.
We introduce Multimodal Transformer with Variable-length Memory (MTVM) for visually-grounded natural language navigation.
arXiv Detail & Related papers (2021-11-10T16:04:49Z) - Learning to Discriminate Information for Online Action Detection: Analysis and Application [32.4410197207228]
We propose a novel recurrent unit, named Information Discrimination Unit (IDU), which explicitly discriminates the information relevancy between an ongoing action and others.
We also present a new recurrent unit, called Information Integration Unit (IIU), for action anticipation.
Our IIU exploits the outputs from IDU as pseudo action labels as well as RGB frames to learn enriched features of observed actions effectively.
arXiv Detail & Related papers (2021-09-08T01:51:51Z) - AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition [68.70214388982545]
Temporal modelling is the key for efficient video action recognition.
We introduce an adaptive temporal fusion network, called AdaFuse, that fuses channels from current and past feature maps.
Our approach can achieve about 40% computation savings with comparable accuracy to state-of-the-art methods.
arXiv Detail & Related papers (2021-02-10T23:31:02Z) - DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of temporal and spatial information.
We show that the proposed method outperforms state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)