Learning to Discriminate Information for Online Action Detection:
Analysis and Application
- URL: http://arxiv.org/abs/2109.03393v2
- Date: Thu, 9 Sep 2021 00:59:19 GMT
- Authors: Sumin Lee, Hyunjun Eun, Jinyoung Moon, Seokeon Choi, Yoonhyung Kim,
Chanho Jung, and Changick Kim
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online action detection, which aims to identify an ongoing action from a
streaming video, is an important subject in real-world applications. For this
task, previous methods use recurrent neural networks for modeling temporal
relations in an input sequence. However, these methods overlook the fact that
the input image sequence includes not only the action of interest but also
background and irrelevant actions. This can induce recurrent units to
accumulate unnecessary information when encoding features of the action of
interest. To overcome this problem, we propose a novel recurrent unit, named
Information Discrimination Unit (IDU), which explicitly discriminates the
information relevancy between an ongoing action and others to decide whether to
accumulate the input information. This enables learning more discriminative
representations for identifying an ongoing action. In this paper, we further
present a new recurrent unit, called Information Integration Unit (IIU), for
action anticipation. Our IIU exploits the outputs from IDU as pseudo action
labels as well as RGB frames to learn enriched features of observed actions
effectively. In experiments on TVSeries and THUMOS-14, the proposed methods
outperform state-of-the-art methods by a significant margin in online action
detection and action anticipation. Moreover, we demonstrate the effectiveness
of the proposed units by conducting comprehensive ablation studies.
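The core idea of the abstract, a recurrent unit whose gate decides how much of each input to accumulate based on its relevance to the ongoing action, can be sketched in a few lines. This is a minimal, illustrative toy, not the authors' actual IDU: the gate form, weight shapes, and all names below are invented for the sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DiscriminatingRecurrentCell:
    """Toy recurrent cell in the spirit of the paper's IDU: a relevance
    gate, computed from the current input and the accumulated state,
    controls how much of each frame's feature enters the hidden state,
    so background / irrelevant frames contribute little."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(input_dim + hidden_dim)
        # Gate weights see both the input and the hidden state.
        self.W_g = rng.normal(0.0, scale, (hidden_dim, input_dim + hidden_dim))
        self.b_g = np.zeros(hidden_dim)
        # Candidate-feature weights see the input alone.
        self.W_h = rng.normal(0.0, scale, (hidden_dim, input_dim))

    def step(self, h, x):
        # Relevance gate in (0, 1): how relevant is x to what we have seen?
        g = sigmoid(self.W_g @ np.concatenate([x, h]) + self.b_g)
        # Candidate feature extracted from the current input.
        cand = np.tanh(self.W_h @ x)
        # Accumulate only the gated (relevant) portion of the candidate.
        return (1.0 - g) * h + g * cand

    def run(self, xs):
        h = np.zeros(self.W_g.shape[0])
        for x in xs:
            h = self.step(h, x)
        return h
```

A streaming detector would read the hidden state after each step; here, with all-zero inputs the candidate features are zero, so the state accumulates nothing, which is the behavior the gating is meant to encourage for uninformative frames.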
Related papers
- An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z)
- C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z)
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
- ReAct: Temporal Action Detection with Relational Queries [84.76646044604055]
This work aims at advancing temporal action detection (TAD) using an encoder-decoder framework with action queries.
We first propose a relational attention mechanism in the decoder, which guides the attention among queries based on their relations.
Lastly, we propose to predict the localization quality of each action query at inference in order to distinguish high-quality queries.
arXiv Detail & Related papers (2022-07-14T17:46:37Z)
- ActAR: Actor-Driven Pose Embeddings for Video Action Recognition [12.043574473965318]
Human action recognition (HAR) in videos is one of the core tasks of video understanding.
We propose a new method that learns to efficiently recognize human actions in the infrared spectrum.
arXiv Detail & Related papers (2022-04-19T05:12:24Z)
- Self-Regulated Learning for Egocentric Video Activity Anticipation [147.9783215348252]
Self-Regulated Learning (SRL) aims to consecutively regulate the intermediate representation so that it emphasizes the novel information in the current frame.
SRL sharply outperforms existing state-of-the-art in most cases on two egocentric video datasets and two third-person video datasets.
arXiv Detail & Related papers (2021-11-23T03:29:18Z)
- Information Elevation Network for Fast Online Action Detection [4.203274985072923]
Online action detection (OAD) is a task that receives video segments within a streaming video as inputs and identifies ongoing actions within them.
We introduce a novel information elevation unit (IEU) that lifts up and accumulates past information relevant to the current action.
Using IEUs, we design an efficient and effective OAD network, called the information elevation network (IEN).
arXiv Detail & Related papers (2021-09-28T09:02:15Z)
- Learning End-to-End Action Interaction by Paired-Embedding Data Augmentation [10.857323240766428]
A new Interactive Action Translation (IAT) task aims to learn end-to-end action interaction from unlabeled interactive pairs.
We propose a Paired-Embedding (PE) method for effective and reliable data augmentation.
Experimental results on two datasets show impressive effects and broad application prospects of our method.
arXiv Detail & Related papers (2020-07-16T01:54:16Z)
- Intra- and Inter-Action Understanding via Temporal Action Parsing [118.32912239230272]
We construct a new dataset developed on sport videos with manual annotations of sub-actions, and conduct a study on temporal action parsing on top.
Our study shows that a sport activity usually consists of multiple sub-actions and that the awareness of such temporal structures is beneficial to action recognition.
We also investigate a number of temporal parsing methods, and thereon devise an improved method that can mine sub-actions from training data without knowing their labels.
arXiv Detail & Related papers (2020-05-20T17:45:18Z)
- A Novel Online Action Detection Framework from Untrimmed Video Streams [19.895434487276578]
We propose a novel online action detection framework that considers actions as a set of temporally ordered subclasses.
We augment our data by varying the lengths of videos to allow the proposed method to learn about the high intra-class variation in human actions.
arXiv Detail & Related papers (2020-03-17T14:11:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.