Learning to Discriminate Information for Online Action Detection:
Analysis and Application
- URL: http://arxiv.org/abs/2109.03393v2
- Date: Thu, 9 Sep 2021 00:59:19 GMT
- Authors: Sumin Lee, Hyunjun Eun, Jinyoung Moon, Seokeon Choi, Yoonhyung Kim,
Chanho Jung, and Changick Kim
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online action detection, which aims to identify an ongoing action from a
streaming video, is an important subject in real-world applications. For this
task, previous methods use recurrent neural networks for modeling temporal
relations in an input sequence. However, these methods overlook the fact that
the input image sequence includes not only the action of interest but also
background and irrelevant actions. This can induce recurrent units to
accumulate unnecessary information when encoding features of the action of
interest. To overcome this problem, we propose a novel recurrent unit, named
Information Discrimination Unit (IDU), which explicitly discriminates the
information relevancy between an ongoing action and others to decide whether to
accumulate the input information. This enables learning more discriminative
representations for identifying an ongoing action. In this paper, we further
present a new recurrent unit, called Information Integration Unit (IIU), for
action anticipation. Our IIU exploits the outputs from IDU as pseudo action
labels as well as RGB frames to learn enriched features of observed actions
effectively. In experiments on TVSeries and THUMOS-14, the proposed methods
outperform state-of-the-art methods by a significant margin in online action
detection and action anticipation. Moreover, we demonstrate the effectiveness
of the proposed units by conducting comprehensive ablation studies.
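The core idea of the abstract, a recurrent unit whose gate decides how much of each input to accumulate based on its relevance to the ongoing action, can be sketched in a few lines. This is a minimal, illustrative toy, not the authors' actual IDU: the gate form, weight shapes, and all names below are invented for the sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DiscriminatingRecurrentCell:
    """Toy recurrent cell in the spirit of the paper's IDU: a relevance
    gate, computed from the current input and the accumulated state,
    controls how much of each frame's feature enters the hidden state,
    so background / irrelevant frames contribute little."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(input_dim + hidden_dim)
        # Gate weights see both the input and the hidden state.
        self.W_g = rng.normal(0.0, scale, (hidden_dim, input_dim + hidden_dim))
        self.b_g = np.zeros(hidden_dim)
        # Candidate-feature weights see the input alone.
        self.W_h = rng.normal(0.0, scale, (hidden_dim, input_dim))

    def step(self, h, x):
        # Relevance gate in (0, 1): how relevant is x to what we have seen?
        g = sigmoid(self.W_g @ np.concatenate([x, h]) + self.b_g)
        # Candidate feature extracted from the current input.
        cand = np.tanh(self.W_h @ x)
        # Accumulate only the gated (relevant) portion of the candidate.
        return (1.0 - g) * h + g * cand

    def run(self, xs):
        h = np.zeros(self.W_g.shape[0])
        for x in xs:
            h = self.step(h, x)
        return h
```

A streaming detector would read the hidden state after each step; here, with all-zero inputs the candidate features are zero, so the state accumulates nothing, which is the behavior the gating is meant to encourage for uninformative frames.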
Related papers
- An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z)
- C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z)
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
- ReAct: Temporal Action Detection with Relational Queries [84.76646044604055]
This work aims at advancing temporal action detection (TAD) using an encoder-decoder framework with action queries.
We first propose a relational attention mechanism in the decoder, which guides the attention among queries based on their relations.
Lastly, we propose to predict the localization quality of each action query at inference in order to distinguish high-quality queries.
arXiv Detail & Related papers (2022-07-14T17:46:37Z)
- ActAR: Actor-Driven Pose Embeddings for Video Action Recognition [12.043574473965318]
Human action recognition (HAR) in videos is one of the core tasks of video understanding.
We propose a new method that learns to efficiently recognize human actions in the infrared spectrum.
arXiv Detail & Related papers (2022-04-19T05:12:24Z)
- Self-Regulated Learning for Egocentric Video Activity Anticipation [147.9783215348252]
Self-Regulated Learning (SRL) aims to consecutively regulate the intermediate representation so that it emphasizes the novel information in the current frame.
SRL sharply outperforms existing state-of-the-art in most cases on two egocentric video datasets and two third-person video datasets.
arXiv Detail & Related papers (2021-11-23T03:29:18Z)
- Information Elevation Network for Fast Online Action Detection [4.203274985072923]
Online action detection (OAD) is a task that receives video segments within a streaming video as inputs and identifies ongoing actions within them.
We introduce a novel information elevation unit (IEU) that lifts up and accumulates past information relevant to the current action.
Using IEUs, we design an efficient and effective OAD network, called the information elevation network (IEN).
arXiv Detail & Related papers (2021-09-28T09:02:15Z)
- Learning End-to-End Action Interaction by Paired-Embedding Data Augmentation [10.857323240766428]
A new Interactive Action Translation (IAT) task aims to learn end-to-end action interaction from unlabeled interactive pairs.
We propose a Paired-Embedding (PE) method for effective and reliable data augmentation.
Experimental results on two datasets show impressive effects and broad application prospects of our method.
arXiv Detail & Related papers (2020-07-16T01:54:16Z)
- Intra- and Inter-Action Understanding via Temporal Action Parsing [118.32912239230272]
We construct a new dataset developed on sport videos with manual annotations of sub-actions, and conduct a study on temporal action parsing on top.
Our study shows that a sport activity usually consists of multiple sub-actions and that the awareness of such temporal structures is beneficial to action recognition.
We also investigate a number of temporal parsing methods, and thereon devise an improved method that can mine sub-actions from training data without knowing their labels.
arXiv Detail & Related papers (2020-05-20T17:45:18Z)
- A Novel Online Action Detection Framework from Untrimmed Video Streams [19.895434487276578]
We propose a novel online action detection framework that considers actions as a set of temporally ordered subclasses.
We augment our data by varying the lengths of videos to allow the proposed method to learn about the high intra-class variation in human actions.
arXiv Detail & Related papers (2020-03-17T14:11:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.