Human Activity Recognition based on Dynamic Spatio-Temporal Relations
- URL: http://arxiv.org/abs/2006.16132v1
- Date: Mon, 29 Jun 2020 15:49:34 GMT
- Title: Human Activity Recognition based on Dynamic Spatio-Temporal Relations
- Authors: Zhenyu Liu, Yaqiang Yao, Yan Liu, Yuening Zhu, Zhenchao Tao, Lei Wang,
Yuhong Feng
- Abstract summary: The description of a single human action and the modeling of the evolution of successive human actions are two major issues in human activity recognition.
We develop a method for human activity recognition that tackles these two issues.
- Score: 10.635134217802783
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human activity, which usually consists of several actions, generally covers
interactions among persons and/or objects. In particular, human actions involve
certain spatial and temporal relationships, are the components of more
complicated activity, and evolve dynamically over time. Therefore, the
description of a single human action and the modeling of the evolution of
successive human actions are two major issues in human activity recognition. In
this paper, we develop a method for human activity recognition that tackles
these two issues. In the proposed method, an activity is divided into several
successive actions represented by spatio-temporal patterns, and the evolution
of these actions is captured by a sequential model. A refined comprehensive
spatio-temporal graph is utilized to represent a single action: a
qualitative representation of a human action incorporating both the spatial and
temporal relations of the participating objects. Next, a discrete hidden Markov
model is applied to model the evolution of action sequences. Moreover, a fully
automatic partition method is proposed to divide a long-term human activity
video into several human actions based on variational objects and qualitative
spatial relations. Finally, a hierarchical decomposition of the human body is
introduced to obtain a discriminative representation for a single action.
Experimental results on the Cornell Activity Dataset demonstrate the efficiency
and effectiveness of the proposed approach, which will enable long videos of
human activity to be better recognized.
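The abstract pairs a per-action spatio-temporal representation with a discrete hidden Markov model over action sequences. As a rough illustration of the HMM half only, the sketch below recovers a most-likely hidden action sequence from observed pattern symbols using the Viterbi algorithm; the states, observation symbols, and probabilities are invented for illustration and are not taken from the paper.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    # V[t][s] = (best probability of reaching state s at time t, predecessor)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            V[t][s] = (prob, prev)
    # Backtrack from the best final state.
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

# Hypothetical actions and observed spatio-temporal relation labels.
states = ["reach", "drink"]
start_p = {"reach": 0.7, "drink": 0.3}
trans_p = {"reach": {"reach": 0.4, "drink": 0.6},
           "drink": {"reach": 0.2, "drink": 0.8}}
emit_p = {"reach": {"approach": 0.8, "contact": 0.2},
          "drink": {"approach": 0.1, "contact": 0.9}}

print(viterbi(["approach", "contact", "contact"],
              states, start_p, trans_p, emit_p))
# → ['reach', 'drink', 'drink']
```

In the paper's setting the observation symbols would instead be the qualitative spatio-temporal graph patterns extracted from each segmented action, and the transition/emission tables would be estimated from training data rather than hand-set.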
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- in2IN: Leveraging individual Information to Generate Human INteractions [29.495166514135295]
We introduce in2IN, a novel diffusion model for human-human motion generation conditioned on individual descriptions.
We also propose DualMDM, a model composition technique that combines the motions generated with in2IN and the motions generated by a single-person motion prior pre-trained on HumanML3D.
arXiv Detail & Related papers (2024-04-15T17:59:04Z)
- THOR: Text to Human-Object Interaction Diffusion via Relation Intervention [51.02435289160616]
We propose a novel Text-guided Human-Object Interaction diffusion model with Relation Intervention (THOR)
In each diffusion step, we initiate text-guided human and object motion and then leverage human-object relations to intervene in object motion.
We construct Text-BEHAVE, a Text2HOI dataset that seamlessly integrates textual descriptions with the currently largest publicly available 3D HOI dataset.
arXiv Detail & Related papers (2024-03-17T13:17:25Z)
- Persistent-Transient Duality: A Multi-mechanism Approach for Modeling Human-Object Interaction [58.67761673662716]
Humans are highly adaptable, swiftly switching between different modes to handle different tasks, situations and contexts.
In Human-object interaction (HOI) activities, these modes can be attributed to two mechanisms: (1) the large-scale consistent plan for the whole activity and (2) the small-scale children interactive actions that start and end along the timeline.
This work proposes to model two concurrent mechanisms that jointly control human motion.
arXiv Detail & Related papers (2023-07-24T12:21:33Z)
- Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations [61.659439423703155]
We present TOHO, a method for task-oriented human-object interaction generation with implicit neural representations.
Our method generates continuous motions that are parameterized only by the temporal coordinate.
This work takes a step further toward general human-scene interaction simulation.
arXiv Detail & Related papers (2023-03-23T09:31:56Z)
- Human Activity Recognition Using Cascaded Dual Attention CNN and Bi-Directional GRU Framework [3.3721926640077795]
Vision-based human activity recognition has emerged as one of the essential research areas in video analytics domain.
This paper presents a computationally efficient yet generic spatial-temporal cascaded framework that exploits the deep discriminative spatial and temporal features for human activity recognition.
The proposed framework attains an improvement in execution time up to 167 times in terms of frames per second as compared to most of the contemporary action recognition methods.
arXiv Detail & Related papers (2022-08-09T20:34:42Z)
- Scene-aware Generative Network for Human Motion Synthesis [125.21079898942347]
We propose a new framework, with the interaction between the scene and the human motion taken into account.
Considering the uncertainty of human motion, we formulate this task as a generative task.
We derive a GAN based learning approach, with discriminators to enforce the compatibility between the human motion and the contextual scene.
arXiv Detail & Related papers (2021-05-31T09:05:50Z)
- Human Interaction Recognition Framework based on Interacting Body Part Attention [24.913372626903648]
We propose a novel framework that simultaneously considers both implicit and explicit representations of human interactions.
The proposed method captures the subtle difference between different interactions using interacting body part attention.
We validate the effectiveness of the proposed method using four widely used public datasets.
arXiv Detail & Related papers (2021-01-22T06:52:42Z)
- LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities [119.88381048477854]
We introduce the LEMMA dataset to provide a single home to address missing dimensions with meticulously designed settings.
We densely annotate the atomic-actions with human-object interactions to provide ground-truths of the compositionality, scheduling, and assignment of daily activities.
We hope this effort would drive the machine vision community to examine goal-directed human activities and further study the task scheduling and assignment in the real world.
arXiv Detail & Related papers (2020-07-31T00:13:54Z)
- Simultaneous Learning from Human Pose and Object Cues for Real-Time Activity Recognition [11.290467061493189]
We propose a novel approach to real-time human activity recognition, through simultaneously learning from observations of both human poses and objects involved in the human activity.
Our method outperforms previous methods and obtains real-time performance for human activity recognition with a processing speed of 104 Hz.
arXiv Detail & Related papers (2020-03-26T22:04:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.