Simultaneous Learning from Human Pose and Object Cues for Real-Time
Activity Recognition
- URL: http://arxiv.org/abs/2004.03453v1
- Date: Thu, 26 Mar 2020 22:04:37 GMT
- Title: Simultaneous Learning from Human Pose and Object Cues for Real-Time
Activity Recognition
- Authors: Brian Reily, Qingzhao Zhu, Christopher Reardon, and Hao Zhang
- Abstract summary: We propose a novel approach to real-time human activity recognition, through simultaneously learning from observations of both human poses and objects involved in the human activity.
Our method outperforms previous methods and obtains real-time performance for human activity recognition with a processing speed of 10^4 Hz.
- Score: 11.290467061493189
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time human activity recognition plays an essential role in real-world
human-centered robotics applications, such as assisted living and human-robot
collaboration. Although previous methods based on skeletal data to encode human
poses showed promising results on real-time activity recognition, they lacked
the capability to consider the context provided by objects within the scene and
in use by the humans, which can provide a further discriminant between human
activity categories. In this paper, we propose a novel approach to real-time
human activity recognition, through simultaneously learning from observations
of both human poses and objects involved in the human activity. We formulate
human activity recognition as a joint optimization problem under a unified
mathematical framework, which uses a regression-like loss function to integrate
human pose and object cues and defines structured sparsity-inducing norms to
identify discriminative body joints and object attributes. To evaluate our
method, we perform extensive experiments on two benchmark datasets and a
physical robot in a home assistance setting. Experimental results have shown
that our method outperforms previous methods and obtains real-time performance
for human activity recognition with a processing speed of 10^4 Hz.
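The abstract describes the method as a joint optimization: a regression-like loss over combined pose and object features, plus structured sparsity-inducing norms that zero out entire body joints or object attributes. The sketch below is a minimal, hypothetical illustration of that general recipe, not the authors' implementation: it uses a plain least-squares loss with an l2,1 (row-group) penalty solved by proximal gradient descent, and all names and the toy data are assumptions.

```python
import numpy as np

# Hypothetical sketch of regression with structured sparsity.
# X stacks per-sample features: pose-joint dimensions followed by
# object-attribute dimensions. Y is a one-hot matrix of activity labels.
# The l2,1 norm over rows of W pushes whole rows (i.e. whole cues)
# to zero, so surviving rows mark discriminative joints/attributes.

def l21_prox(W, t):
    """Proximal operator of t * ||W||_{2,1}: shrink each row toward zero."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return W * scale

def fit(X, Y, lam=0.5, iters=500):
    """Minimize ||XW - Y||_F^2 + lam * ||W||_{2,1} by proximal gradient."""
    d, k = X.shape[1], Y.shape[1]
    # Step size from the Lipschitz constant of the smooth term.
    lr = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)
    W = np.zeros((d, k))
    for _ in range(iters):
        grad = 2.0 * X.T @ (X @ W - Y)      # gradient of the squared loss
        W = l21_prox(W - lr * grad, lr * lam)  # gradient step, then shrink rows
    return W

# Toy usage: 5 features (say, 3 pose dims + 2 object dims), 2 classes,
# with the label depending only on feature 0.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5))
Y = np.eye(2)[(X[:, 0] > 0).astype(int)]
W = fit(X, Y)
row_norms = np.linalg.norm(W, axis=1)
# Rows with near-zero norm correspond to non-discriminative cues.
```

In the paper's setting the groups would be defined per joint and per object attribute rather than per scalar feature, but the selection mechanism sketched here is the same: the group penalty decides which cues survive.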
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- Aligning Human Motion Generation with Human Perceptions [51.831338643012444]
We propose a data-driven approach to bridge the gap by introducing a large-scale human perceptual evaluation dataset, MotionPercept, and a human motion critic model, MotionCritic.
Our critic model offers a more accurate metric for assessing motion quality and could be readily integrated into the motion generation pipeline.
arXiv Detail & Related papers (2024-07-02T14:01:59Z)
- Real-time Addressee Estimation: Deployment of a Deep-Learning Model on the iCub Robot [52.277579221741746]
Addressee Estimation is a skill essential for social robots to interact smoothly with humans.
Inspired by human perceptual skills, a deep-learning model for Addressee Estimation is designed, trained, and deployed on an iCub robot.
The study presents the procedure of such implementation and the performance of the model deployed in real-time human-robot interaction.
arXiv Detail & Related papers (2023-11-09T13:01:21Z)
- Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots [119.55240471433302]
Habitat 3.0 is a simulation platform for studying collaborative human-robot tasks in home environments.
It addresses challenges in modeling complex deformable bodies and diversity in appearance and motion.
Human-in-the-loop infrastructure enables real human interaction with simulated robots via mouse/keyboard or a VR interface.
arXiv Detail & Related papers (2023-10-19T17:29:17Z)
- Human Activity Recognition Using Cascaded Dual Attention CNN and Bi-Directional GRU Framework [3.3721926640077795]
Vision-based human activity recognition has emerged as one of the essential research areas in video analytics domain.
This paper presents a computationally efficient yet generic spatial-temporal cascaded framework that exploits the deep discriminative spatial and temporal features for human activity recognition.
The proposed framework attains an improvement in execution time up to 167 times in terms of frames per second as compared to most of the contemporary action recognition methods.
arXiv Detail & Related papers (2022-08-09T20:34:42Z)
- Skeleton-Based Mutually Assisted Interacted Object Localization and Human Action Recognition [111.87412719773889]
We propose a joint learning framework for "interacted object localization" and "human action recognition" based on skeleton data.
Our method achieves the best or competitive performance with the state-of-the-art methods for human action recognition.
arXiv Detail & Related papers (2021-10-28T10:09:34Z)
- Scene-aware Generative Network for Human Motion Synthesis [125.21079898942347]
We propose a new framework, with the interaction between the scene and the human motion taken into account.
Considering the uncertainty of human motion, we formulate this task as a generative task.
We derive a GAN based learning approach, with discriminators to enforce the compatibility between the human motion and the contextual scene.
arXiv Detail & Related papers (2021-05-31T09:05:50Z)
- Human-Robot Team Coordination with Dynamic and Latent Human Task Proficiencies: Scheduling with Learning Curves [0.0]
We introduce a novel resource coordination approach that enables robots to explore the relative strengths and learning abilities of their human teammates.
We generate and evaluate a robust schedule while discovering the latent proficiency of each individual worker.
Results indicate that scheduling strategies favoring exploration tend to be beneficial for human-robot collaboration.
arXiv Detail & Related papers (2020-07-03T19:44:22Z)
- Attention-Oriented Action Recognition for Real-Time Human-Robot Interaction [11.285529781751984]
We propose an attention-oriented multi-level network framework to meet the need for real-time interaction.
Specifically, a Pre-Attention network is employed to roughly focus on the interactor in the scene at low resolution.
A second, compact CNN then receives the extracted skeleton sequence as input for action recognition.
arXiv Detail & Related papers (2020-06-29T15:49:34Z)
- Human Activity Recognition based on Dynamic Spatio-Temporal Relations [10.635134217802783]
The description of a single human action and the modeling of the evolution of successive human actions are two major issues in human activity recognition.
We develop a method for human activity recognition that tackles these two issues.
arXiv Detail & Related papers (2020-06-29T15:49:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.