Socially and Contextually Aware Human Motion and Pose Forecasting
- URL: http://arxiv.org/abs/2007.06843v1
- Date: Tue, 14 Jul 2020 06:12:13 GMT
- Title: Socially and Contextually Aware Human Motion and Pose Forecasting
- Authors: Vida Adeli, Ehsan Adeli, Ian Reid, Juan Carlos Niebles, Hamid
Rezatofighi
- Abstract summary: We propose a novel framework to tackle both tasks of human motion (or trajectory) and body skeleton pose forecasting.
We consider incorporating both scene and social contexts, as critical clues for this prediction task.
Our proposed framework achieves a superior performance compared to several baselines on two social datasets.
- Score: 48.083060946226
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Smooth and seamless robot navigation while interacting with humans depends on
predicting human movements. Forecasting such human dynamics often involves
modeling human trajectories (global motion) or detailed body joint movements
(local motion). Prior work typically tackled local and global human movements
separately. In this paper, we propose a novel framework to tackle both tasks of
human motion (or trajectory) and body skeleton pose forecasting in a unified
end-to-end pipeline. To deal with this real-world problem, we consider
incorporating both scene and social contexts, as critical clues for this
prediction task, into our proposed framework. To this end, we first couple
these two tasks by i) encoding their history using a shared Gated Recurrent
Unit (GRU) encoder and ii) applying a loss that measures the errors of both
tasks jointly as a single distance. Then, we incorporate the
scene context by encoding a spatio-temporal representation of the video data.
We also include social clues by generating a joint feature representation from
motion and pose of all individuals from the scene using a social pooling layer.
Finally, we use a GRU based decoder to forecast both motion and skeleton pose.
We demonstrate that our proposed framework achieves a superior performance
compared to several baselines on two social datasets.
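The pipeline the abstract describes (a shared GRU encoder over each person's history, a social pooling layer over all individuals' hidden states, and a GRU decoder that forecasts trajectory and pose jointly) can be sketched as follows. This is a minimal illustrative NumPy sketch, not the authors' implementation: all dimensions, the element-wise max choice for social pooling, and the omission of the scene-context (video) branch are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class GRUCell:
    """Minimal GRU cell; gate weights stacked as [update; reset; candidate]."""
    def __init__(self, in_dim, hid_dim):
        s = 1.0 / np.sqrt(hid_dim)
        self.W = rng.uniform(-s, s, (3 * hid_dim, in_dim))   # input weights
        self.U = rng.uniform(-s, s, (3 * hid_dim, hid_dim))  # recurrent weights
        self.b = np.zeros(3 * hid_dim)
        self.hid_dim = hid_dim

    def step(self, x, h):
        d = self.hid_dim
        gx = self.W @ x + self.b
        gh = self.U @ h
        z = sigmoid(gx[:d] + gh[:d])            # update gate
        r = sigmoid(gx[d:2*d] + gh[d:2*d])      # reset gate
        n = np.tanh(gx[2*d:] + r * gh[2*d:])    # candidate state
        return (1 - z) * n + z * h

# Illustrative dimensions: 2-D trajectory + 13 joints x 2 coords = 28 inputs.
TRAJ_DIM, POSE_DIM, HID = 2, 26, 32
IN_DIM = TRAJ_DIM + POSE_DIM

encoder = GRUCell(IN_DIM, HID)            # shared across all individuals
decoder = GRUCell(2 * HID, HID)           # input: own state + pooled social feature
out_proj = rng.normal(0, 0.1, (IN_DIM, HID))  # maps decoder state to motion + pose

def encode(history):
    """Encode one person's (T, IN_DIM) motion+pose history with the shared GRU."""
    h = np.zeros(HID)
    for x in history:
        h = encoder.step(x, h)
    return h

def social_pool(states):
    """Joint social feature: element-wise max over all individuals' states."""
    return np.max(np.stack(states), axis=0)

def forecast(histories, horizon):
    """Jointly roll out future trajectory + pose for every person in the scene."""
    enc = [encode(hist) for hist in histories]
    pooled = social_pool(enc)
    preds = []
    for h0 in enc:
        h, person_out = h0, []
        for _ in range(horizon):
            h = decoder.step(np.concatenate([h, pooled]), h)
            person_out.append(out_proj @ h)   # trajectory offset + pose
        preds.append(np.stack(person_out))
    return np.stack(preds)                    # (num_people, horizon, IN_DIM)

# Toy scene: 3 people, 8 observed frames each; forecast 12 future frames.
histories = [rng.normal(size=(8, IN_DIM)) for _ in range(3)]
future = forecast(histories, horizon=12)
print(future.shape)  # (3, 12, 28)
```

A real system would train these weights end-to-end with the joint trajectory-plus-pose loss and add the spatio-temporal video encoding; the sketch only shows how the shared encoder, social pooling, and decoder fit together.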
Related papers
- Staged Contact-Aware Global Human Motion Forecasting [7.930326095134298]
Scene-aware global human motion forecasting is critical for manifold applications, including virtual reality, robotics, and sports.
We propose a STAGed contact-aware global human motion forecasting STAG, a novel three-stage pipeline for predicting global human motion in a 3D environment.
STAG achieves a 1.8% and 16.2% overall improvement in pose and trajectory prediction, respectively, on the scene-aware GTA-IM dataset.
arXiv Detail & Related papers (2023-09-16T10:47:48Z)
- Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations [61.659439423703155]
TOHO: Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations.
Our method generates continuous motions that are parameterized only by the temporal coordinate.
This work takes a step further toward general human-scene interaction simulation.
arXiv Detail & Related papers (2023-03-23T09:31:56Z)
- Contact-aware Human Motion Forecasting [87.04827994793823]
We tackle the task of scene-aware 3D human motion forecasting, which consists of predicting future human poses given a 3D scene and a past human motion.
Our approach outperforms the state-of-the-art human motion forecasting and human synthesis methods on both synthetic and real datasets.
arXiv Detail & Related papers (2022-10-08T07:53:19Z)
- GIMO: Gaze-Informed Human Motion Prediction in Context [75.52839760700833]
We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which further diversifies the motion dynamics observed from our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches.
arXiv Detail & Related papers (2022-04-20T13:17:39Z)
- Task-Generic Hierarchical Human Motion Prior using VAEs [44.356707509079044]
A deep generative model that describes human motions can benefit a wide range of fundamental computer vision and graphics tasks.
We present a method for learning complex human motions independent of specific tasks using a combined global and local latent space.
We demonstrate the effectiveness of our hierarchical motion variational autoencoder in a variety of tasks including video-based human pose estimation.
arXiv Detail & Related papers (2021-06-07T23:11:42Z)
- Scene-aware Generative Network for Human Motion Synthesis [125.21079898942347]
We propose a new framework, with the interaction between the scene and the human motion taken into account.
Considering the uncertainty of human motion, we formulate this task as a generative task.
We derive a GAN based learning approach, with discriminators to enforce the compatibility between the human motion and the contextual scene.
arXiv Detail & Related papers (2021-05-31T09:05:50Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.