Staged Contact-Aware Global Human Motion Forecasting
- URL: http://arxiv.org/abs/2309.08947v1
- Date: Sat, 16 Sep 2023 10:47:48 GMT
- Title: Staged Contact-Aware Global Human Motion Forecasting
- Authors: Luca Scofano, Alessio Sampieri, Elisabeth Schiele, Edoardo De Matteis,
Laura Leal-Taixé, Fabio Galasso
- Abstract summary: Scene-aware global human motion forecasting is critical for manifold applications, including virtual reality, robotics, and sports.
We propose STAG (STAGed contact-aware global human motion forecasting), a novel three-stage pipeline for predicting global human motion in a 3D environment.
STAG achieves a 1.8% and 16.2% overall improvement in pose and trajectory prediction, respectively, on the scene-aware GTA-IM dataset.
- Score: 7.930326095134298
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene-aware global human motion forecasting is critical for manifold
applications, including virtual reality, robotics, and sports. The task
combines human trajectory and pose forecasting within the provided scene
context, which represents a significant challenge.
So far, only Mao et al. (NeurIPS'22) have addressed scene-aware global motion,
cascading the prediction of future scene contact points with global motion
estimation. They perform the latter as end-to-end forecasting of future
trajectories and poses. However, the end-to-end approach contrasts with the
coarse-to-fine nature of the task and results in lower performance, as we
demonstrate here empirically.
We propose STAG (STAGed contact-aware global human motion forecasting), a
novel three-stage pipeline for predicting global human motion in a 3D
environment. First, we represent the interaction between the human and the
scene as contact points. Second, we forecast the human trajectory within the
scene, predicting the coarse motion of the body as a whole. The third and
last stage matches plausible fine-grained joint motion to the trajectory,
taking the estimated contacts into account.
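As an illustration of the staged decomposition above, the following Python sketch chains three placeholder modules; the names and interfaces (contact_net, trajectory_net, pose_net) are hypothetical stand-ins, not the released STAG code.

```python
def forecast_global_motion(past_poses, scene_points,
                           contact_net, trajectory_net, pose_net):
    """Hypothetical three-stage cascade in the spirit of STAG.

    past_poses:   (T_past, J, 3) observed 3D joint positions
    scene_points: (N, 3) scene point cloud
    The *_net arguments stand in for learned models.
    """
    # Stage 1: predict future human-scene contact points.
    contacts = contact_net(past_poses, scene_points)            # (T_fut, K, 3)

    # Stage 2: forecast the coarse global trajectory of the body as a whole,
    # conditioned on the estimated contacts.
    trajectory = trajectory_net(past_poses, contacts)           # (T_fut, 3)

    # Stage 3: fill in fine-grained joint motion consistent with both the
    # trajectory and the contacts.
    future_poses = pose_net(past_poses, trajectory, contacts)   # (T_fut, J, 3)
    return future_poses
```

Each stage conditions on the coarser output of the previous one, which is the coarse-to-fine structure the abstract argues for.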
Compared to the state-of-the-art (SoA), STAG achieves a 1.8% and 16.2%
overall improvement in pose and trajectory prediction, respectively, on the
scene-aware GTA-IM dataset. A comprehensive ablation study confirms the
advantages of staged modeling over end-to-end approaches. Furthermore, we
establish the significance of a newly proposed temporal counter, the
"time-to-go", which indicates the time remaining before reaching scene
contacts and endpoints (see the sketch below). Notably, STAG generalizes to datasets
lacking a scene and achieves a new state-of-the-art performance on CMU-Mocap,
without leveraging any social cues. Our code is released at:
https://github.com/L-Scofano/STAG
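The abstract does not spell out how the "time-to-go" counter is encoded; one minimal reading, assumed here for illustration, is a per-frame countdown to the contact or end frame:

```python
def time_to_go(horizon: int) -> list[int]:
    """Per-frame countdown over a prediction horizon of `horizon` frames.

    Frame t (0-indexed) is annotated with the number of frames left
    before the endpoint is reached. This is one plausible reading of
    the paper's "time-to-go" counter, not the authors' exact encoding.
    """
    return [horizon - t for t in range(horizon)]

# A 5-frame horizon yields [5, 4, 3, 2, 1].
print(time_to_go(5))
```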
Related papers
- Social-Transmotion: Promptable Human Trajectory Prediction [65.80068316170613]
Social-Transmotion is a generic Transformer-based model that exploits diverse and numerous visual cues to predict human behavior.
Our approach is validated on multiple datasets, including JTA, JRDB, Pedestrians and Cyclists in Road Traffic, and ETH-UCY.
arXiv Detail & Related papers (2023-12-26T18:56:49Z)
- Scene-aware Human Motion Forecasting via Mutual Distance Prediction [13.067687949642641]
We propose to model the human-scene interaction with the mutual distance between the human body and the scene.
Such mutual distances constrain both the local and global human motion, yielding a prediction in which the whole-body motion is constrained.
We develop a pipeline with two sequential steps: predicting the future mutual distances first, followed by forecasting future human motion.
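One way to picture the mutual-distance signal is the per-joint distance from the body to its nearest scene point; the nearest-neighbour reading below is an assumption for illustration, not the paper's exact definition.

```python
import numpy as np

def body_to_scene_distances(joints: np.ndarray,
                            scene_points: np.ndarray) -> np.ndarray:
    """Distance from each body joint to its closest scene point.

    joints:       (J, 3) joint positions for one frame
    scene_points: (N, 3) scene point cloud
    Returns a (J,) vector of Euclidean distances, a nearest-neighbour
    stand-in for the human-to-scene half of the mutual distance.
    """
    # (J, N, 3) pairwise differences, (J, N) distances, min over points.
    diff = joints[:, None, :] - scene_points[None, :, :]
    return np.linalg.norm(diff, axis=-1).min(axis=1)
```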
arXiv Detail & Related papers (2023-10-01T08:32:46Z)
- Contact-aware Human Motion Forecasting [87.04827994793823]
We tackle the task of scene-aware 3D human motion forecasting, which consists of predicting future human poses given a 3D scene and a past human motion.
Our approach outperforms the state-of-the-art human motion forecasting and human synthesis methods on both synthetic and real datasets.
arXiv Detail & Related papers (2022-10-08T07:53:19Z)
- GIMO: Gaze-Informed Human Motion Prediction in Context [75.52839760700833]
We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which further boosts the motion dynamics observed from our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches.
arXiv Detail & Related papers (2022-04-20T13:17:39Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- Socially and Contextually Aware Human Motion and Pose Forecasting [48.083060946226]
We propose a novel framework to tackle the two tasks of human motion (i.e., trajectory) forecasting and body skeleton pose forecasting.
We consider incorporating both scene and social contexts, as critical clues for this prediction task.
Our proposed framework achieves a superior performance compared to several baselines on two social datasets.
arXiv Detail & Related papers (2020-07-14T06:12:13Z)
- Long-term Human Motion Prediction with Scene Context [60.096118270451974]
We propose a novel three-stage framework for predicting human motion.
Our method first samples multiple human motion goals, then plans 3D human paths towards each goal, and finally predicts 3D human pose sequences following each path.
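The goal-then-path-then-pose cascade can be summarised in the same staged style; the sketch below is an illustrative control flow with hypothetical model interfaces (goal_sampler, path_planner, pose_net), not this paper's implementation.

```python
def predict_long_term_motion(past_motion, scene,
                             goal_sampler, path_planner, pose_net,
                             num_goals: int = 3):
    """Hypothetical goal -> path -> pose cascade.

    For each sampled motion goal, plan a 3D path towards it, then
    predict a pose sequence that follows the path; one candidate
    future is returned per goal.
    """
    candidates = []
    for goal in goal_sampler(past_motion, scene, num_goals):  # stage 1
        path = path_planner(past_motion, scene, goal)         # stage 2
        poses = pose_net(past_motion, path)                   # stage 3
        candidates.append(poses)
    return candidates
```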
arXiv Detail & Related papers (2020-07-07T17:59:53Z)