GIMO: Gaze-Informed Human Motion Prediction in Context
- URL: http://arxiv.org/abs/2204.09443v1
- Date: Wed, 20 Apr 2022 13:17:39 GMT
- Title: GIMO: Gaze-Informed Human Motion Prediction in Context
- Authors: Yang Zheng, Yanchao Yang, Kaichun Mo, Jiaman Li, Tao Yu, Yebin Liu,
Karen Liu, Leonidas J. Guibas
- Abstract summary: We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which further enriches the motion dynamics observed in our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches.
- Score: 75.52839760700833
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting human motion is critical for assistive robots and AR/VR
applications, where the interaction with humans needs to be safe and
comfortable. Meanwhile, an accurate prediction depends on understanding both
the scene context and human intentions. Even though many works study
scene-aware human motion prediction, the latter is largely underexplored due to
the lack of ego-centric views that disclose human intent and the limited
diversity in motion and scenes. To reduce the gap, we propose a large-scale
human motion dataset that delivers high-quality body pose sequences, scene
scans, as well as ego-centric views with eye gaze that serves as a surrogate
for inferring human intent. By employing inertial sensors for motion capture,
our data collection is not tied to specific scenes, which further enriches the
motion dynamics observed in our subjects. We perform an extensive study of
the benefits of leveraging eye gaze for ego-centric human motion prediction
with various state-of-the-art architectures. Moreover, to realize the full
potential of gaze, we propose a novel network architecture that enables
bidirectional communication between the gaze and motion branches. Our network
achieves the top performance in human motion prediction on the proposed
dataset, thanks to the intent information from the gaze and the denoised gaze
feature modulated by the motion. The proposed dataset and our network
implementation will be publicly available.
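The abstract leaves the architecture unspecified, but one plausible reading of "bidirectional communication between the gaze and motion branches" is cross-attention running in both directions. The following is a minimal sketch of that idea only; the layer choices, names, and dimensions are assumptions, not the authors' published implementation.
```python
# Hypothetical sketch of bidirectional gaze-motion communication.
# Cross-attention in both directions is an assumption; the GIMO paper's
# actual block design may differ.
import torch
import torch.nn as nn

class BidirectionalGazeMotionBlock(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        # Motion queries attend to gaze features (gaze supplies intent cues).
        self.motion_from_gaze = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Gaze queries attend to motion features (motion can denoise gaze).
        self.gaze_from_motion = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_motion = nn.LayerNorm(dim)
        self.norm_gaze = nn.LayerNorm(dim)

    def forward(self, motion_feat, gaze_feat):
        # motion_feat: (B, T_motion, dim); gaze_feat: (B, T_gaze, dim)
        m, _ = self.motion_from_gaze(motion_feat, gaze_feat, gaze_feat)
        g, _ = self.gaze_from_motion(gaze_feat, motion_feat, motion_feat)
        # Residual connections keep each branch's own features in play.
        return self.norm_motion(motion_feat + m), self.norm_gaze(gaze_feat + g)

if __name__ == "__main__":
    block = BidirectionalGazeMotionBlock()
    motion = torch.randn(2, 60, 256)  # e.g. 60 frames of pose features
    gaze = torch.randn(2, 60, 256)    # e.g. 60 frames of gaze features
    m_out, g_out = block(motion, gaze)
    print(m_out.shape, g_out.shape)   # torch.Size([2, 60, 256]) twice
```
Under this reading, the motion branch picks up intent cues from gaze while the gaze branch is refined by motion, which matches the abstract's mention of a "denoised gaze feature modulated by the motion".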
Related papers
- Multi-Condition Latent Diffusion Network for Scene-Aware Neural Human Motion Prediction [46.309401205546656]
Real-world human movements are goal-directed and highly influenced by the spatial layout of their surrounding scenes.
We propose a Multi-Condition Latent Diffusion network (MCLD) that reformulates the human motion prediction task as a multi-condition joint inference problem.
Our network achieves significant improvements over state-of-the-art methods in terms of both the realism and the diversity of its predictions.
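The summary only names the idea; as a rough illustration, a latent-diffusion denoiser that jointly conditions on past-motion and scene features might look like the sketch below. All names, dimensions, and the fusion-by-concatenation are assumptions, not MCLD's design.
```python
# Illustrative multi-condition denoiser for a latent diffusion model.
import torch
import torch.nn as nn

class MultiConditionDenoiser(nn.Module):
    def __init__(self, latent_dim=128, cond_dim=128, steps=1000):
        super().__init__()
        self.step_emb = nn.Embedding(steps, latent_dim)  # diffusion-step embedding
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 2 * cond_dim, 256), nn.SiLU(),
            nn.Linear(256, latent_dim))

    def forward(self, z_t, t, motion_cond, scene_cond):
        # z_t: (B, latent_dim) noisy motion latent; t: (B,) step indices;
        # motion_cond / scene_cond: (B, cond_dim) condition embeddings.
        h = torch.cat([z_t + self.step_emb(t), motion_cond, scene_cond], dim=-1)
        return self.net(h)  # predicted noise for this step
```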
arXiv Detail & Related papers (2024-05-29T02:21:31Z)
- Multimodal Sense-Informed Prediction of 3D Human Motions [16.71099574742631]
This work introduces a novel multi-modal sense-informed motion prediction approach that conditions high-fidelity generation on two modalities: the external 3D scene and human gaze.
Gaze is regarded as a proxy for human intention; combined with motion and scene features, it drives a ternary intention-aware attention that supervises the generation (sketched below).
On two real-world benchmarks, the proposed method achieves state-of-the-art performance both in 3D human pose and trajectory prediction.
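A toy version of a "ternary intention-aware attention" over motion, gaze, and scene tokens follows; treating motion as the query and tagging modalities with learned type embeddings are assumptions, not necessarily the paper's design.
```python
# Toy three-way (motion/gaze/scene) attention; illustrative only.
import torch
import torch.nn as nn

class TernaryIntentionAttention(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.type_emb = nn.Embedding(3, dim)  # tags: 0=motion, 1=gaze, 2=scene

    def forward(self, motion, gaze, scene):
        # motion: (B, T, D); gaze: (B, T, D); scene: (B, N, D) point tokens.
        tag = lambda i, x: x + self.type_emb.weight[i].expand_as(x)
        context = torch.cat([tag(0, motion), tag(1, gaze), tag(2, scene)], dim=1)
        out, _ = self.attn(motion, context, context)  # motion queries all three
        return out  # intention-aware motion features, (B, T, D)
```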
arXiv Detail & Related papers (2024-05-05T12:38:10Z)
- EgoNav: Egocentric Scene-aware Human Trajectory Prediction [15.346096596482857]
Wearable collaborative robots could assist wearers who need fall prevention or who use exoskeletons.
Such a robot must constantly adapt to the surrounding scene from egocentric vision and predict the ego-motion of the wearer.
In this work, we leverage body-mounted cameras and sensors to anticipate the wearer's trajectory through complex surroundings.
arXiv Detail & Related papers (2024-03-27T21:43:12Z)
- GazeMoDiff: Gaze-guided Diffusion Model for Stochastic Human Motion Prediction [10.982807572404166]
We present GazeMo, a novel gaze-guided denoising diffusion model for generating human motions.
Our method first uses a gaze encoder and a motion encoder to extract gaze and motion features respectively, then employs a graph attention network to fuse them (see the sketch below).
Our method outperforms the state-of-the-art methods by a large margin in terms of multi-modal final error.
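One way to picture the graph-attention fusion, hand-rolled here to avoid a torch_geometric dependency, is to treat the encoded gaze as an extra graph node attached to every joint node; that graph layout is an assumption, not GazeMoDiff's exact design.
```python
# Single-head graph attention layer (after Velickovic et al., 2018),
# used here to fuse per-joint motion features with a gaze node.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGATLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):
        # x: (B, N, in_dim); adj: (N, N) adjacency with self-loops.
        h = self.W(x)                                   # (B, N, D)
        B, N, D = h.shape
        hi = h.unsqueeze(2).expand(B, N, N, D)          # receiving node i
        hj = h.unsqueeze(1).expand(B, N, N, D)          # neighbour node j
        e = F.leaky_relu(self.a(torch.cat([hi, hj], dim=-1)).squeeze(-1))
        alpha = torch.softmax(e.masked_fill(adj == 0, float("-inf")), dim=-1)
        return torch.relu(alpha @ h)                    # aggregated features

if __name__ == "__main__":
    J = 23                                  # number of skeleton joints (assumed)
    joints = torch.randn(2, J, 64)          # per-joint motion features
    gaze = torch.randn(2, 1, 64)            # encoded gaze as one extra node
    x = torch.cat([joints, gaze], dim=1)
    adj = torch.ones(J + 1, J + 1)          # fully connected, incl. gaze node
    print(TinyGATLayer(64, 64)(x, adj).shape)  # torch.Size([2, 24, 64])
```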
arXiv Detail & Related papers (2023-12-19T12:10:12Z)
- Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations [61.659439423703155]
TOHO: Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations.
Our method generates continuous motions that are parameterized only by the temporal coordinate.
This work takes a step further toward general human-scene interaction simulation.
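Motions "parameterized only by the temporal coordinate" are the signature of an implicit neural representation: pose as a continuous function of time, queryable at any resolution. A bare-bones sketch follows; the Fourier encoding and the omission of task/object conditioning are simplifying assumptions.
```python
# Implicit motion representation sketch: pose = f(t), t in [0, 1].
import torch
import torch.nn as nn

class ImplicitMotion(nn.Module):
    def __init__(self, pose_dim=72, hidden=256, n_freq=8):
        super().__init__()
        # Positional (Fourier) encoding of t, as is typical for implicit MLPs.
        self.freqs = 2.0 ** torch.arange(n_freq)
        self.mlp = nn.Sequential(
            nn.Linear(2 * n_freq, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, pose_dim))

    def forward(self, t):
        # t: (B,) timestamps; returns (B, pose_dim) poses.
        ang = t.unsqueeze(-1) * self.freqs.to(t.device) * torch.pi
        enc = torch.cat([torch.sin(ang), torch.cos(ang)], dim=-1)
        return self.mlp(enc)
```
Because t is continuous, `ImplicitMotion()(torch.linspace(0, 1, 120))` yields a 120-frame motion, and any other temporal resolution, without retraining.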
arXiv Detail & Related papers (2023-03-23T09:31:56Z)
- Learn to Predict How Humans Manipulate Large-sized Objects from Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show that the proposed network, by consuming dynamic descriptors, achieves state-of-the-art prediction results and generalizes better to unseen objects.
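As a rough illustration of "fusing motion data and dynamic descriptors", one graph-convolution layer could concatenate per-node descriptors with motion features before neighbourhood aggregation; the descriptor content and the normalized adjacency are assumptions, not HO-GCN's published architecture.
```python
# Illustrative graph-conv layer fusing motion features with dynamic descriptors.
import torch
import torch.nn as nn

class FusionGCNLayer(nn.Module):
    def __init__(self, motion_dim=64, dyn_dim=16, out_dim=64):
        super().__init__()
        self.proj = nn.Linear(motion_dim + dyn_dim, out_dim)

    def forward(self, motion, dyn, adj):
        # motion: (B, N, motion_dim); dyn: (B, N, dyn_dim) dynamic descriptors
        # (e.g. object mass/inertia terms, assumed); adj: (N, N), row-normalized.
        x = torch.cat([motion, dyn], dim=-1)   # early fusion by concatenation
        return torch.relu(adj @ self.proj(x))  # aggregate over graph neighbours
```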
arXiv Detail & Related papers (2022-06-25T09:55:39Z)
- Investigating Pose Representations and Motion Contexts Modeling for 3D Motion Prediction [63.62263239934777]
We conduct an in-depth study of various pose representations, with a focus on their effects on the motion prediction task.
We propose a novel RNN architecture termed AHMR (Attentive Hierarchical Motion Recurrent network) for motion prediction.
Our approach outperforms state-of-the-art methods in short-term prediction and substantially improves long-term prediction.
arXiv Detail & Related papers (2021-12-30T10:45:22Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible or invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work, including state-of-the-art methods designed specifically for either the trajectory or the pose forecasting task.
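The learned visibility indicator suggests down-weighting joints that are occluded at a given frame when scoring predictions. A hedged sketch of such a visibility-masked loss follows; how TRiPOD actually uses the indicator is not stated in this summary.
```python
# Hypothetical visibility-weighted pose loss; illustrative only.
import torch

def visibility_masked_loss(pred, target, vis_logits):
    # pred, target: (B, T, J, 3) joint positions;
    # vis_logits: (B, T, J) learned per-joint visibility indicator.
    vis = torch.sigmoid(vis_logits)           # soft visibility in [0, 1]
    err = (pred - target).pow(2).sum(dim=-1)  # per-joint squared error
    return (vis * err).sum() / vis.sum().clamp(min=1e-6)
```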
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- Socially and Contextually Aware Human Motion and Pose Forecasting [48.083060946226]
We propose a novel framework to tackle the two tasks of human motion (trajectory) forecasting and body skeleton pose forecasting.
We consider incorporating both scene and social contexts, as critical clues for this prediction task.
Our proposed framework achieves a superior performance compared to several baselines on two social datasets.
arXiv Detail & Related papers (2020-07-14T06:12:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.