GazeMoDiff: Gaze-guided Diffusion Model for Stochastic Human Motion Prediction
- URL: http://arxiv.org/abs/2312.12090v1
- Date: Tue, 19 Dec 2023 12:10:12 GMT
- Title: GazeMoDiff: Gaze-guided Diffusion Model for Stochastic Human Motion Prediction
- Authors: Haodong Yan and Zhiming Hu and Syn Schmitt and Andreas Bulling
- Abstract summary: Existing methods have synthesised body motion only from observed past motion.
We present GazeMoDiff, a novel gaze-guided denoising diffusion model for generating stochastic human motions.
Our work makes an important first step towards gaze-guided stochastic human motion prediction.
- Score: 11.997928273335129
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human motion prediction is important for virtual reality (VR) applications,
e.g., for realistic avatar animation. Existing methods have synthesised body
motion only from observed past motion, despite the fact that human gaze is
known to correlate strongly with body movements and is readily available in
recent VR headsets. We present GazeMoDiff -- a novel gaze-guided denoising
diffusion model to generate stochastic human motions. Our method first uses a
graph attention network to learn the spatio-temporal correlations between eye
gaze and human movements and to fuse them into cross-modal gaze-motion
features. These cross-modal features are injected into a noise prediction
network via a cross-attention mechanism and progressively denoised to generate
realistic human full-body motions. Experimental results on the MoGaze and GIMO
datasets demonstrate that our method outperforms the state-of-the-art methods
by a large margin in terms of average displacement error (15.03% on MoGaze and
9.20% on GIMO). We further conducted an online user study comparing our method
with state-of-the-art methods; the responses from 23 participants confirm that
the motions generated by our method are more realistic than those from other
methods. Taken together, our work makes an important first step towards
gaze-guided stochastic human motion prediction and guides future work on this
important topic in VR research.
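The listing carries no code, but the pipeline the abstract describes (fuse gaze and past poses into cross-modal features, inject them into a noise-prediction network via cross-attention, then denoise iteratively) maps onto a standard conditional-diffusion pattern. The PyTorch sketch below is a minimal illustration under assumed shapes (3D gaze directions, 22 joints x 3 coordinates) and a vanilla DDPM schedule; the module names `GazeMotionFusion` and `NoisePredictor` are illustrative, not the authors' implementation, and plain multi-head self-attention stands in for the paper's graph attention network.

```python
# Illustrative sketch only: (1) fuse gaze and past poses into cross-modal
# features, (2) condition a noise predictor on them via cross-attention,
# (3) sample a future motion by iterative denoising. All names, shapes and
# the beta schedule are assumptions, not the paper's implementation.
import torch
import torch.nn as nn

class GazeMotionFusion(nn.Module):
    """Stand-in for the paper's graph-attention fusion: self-attention over
    concatenated gaze and pose tokens yields cross-modal features."""
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        self.gaze_proj = nn.Linear(3, d_model)    # 3D gaze direction per frame
        self.pose_proj = nn.Linear(66, d_model)   # assumed 22 joints x 3 coords
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, gaze, past_pose):
        # gaze: (B, T, 3), past_pose: (B, T, 66)
        tokens = torch.cat([self.gaze_proj(gaze), self.pose_proj(past_pose)], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)   # (B, 2T, d_model)
        return fused

class NoisePredictor(nn.Module):
    """Predicts the noise in a noisy future motion, conditioned on the fused
    gaze-motion features through cross-attention."""
    def __init__(self, d_model=128, n_heads=4, steps=1000):
        super().__init__()
        self.in_proj = nn.Linear(66, d_model)
        self.t_embed = nn.Embedding(steps, d_model)    # diffusion-step embedding
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.out_proj = nn.Linear(d_model, 66)

    def forward(self, noisy_future, t, cond):
        h = self.in_proj(noisy_future) + self.t_embed(t)[:, None, :]
        h, _ = self.cross_attn(h, cond, cond)          # inject cross-modal features
        return self.out_proj(h)

@torch.no_grad()
def sample(model, fusion, gaze, past_pose, horizon=30, steps=1000):
    """Plain DDPM ancestral sampling with a standard linear beta schedule."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    cond = fusion(gaze, past_pose)
    x = torch.randn(gaze.shape[0], horizon, 66)        # start from pure noise
    for t in reversed(range(steps)):
        t_batch = torch.full((x.shape[0],), t, dtype=torch.long)
        eps = model(x, t_batch, cond)
        mean = (x - betas[t] / (1 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
        x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean
    return x
```

Because sampling starts from fresh Gaussian noise each time, repeated calls to `sample` with the same gaze and past poses yield different plausible futures, which is what "stochastic" prediction refers to here.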
Related papers
- Gaze-Guided 3D Hand Motion Prediction for Detecting Intent in Egocentric Grasping Tasks [5.018156030818883]
We propose a novel approach that predicts future sequences of both hand poses and joint positions.
We use a vector-quantized variational autoencoder for robust hand pose encoding, combined with an autoregressive generative transformer for effective hand motion sequence prediction.
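As a rough sketch of the two-stage recipe this summary names, the snippet below pairs a toy vector-quantized autoencoder over hand poses with a causal transformer that continues the resulting code sequence. All sizes and class names are assumptions for illustration, not the paper's architecture.

```python
# Toy two-stage sketch: a VQ layer turns hand poses into discrete codes; a
# causal transformer predicts the next code. Sizes and names are assumptions.
import torch
import torch.nn as nn

class PoseVQ(nn.Module):
    """Toy VQ-VAE: encode a hand pose to its nearest codebook entry."""
    def __init__(self, pose_dim=63, d=64, codes=512):   # assumed 21 joints x 3
        super().__init__()
        self.enc = nn.Linear(pose_dim, d)
        self.dec = nn.Linear(d, pose_dim)
        self.codebook = nn.Embedding(codes, d)

    def quantize(self, pose):                           # pose: (B, T, pose_dim)
        z = self.enc(pose)                              # (B, T, d)
        dist = ((z.unsqueeze(-2) - self.codebook.weight) ** 2).sum(-1)
        return dist.argmin(dim=-1)                      # (B, T) code indices

    def decode(self, idx):
        return self.dec(self.codebook(idx))             # indices -> poses

class CodePredictor(nn.Module):
    """Causal transformer that autoregressively continues the code sequence."""
    def __init__(self, codes=512, d=64):
        super().__init__()
        self.embed = nn.Embedding(codes, d)
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.body = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d, codes)

    def forward(self, idx):                             # idx: (B, L)
        mask = nn.Transformer.generate_square_subsequent_mask(idx.shape[1])
        return self.head(self.body(self.embed(idx), mask=mask))  # next-code logits
```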
arXiv Detail & Related papers (2025-03-27T15:26:41Z)
- MoManifold: Learning to Measure 3D Human Motion via Decoupled Joint Acceleration Manifolds [20.83684434910106]
We present MoManifold, a novel human motion prior, which models plausible human motion in continuous high-dimensional motion space.
Specifically, we propose novel decoupled joint acceleration to model human dynamics from existing limited motion data.
Extensive experiments demonstrate that MoManifold outperforms existing SOTAs as a prior in several downstream tasks.
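The summary hinges on joint accelerations as the signal from which the motion prior is built. As a toy illustration only, the snippet below computes per-joint accelerations by finite differences along with a naive smoothness score; MoManifold's actual manifold learning is far more involved.

```python
# Toy illustration of the underlying quantity: per-joint accelerations from
# finite differences over a pose sequence, plus a naive plausibility score.
# This is not the paper's learned manifold, only the raw signal it builds on.
import torch

def joint_accelerations(poses: torch.Tensor) -> torch.Tensor:
    """poses: (T, J, 3) joint positions over T frames -> (T-2, J, 3)."""
    vel = poses[1:] - poses[:-1]           # first difference: velocity
    return vel[1:] - vel[:-1]              # second difference: acceleration

def naive_plausibility(poses: torch.Tensor) -> torch.Tensor:
    """Lower mean squared acceleration = smoother, more plausible motion."""
    acc = joint_accelerations(poses)
    return acc.pow(2).mean(dim=(-1, -2))   # per-frame score, shape (T-2,)
```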
arXiv Detail & Related papers (2024-09-01T15:00:16Z)
- COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation [98.05046790227561]
COIN is a control-inpainting motion diffusion prior that enables fine-grained control to disentangle human and camera motions.
COIN outperforms the state-of-the-art methods in terms of global human motion estimation and camera motion estimation.
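Inpainting-style conditioning in diffusion sampling is a common pattern that the summary's "control-inpainting" idea evokes: at each denoising step, observed entries are overwritten with a correspondingly noised copy of their known values, so only the unknown entries are generated. The sketch below shows this generic pattern, not COIN's actual algorithm.

```python
# Generic diffusion-inpainting step (illustrative pattern, not COIN itself):
# entries where mask == 1 are forced toward noised copies of known values.
import torch

def inpaint_step(x_t, known, mask, t, alpha_bar):
    """x_t: current sample; known: ground truth where mask == 1; t: step index."""
    noised_known = (alpha_bar[t].sqrt() * known
                    + (1 - alpha_bar[t]).sqrt() * torch.randn_like(known))
    return mask * noised_known + (1 - mask) * x_t
```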
arXiv Detail & Related papers (2024-08-29T10:36:29Z)
- Towards Practical Human Motion Prediction with LiDAR Point Clouds [15.715130864327792]
We propose LiDAR-HMP, the first single-LiDAR-based 3D human motion prediction approach.
LiDAR-HMP receives the raw LiDAR point cloud as input and forecasts future 3D human poses directly.
Our method achieves state-of-the-art performance on two public benchmarks and demonstrates remarkable robustness and efficacy in real-world deployments.
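The stated interface, raw LiDAR point cloud in and future 3D poses out, can be illustrated with a PointNet-style per-point encoder followed by a pose-regression head. The sketch below is an assumption-laden illustration of that interface, not LiDAR-HMP's design.

```python
# Illustrative point-cloud-to-poses interface: per-point MLP, permutation-
# invariant max pooling, pose regression. Details are assumptions only.
import torch
import torch.nn as nn

class PointsToFuturePoses(nn.Module):
    def __init__(self, horizon=10, joints=22, d=128):
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Linear(3, d), nn.ReLU(), nn.Linear(d, d))
        self.head = nn.Linear(d, horizon * joints * 3)
        self.horizon, self.joints = horizon, joints

    def forward(self, points):
        # points: (B, N, 3) raw LiDAR returns; max-pool over N points
        feat = self.point_mlp(points).max(dim=1).values       # (B, d)
        out = self.head(feat)
        return out.view(-1, self.horizon, self.joints, 3)     # (B, H, J, 3)
```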
arXiv Detail & Related papers (2024-08-15T15:10:01Z)
- GazeMotion: Gaze-guided Human Motion Forecasting [10.982807572404166]
We present GazeMotion, a novel method for human motion forecasting that combines information on past human poses with human eye gaze.
Inspired by evidence from behavioural sciences showing that human eye and body movements are closely coordinated, GazeMotion first predicts future eye gaze from past gaze, then fuses predicted future gaze and past poses into a gaze-pose graph, and finally uses a residual graph convolutional network to forecast body motion.
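The three stages named above (forecast future gaze from past gaze, combine gaze and poses into one graph, predict motion with a residual graph convolution) might look roughly like the sketch below; the learned adjacency and all module names are illustrative assumptions rather than the paper's architecture.

```python
# Illustrative pieces for a gaze-then-motion pipeline: a GRU gaze forecaster
# and a residual graph-convolution block with a learned adjacency.
import torch
import torch.nn as nn

class GazeForecaster(nn.Module):
    def __init__(self, horizon=10, d=64):
        super().__init__()
        self.rnn = nn.GRU(3, d, batch_first=True)
        self.head = nn.Linear(d, horizon * 3)
        self.horizon = horizon

    def forward(self, past_gaze):                     # past_gaze: (B, T, 3)
        _, h = self.rnn(past_gaze)
        return self.head(h[-1]).view(-1, self.horizon, 3)

class ResidualGraphConv(nn.Module):
    """One graph convolution with a learned adjacency and a residual skip."""
    def __init__(self, nodes, d):
        super().__init__()
        self.adj = nn.Parameter(torch.eye(nodes))     # learned graph structure
        self.lin = nn.Linear(d, d)

    def forward(self, x):                             # x: (B, nodes, d)
        return x + torch.relu(self.lin(self.adj @ x))
```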
arXiv Detail & Related papers (2024-03-14T21:38:00Z)
- Priority-Centric Human Motion Generation in Discrete Latent Space [59.401128190423535]
We introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM) for text-to-motion generation.
M2DM incorporates a global self-attention mechanism and a regularization term to counteract code collapse.
We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token.
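A noise schedule "determined by the significance of each motion token" can be pictured as a corruption step in which important tokens survive longer in the forward process. The toy function below illustrates that idea with a simple importance-weighted masking rule; it is a stand-in for, not a reproduction of, M2DM's schedule.

```python
# Toy priority-aware corruption for discrete diffusion over motion tokens:
# tokens with higher importance are masked later. Illustrative weighting only.
import torch

def priority_mask(tokens, importance, t, T, mask_id):
    """tokens: (B, L) token ids; importance: (B, L) floats in [0, 1]."""
    base_rate = t / T                                   # overall corruption level
    keep_prob = (1 - base_rate) * (0.5 + 0.5 * importance)
    corrupted = torch.rand_like(importance) > keep_prob
    return torch.where(corrupted, torch.full_like(tokens, mask_id), tokens)
```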
arXiv Detail & Related papers (2023-08-28T10:40:16Z)
- GIMO: Gaze-Informed Human Motion Prediction in Context [75.52839760700833]
We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which further increases the diversity of the motion dynamics observed from our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches.
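"Bidirectional communication between the gaze and motion branches" suggests each branch attending to the other. The minimal sketch below realizes that idea with two cross-attention modules and residual updates; it illustrates the concept only, not GIMO's network.

```python
# Minimal bidirectional exchange between two feature streams via
# cross-attention in both directions. Concept illustration only.
import torch
import torch.nn as nn

class BidirectionalExchange(nn.Module):
    def __init__(self, d=128, heads=4):
        super().__init__()
        self.gaze_from_motion = nn.MultiheadAttention(d, heads, batch_first=True)
        self.motion_from_gaze = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, gaze_feat, motion_feat):
        # each branch queries the other, then keeps a residual connection
        g, _ = self.gaze_from_motion(gaze_feat, motion_feat, motion_feat)
        m, _ = self.motion_from_gaze(motion_feat, gaze_feat, gaze_feat)
        return gaze_feat + g, motion_feat + m
```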
arXiv Detail & Related papers (2022-04-20T13:17:39Z)
- Investigating Pose Representations and Motion Contexts Modeling for 3D Motion Prediction [63.62263239934777]
We conduct an in-depth study of various pose representations, focusing on their effects on the motion prediction task.
We propose a novel RNN architecture termed AHMR (Attentive Hierarchical Motion Recurrent network) for motion prediction.
Our approach outperforms state-of-the-art methods in short-term prediction and yields substantially better long-term predictions.
arXiv Detail & Related papers (2021-12-30T10:45:22Z)
- Generating Smooth Pose Sequences for Diverse Human Motion Prediction [90.45823619796674]
We introduce a unified deep generative network for both diverse and controllable motion prediction.
Our experiments on two standard benchmark datasets, Human3.6M and HumanEva-I, demonstrate that our approach outperforms the state-of-the-art baselines in terms of both sample diversity and accuracy.
arXiv Detail & Related papers (2021-08-19T00:58:00Z)
- Perpetual Motion: Generating Unbounded Human Motion [61.40259979876424]
We focus on long-term prediction; that is, generating long sequences of plausible human motion.
We propose a model to generate non-deterministic, ever-changing, perpetual human motion.
We train this model using a heavy-tailed function of the KL divergence from a white-noise Gaussian process, which allows temporal dependency in the latent sequence.
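The loss idea, a heavy-tailed function of the per-step KL divergence to a white-noise Gaussian prior, can be sketched as below; the square-root transform is one illustrative choice of heavy-tailed function, not necessarily the paper's.

```python
# Sketch of a heavy-tailed KL penalty: the closed-form KL between the
# posterior N(mu, sigma^2) and a unit Gaussian prior, passed through sqrt so
# large deviations are penalised sub-quadratically. Transform is illustrative.
import torch

def heavy_tailed_kl(mu, logvar, eps=1e-6):
    """mu, logvar: (B, T, D) posterior stats for each latent time step."""
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=-1)  # (B, T)
    return torch.sqrt(kl + eps).mean()               # heavy-tailed penalty
```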
arXiv Detail & Related papers (2020-07-27T21:50:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.