GazeMoDiff: Gaze-guided Diffusion Model for Stochastic Human Motion Prediction
- URL: http://arxiv.org/abs/2312.12090v1
- Date: Tue, 19 Dec 2023 12:10:12 GMT
- Title: GazeMoDiff: Gaze-guided Diffusion Model for Stochastic Human Motion Prediction
- Authors: Haodong Yan and Zhiming Hu and Syn Schmitt and Andreas Bulling
- Abstract summary: Existing methods have synthesised body motion only from observed past motion.
We present GazeMoDiff, a novel gaze-guided denoising diffusion model to generate stochastic human motions.
Our work makes an important first step towards gaze-guided stochastic human motion prediction.
- Score: 11.997928273335129
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human motion prediction is important for virtual reality (VR) applications,
e.g., for realistic avatar animation. Existing methods have synthesised body
motion only from observed past motion, despite the fact that human gaze is
known to correlate strongly with body movements and is readily available in
recent VR headsets. We present GazeMoDiff -- a novel gaze-guided denoising
diffusion model to generate stochastic human motions. Our method first uses a
graph attention network to learn the spatio-temporal correlations between eye
gaze and human movements and to fuse them into cross-modal gaze-motion
features. These cross-modal features are injected into a noise prediction
network via a cross-attention mechanism and progressively denoised to generate
realistic human full-body motions. Experimental results on the MoGaze and GIMO
datasets demonstrate that our method outperforms the state-of-the-art methods
by a large margin in terms of average displacement error (improvements of
15.03% on MoGaze and 9.20% on GIMO). We further conducted an online user study
comparing our method with the state of the art; responses from 23 participants
confirm that the motions generated by our method are more realistic than those
from other methods. Taken together, our work makes an important first step
towards gaze-guided stochastic human motion prediction and guides future work
on this important topic in VR research.
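
To make the described pipeline concrete, the following is a minimal PyTorch sketch of the three stages named in the abstract: fusing gaze and pose into cross-modal features, injecting them into a noise-prediction network via cross-attention, and progressively denoising to obtain future motion. All module names, tensor shapes, joint counts, and schedule parameters are illustrative assumptions, not the authors' implementation; dense multi-head attention stands in here for the paper's graph attention network.

import torch
import torch.nn as nn

class GazeMotionFusion(nn.Module):
    # Fuses past eye gaze and body poses into cross-modal features. The paper
    # uses a graph attention network over a gaze-pose graph; this sketch
    # approximates it with dense attention over the per-frame nodes
    # (one gaze node plus J joint nodes).
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.gaze_proj = nn.Linear(3, dim)   # 3D gaze direction per frame
        self.pose_proj = nn.Linear(3, dim)   # 3D position per joint
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, gaze, pose):
        # gaze: (B, T, 3), pose: (B, T, J, 3)
        B, T, J, _ = pose.shape
        nodes = torch.cat([self.gaze_proj(gaze).unsqueeze(2),
                           self.pose_proj(pose)], dim=2)     # (B, T, J+1, dim)
        nodes = nodes.flatten(0, 1)                          # (B*T, J+1, dim)
        fused, _ = self.attn(nodes, nodes, nodes)            # node attention
        return fused.mean(dim=1).view(B, T, -1)              # (B, T, dim)

class NoisePredictor(nn.Module):
    # Predicts the noise in a noisy future-motion sequence, with the fused
    # gaze-motion features injected through cross-attention (query = motion,
    # key/value = cross-modal condition), as described in the abstract.
    def __init__(self, dim=64, joints=23, heads=4):
        super().__init__()
        self.in_proj = nn.Linear(joints * 3, dim)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out_proj = nn.Linear(dim, joints * 3)

    def forward(self, noisy_future, cond):
        # noisy_future: (B, T_future, joints*3), cond: (B, T_past, dim)
        h = self.in_proj(noisy_future)
        h, _ = self.cross_attn(h, cond, cond)
        return self.out_proj(h)   # predicted noise, same shape as input

@torch.no_grad()
def sample(fusion, denoiser, gaze, pose, t_future=30, joints=23, steps=50):
    # DDPM-style reverse process: start from Gaussian noise and denoise step
    # by step, guided by the observed gaze and past poses. The linear beta
    # schedule and step count are illustrative defaults.
    cond = fusion(gaze, pose)
    x = torch.randn(gaze.shape[0], t_future, joints * 3)
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas, alpha_bars = 1.0 - betas, torch.cumprod(1.0 - betas, dim=0)
    for t in reversed(range(steps)):
        eps = denoiser(x, cond)
        x = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) \
            / torch.sqrt(alphas[t])
        if t > 0:                               # add noise except at the end
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x.view(-1, t_future, joints, 3)      # sampled future motion

def average_displacement_error(pred, gt):
    # Mean per-joint Euclidean distance between predicted and ground-truth
    # motion, i.e. the ADE metric reported above; pred, gt: (B, T, J, 3).
    return (pred - gt).norm(dim=-1).mean()

The stochasticity of the prediction comes from the initial Gaussian noise: calling sample repeatedly with the same observed gaze and poses yields different plausible future motions.
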
Related papers
- Aligning Human Motion Generation with Human Perceptions [51.831338643012444]
We propose a data-driven approach to bridge the gap between automatic evaluation and human perception of motion quality by introducing a large-scale human perceptual evaluation dataset, MotionPercept, and a human motion critic model, MotionCritic.
Our critic model offers a more accurate metric for assessing motion quality and could be readily integrated into the motion generation pipeline.
arXiv Detail & Related papers (2024-07-02T14:01:59Z) - A Transformer-Based Model for the Prediction of Human Gaze Behavior on Videos [10.149523817328921]
We introduce a novel method for simulating human gaze behavior.
Our approach uses a transformer-based reinforcement learning algorithm to train an agent that acts as a human observer.
arXiv Detail & Related papers (2024-04-10T21:14:33Z) - Gaze-guided Hand-Object Interaction Synthesis: Benchmark and Method [63.49140028965778]
We introduce the first Gaze-guided Hand-Object Interaction dataset, GazeHOI, and present a novel task for synthesizing gaze-guided hand-object interactions.
Our dataset, GazeHOI, features simultaneous 3D modeling of gaze, hand, and object interactions, comprising 479 sequences with an average duration of 19.1 seconds, 812 sub-sequences, and 33 objects of various sizes.
arXiv Detail & Related papers (2024-03-24T14:24:13Z) - GazeMotion: Gaze-guided Human Motion Forecasting [10.982807572404166]
We present GazeMotion, a novel method for human motion forecasting that combines information on past human poses with human eye gaze.
Inspired by evidence from behavioural sciences showing that human eye and body movements are closely coordinated, GazeMotion first predicts future eye gaze from past gaze, then fuses predicted future gaze and past poses into a gaze-pose graph, and finally uses a residual graph convolutional network to forecast body motion.
arXiv Detail & Related papers (2024-03-14T21:38:00Z) - AdvMT: Adversarial Motion Transformer for Long-term Human Motion
Prediction [2.837740438355204]
We present the Adversarial Motion Transformer (AdvMT), a novel model that integrates a transformer-based motion encoder and a temporal continuity discriminator.
With adversarial training, our method effectively reduces the unwanted artifacts in predictions, thereby ensuring the learning of more realistic and fluid human motions.
arXiv Detail & Related papers (2024-01-10T09:15:50Z) - Pose2Gaze: Eye-body Coordination during Daily Activities for Gaze Prediction from Full-body Poses [11.545286742778977]
We first report a comprehensive analysis of eye-body coordination in various human-object and human-human interaction activities.
We then present Pose2Gaze, an eye-body coordination model that uses a convolutional neural network to extract features from head direction and full-body poses.
arXiv Detail & Related papers (2023-12-19T10:55:46Z) - Universal Humanoid Motion Representations for Physics-Based Control [71.46142106079292]
We present a universal motion representation that encompasses a comprehensive range of motor skills for physics-based humanoid control.
We first learn a motion imitator that can imitate all human motion from a large, unstructured motion dataset.
We then create our motion representation by distilling skills directly from the imitator.
arXiv Detail & Related papers (2023-10-06T20:48:43Z) - Priority-Centric Human Motion Generation in Discrete Latent Space [59.401128190423535]
We introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM) for text-to-motion generation.
M2DM incorporates a global self-attention mechanism and a regularization term to counteract code collapse.
We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token.
arXiv Detail & Related papers (2023-08-28T10:40:16Z) - PACE: Data-Driven Virtual Agent Interaction in Dense and Cluttered
Environments [69.03289331433874]
We present PACE, a novel method for modifying motion-captured virtual agents to interact with and move throughout dense, cluttered 3D scenes.
Our approach changes a given motion sequence of a virtual agent as needed to adjust to the obstacles and objects in the environment.
We compare our method with prior motion generating techniques and highlight the benefits of our method with a perceptual study and physical plausibility metrics.
arXiv Detail & Related papers (2023-03-24T19:49:08Z) - GIMO: Gaze-Informed Human Motion Prediction in Context [75.52839760700833]
We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which further boosts the motion dynamics observed from our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches.
arXiv Detail & Related papers (2022-04-20T13:17:39Z)