HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations
- URL: http://arxiv.org/abs/2308.11261v1
- Date: Tue, 22 Aug 2023 08:07:12 GMT
- Title: HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations
- Authors: Sadegh Aliakbarian, Fatemeh Saleh, David Collier, Pashmina Cameron,
Darren Cosker
- Abstract summary: Head-Mounted Devices (HMDs) typically provide only a few input signals, such as the 6-DoF poses of the head and hands.
We propose the first unified approach, HMD-NeMo, that addresses plausible and accurate full body motion generation even when the hands may be only partially visible.
- Score: 7.096701481970196
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating both plausible and accurate full body avatar motion is the key to
the quality of immersive experiences in mixed reality scenarios. Head-Mounted
Devices (HMDs) typically provide only a few input signals, such as the 6-DoF
poses of the head and hands. Recently, different approaches have achieved
impressive performance in generating full body motion given only head and hand
signals. However, to the
best of our knowledge, all existing approaches rely on full hand visibility.
While this is the case when, e.g., using motion controllers, a considerable
proportion of mixed reality experiences do not involve motion controllers and
instead rely on egocentric hand tracking. This introduces the challenge of
partial hand visibility owing to the restricted field of view of the HMD. In
this paper, we propose the first unified approach, HMD-NeMo, that addresses
plausible and accurate full body motion generation even when the hands may be
only partially visible. HMD-NeMo is a lightweight neural network that predicts
the full body motion in an online and real-time fashion. At the heart of
HMD-NeMo is the spatio-temporal encoder with novel temporally adaptable mask
tokens that encourage plausible motion in the absence of hand observations. We
perform an extensive analysis of the impact of the different components of
HMD-NeMo and establish a new state of the art on the AMASS dataset through our
evaluation.
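Read literally from the abstract, the temporally adaptable mask tokens can be pictured as a learnable placeholder feature that is swapped in whenever a hand is untracked and evolved over time so the substitute stays temporally coherent. The PyTorch sketch below only illustrates that idea under assumed names, dimensions, and a GRU-based update; it is not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class TemporallyAdaptableMaskTokens(nn.Module):
    """Illustrative sketch: replace unobserved hand features with a
    learnable mask token that a GRUCell evolves frame by frame, so the
    placeholder stays temporally coherent. All names and dimensions
    are assumptions, not the published architecture."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.init_token = nn.Parameter(torch.zeros(feat_dim))  # learned start token
        self.adapt = nn.GRUCell(feat_dim, feat_dim)            # evolves the token over time

    def forward(self, hand_feats: torch.Tensor, visible: torch.Tensor) -> torch.Tensor:
        # hand_feats: (T, B, D) per-frame hand features (undefined where untracked)
        # visible:    (T, B) bool, True where the hand was observed
        T, B, D = hand_feats.shape
        token = self.init_token.expand(B, D)
        fused_frames = []
        for t in range(T):
            prev = fused_frames[-1] if fused_frames else token
            token = self.adapt(prev, token)     # adapt the token to the motion so far
            seen = visible[t].unsqueeze(-1)     # (B, 1) bool
            fused_frames.append(torch.where(seen, hand_feats[t], token))
        return torch.stack(fused_frames)        # (T, B, D), gap-free hand stream
```

A spatio-temporal encoder over the head stream and the two gap-free hand streams would then regress the full-body pose online, one frame at a time.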
Related papers
- EgoAvatar: Egocentric View-Driven and Photorealistic Full-body Avatars [56.56236652774294]
We propose a person-specific egocentric telepresence approach, which jointly models the photoreal digital avatar while also driving it from a single egocentric video.
Our experiments demonstrate a clear step towards egocentric and photoreal telepresence as our method outperforms baselines as well as competing methods.
arXiv Detail & Related papers (2024-09-22T22:50:27Z)
- Real-Time Simulated Avatar from Head-Mounted Sensors [70.41580295721525]
We present SimXR, a method for controlling a simulated avatar from information (headset pose and cameras) obtained from AR/VR headsets.
To synergize headset poses with cameras, we control a humanoid to track headset movement while analyzing input images to decide body movement.
When body parts are seen, the movements of hands and feet will be guided by the images; when unseen, the laws of physics guide the controller to generate plausible motion.
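The seen/unseen switching described here can be caricatured as a per-part gate; the thresholded blend below is an assumption for illustration, not SimXR's actual controller.

```python
import torch

def blended_targets(img_targets: torch.Tensor,
                    img_conf: torch.Tensor,
                    physics_targets: torch.Tensor,
                    thresh: float = 0.5) -> torch.Tensor:
    """Hypothetical gate: follow image-derived targets for body parts the
    headset cameras see; fall back to physics-plausible reference targets
    when a part is out of view.

    img_targets, physics_targets: (P, 3) per-part target positions
    img_conf: (P,) visibility confidence per body part
    """
    seen = (img_conf > thresh).unsqueeze(-1)  # (P, 1) bool mask
    return torch.where(seen, img_targets, physics_targets)
```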
arXiv Detail & Related papers (2024-03-11T16:15:51Z)
- HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations [28.452132601844717]
We propose HMD-Poser, the first unified approach to recover full-body motion using scalable sparse observations from an HMD and body-worn IMUs.
A lightweight temporal-spatial feature learning network is proposed in HMD-Poser to guarantee that the model runs in real-time on HMDs.
Extensive experimental results on the challenging AMASS dataset show that HMD-Poser achieves new state-of-the-art results in both accuracy and real-time performance.
arXiv Detail & Related papers (2024-03-06T09:10:36Z)
- HMP: Hand Motion Priors for Pose and Shape Estimation from Video [52.39020275278984]
We develop a generative motion prior specific to hands, trained on the AMASS dataset, which features diverse and high-quality hand motions.
Our integration of a robust motion prior significantly enhances performance, especially in occluded scenarios.
We demonstrate our method's efficacy via qualitative and quantitative evaluations on the HO3D and DexYCB datasets.
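A learned hand motion prior of this kind is commonly applied as a regularizer during video fitting: optimize a latent motion code so the decoded trajectory matches the 2D evidence while staying close to the prior. The sketch below is a generic illustration with assumed decoder/project interfaces, not HMP's actual API.

```python
import torch

def fit_with_motion_prior(decoder, project, keypoints_2d,
                          z_dim: int = 64, steps: int = 200, lam: float = 1e-2):
    """Generic latent-space fitting sketch (interfaces are assumptions):
    decoder(z)        -> 3D hand joint trajectory, shape (T, J, 3)
    project(joints3d) -> 2D projections, shape (T, J, 2)
    Minimizes reprojection error plus an L2 penalty that keeps the
    latent code near the prior's high-density region."""
    z = torch.zeros(z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=0.05)
    for _ in range(steps):
        opt.zero_grad()
        reproj = ((project(decoder(z)) - keypoints_2d) ** 2).mean()
        loss = reproj + lam * (z ** 2).mean()  # prior term fills in occluded frames
        loss.backward()
        opt.step()
    return z.detach()
```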
arXiv Detail & Related papers (2023-12-27T22:35:33Z)
- Universal Humanoid Motion Representations for Physics-Based Control [71.46142106079292]
We present a universal motion representation that encompasses a comprehensive range of motor skills for physics-based humanoid control.
We first learn a motion imitator that can reproduce a broad range of human motion from a large, unstructured motion dataset.
We then create our motion representation by distilling skills directly from the imitator.
arXiv Detail & Related papers (2023-10-06T20:48:43Z)
- Physics-based Motion Retargeting from Sparse Inputs [73.94570049637717]
Commercial AR/VR products consist only of a headset and controllers, providing very limited sensor data of the user's pose.
We introduce a method to retarget motions in real-time from sparse human sensor data to characters of various morphologies.
We show that the avatar poses often match the user surprisingly well, despite having no sensor information of the lower body available.
arXiv Detail & Related papers (2023-07-04T21:57:05Z)
- Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model [18.139630622759636]
We present AGRoL, a novel conditional diffusion model specifically designed to track full bodies given sparse upper-body tracking signals.
Our model is based on a simple multi-layer perceptron (MLP) architecture and a novel conditioning scheme for motion data.
Unlike common diffusion architectures, our compact architecture can run in real-time, making it suitable for online body-tracking applications.
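A denoiser in that spirit can be sketched as a stack of residual MLP blocks that takes the noisy motion, the sparse tracking condition, and a diffusion timestep; the layer sizes and concatenation-based conditioning below are assumptions, not AGRoL's exact design.

```python
import torch
import torch.nn as nn

class MLPDenoiser(nn.Module):
    """Sketch of a compact MLP diffusion denoiser for full-body motion,
    conditioned on sparse upper-body tracking signals. Dimensions and
    the conditioning scheme are illustrative assumptions."""

    def __init__(self, motion_dim: int = 132, cond_dim: int = 54,
                 hidden: int = 512, num_blocks: int = 4):
        super().__init__()
        self.time_embed = nn.Sequential(nn.Linear(1, hidden), nn.SiLU(),
                                        nn.Linear(hidden, hidden))
        self.inp = nn.Linear(motion_dim + cond_dim, hidden)
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.LayerNorm(hidden), nn.Linear(hidden, hidden), nn.SiLU())
            for _ in range(num_blocks))
        self.out = nn.Linear(hidden, motion_dim)

    def forward(self, noisy_motion, cond, t):
        # noisy_motion: (B, motion_dim); cond: (B, cond_dim); t: (B, 1) in [0, 1]
        h = self.inp(torch.cat([noisy_motion, cond], dim=-1)) + self.time_embed(t)
        for blk in self.blocks:
            h = h + blk(h)      # residual blocks keep inference cheap
        return self.out(h)      # predicted noise (or clean motion)
```

Because the network is a plain MLP, a single forward pass is cheap enough for online use, which is the property the summary emphasizes.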
arXiv Detail & Related papers (2023-04-17T19:35:13Z)
- NeMo: 3D Neural Motion Fields from Multiple Video Instances of the Same Action [24.67958500694608]
We introduce the Neural Motion (NeMo) field to represent the underlying 3D motions across a set of videos of the same action.
NeMo recovers 3D motion in sports from videos in the Penn Action dataset, where it outperforms existing HMR methods in terms of 2D keypoint detection.
arXiv Detail & Related papers (2022-12-28T01:40:32Z)
- QuestSim: Human Motion Tracking from Sparse Sensors with Simulated Avatars [80.05743236282564]
Real-time tracking of human body motion is crucial for immersive experiences in AR/VR.
We present a reinforcement learning framework that takes in sparse signals from an HMD and two controllers.
We show that a single policy can be robust to diverse locomotion styles, different body sizes, and novel environments.
arXiv Detail & Related papers (2022-09-20T00:25:54Z)
- MGPSN: Motion-Guided Pseudo Siamese Network for Indoor Video Head Detection [6.061552465738301]
We propose the Motion-Guided Pseudo Siamese Network for Indoor Video Head Detection (MGPSN) to learn robust head motion features.
MGPSN integrates spatio-temporal information at the pixel level, guiding the model to extract effective head features.
It achieves state-of-the-art performance on the crowded Brainwash dataset.
arXiv Detail & Related papers (2021-10-07T09:40:22Z)