Stratified Avatar Generation from Sparse Observations
- URL: http://arxiv.org/abs/2405.20786v2
- Date: Mon, 3 Jun 2024 05:36:47 GMT
- Title: Stratified Avatar Generation from Sparse Observations
- Authors: Han Feng, Wenchao Ma, Quankai Gao, Xianwei Zheng, Nan Xue, Huijuan Xu,
- Abstract summary: Estimating 3D full-body avatars from AR/VR devices is essential for creating immersive experiences.
In this paper, we are inspired by the inherent property of the kinematic tree defined in the Skinned Multi-Person Linear (SMPL) model.
We propose a stratified approach to decouple the conventional full-body avatar reconstruction pipeline into two stages.
- Score: 10.291918304187769
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimating 3D full-body avatars from AR/VR devices is essential for creating immersive experiences in AR/VR applications. This task is challenging due to the limited input from Head Mounted Devices, which capture only sparse observations from the head and hands. Predicting the full-body avatars, particularly the lower body, from these sparse observations presents significant difficulties. In this paper, we are inspired by the inherent property of the kinematic tree defined in the Skinned Multi-Person Linear (SMPL) model, where the upper body and lower body share only one common ancestor node, bringing the potential of decoupled reconstruction. We propose a stratified approach to decouple the conventional full-body avatar reconstruction pipeline into two stages, with the reconstruction of the upper body first and a subsequent reconstruction of the lower body conditioned on the previous stage. To implement this straightforward idea, we leverage the latent diffusion model as a powerful probabilistic generator, and train it to follow the latent distribution of decoupled motions explored by a VQ-VAE encoder-decoder model. Extensive experiments on AMASS mocap dataset demonstrate our state-of-the-art performance in the reconstruction of full-body motions.
Related papers
- DiHuR: Diffusion-Guided Generalizable Human Reconstruction [51.31232435994026]
We introduce DiHuR, a Diffusion-guided model for generalizable Human 3D Reconstruction and view synthesis from sparse, minimally overlapping images.
Our method integrates two key priors in a coherent manner: the prior from generalizable feed-forward models and the 2D diffusion prior, and it requires only multi-view image training, without 3D supervision.
arXiv Detail & Related papers (2024-11-16T03:52:23Z) - SOAR: Self-Occluded Avatar Recovery from a Single Video In the Wild [30.728476070389707]
Self-occlusion is common when capturing people in the wild, where the performer do not follow predefined motion scripts.
We introduce Self-Occluded Avatar Recovery (SOAR), a method for complete human reconstruction from partial observations where parts of the body are entirely unobserved.
arXiv Detail & Related papers (2024-10-31T10:35:59Z) - Efficient One-Step Diffusion Refinement for Snapshot Compressive Imaging [8.819370643243012]
Coded Aperture Snapshot Spectral Imaging (CASSI) is a crucial technique for capturing three-dimensional multispectral images (MSIs)
Current state-of-the-art methods, predominantly end-to-end, face limitations in reconstructing high-frequency details.
This paper introduces a novel one-step Diffusion Probabilistic Model within a self-supervised adaptation framework for Snapshot Compressive Imaging.
arXiv Detail & Related papers (2024-09-11T17:02:10Z) - EVOPOSE: A Recursive Transformer For 3D Human Pose Estimation With
Kinematic Structure Priors [72.33767389878473]
We propose a transformer-based model EvoPose to introduce the human body prior knowledge for 3D human pose estimation effectively.
A Structural Priors Representation (SPR) module represents human priors as structural features carrying rich body patterns.
A Recursive Refinement (RR) module is applied to the 3D pose outputs by utilizing estimated results and further injects human priors simultaneously.
arXiv Detail & Related papers (2023-06-16T04:09:16Z) - BoDiffusion: Diffusing Sparse Observations for Full-Body Human Motion
Synthesis [14.331548412833513]
Mixed reality applications require tracking the user's full-body motion to enable an immersive experience.
We propose BoDiffusion -- a generative diffusion model for motion synthesis to tackle this under-constrained reconstruction problem.
We present a time and space conditioning scheme that allows BoDiffusion to leverage sparse tracking inputs while generating smooth and realistic full-body motion sequences.
arXiv Detail & Related papers (2023-04-21T16:39:05Z) - Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking
Inputs with Diffusion Model [18.139630622759636]
We present AGRoL, a novel conditional diffusion model specifically designed to track full bodies given sparse upper-body tracking signals.
Our model is based on a simple multi-layer perceptron (MLP) architecture and a novel conditioning scheme for motion data.
Unlike common diffusion architectures, our compact architecture can run in real-time, making it suitable for online body-tracking applications.
arXiv Detail & Related papers (2023-04-17T19:35:13Z) - UnrealEgo: A New Dataset for Robust Egocentric 3D Human Motion Capture [70.59984501516084]
UnrealEgo is a new large-scale naturalistic dataset for egocentric 3D human pose estimation.
It is based on an advanced concept of eyeglasses equipped with two fisheye cameras that can be used in unconstrained environments.
We propose a new benchmark method with a simple but effective idea of devising a 2D keypoint estimation module for stereo inputs to improve 3D human pose estimation.
arXiv Detail & Related papers (2022-08-02T17:59:54Z) - Learned Vertex Descent: A New Direction for 3D Human Model Fitting [64.04726230507258]
We propose a novel optimization-based paradigm for 3D human model fitting on images and scans.
Our approach is able to capture the underlying body of clothed people with very different body shapes, achieving a significant improvement compared to state-of-the-art.
LVD is also applicable to 3D model fitting of humans and hands, for which we show a significant improvement to the SOTA with a much simpler and faster method.
arXiv Detail & Related papers (2022-05-12T17:55:51Z) - LiP-Flow: Learning Inference-time Priors for Codec Avatars via
Normalizing Flows in Latent Space [90.74976459491303]
We introduce a prior model that is conditioned on the runtime inputs and tie this prior space to the 3D face model via a normalizing flow in the latent space.
A normalizing flow bridges the two representation spaces and transforms latent samples from one domain to another, allowing us to define a latent likelihood objective.
We show that our approach leads to an expressive and effective prior, capturing facial dynamics and subtle expressions better.
arXiv Detail & Related papers (2022-03-15T13:22:57Z) - Distribution-Aware Single-Stage Models for Multi-Person 3D Pose
Estimation [29.430404703883084]
We present a novel Distribution-Aware Single-stage (DAS) model for tackling the challenging multi-person 3D pose estimation problem.
The proposed DAS model simultaneously localizes person positions and their corresponding body joints in the 3D camera space in a one-pass manner.
Comprehensive experiments on benchmarks CMU Panoptic and MuPoTS-3D demonstrate the superior efficiency of the proposed DAS model.
arXiv Detail & Related papers (2022-03-15T07:30:27Z) - Kinematic-Structure-Preserved Representation for Unsupervised 3D Human
Pose Estimation [58.72192168935338]
Generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable.
We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervisions.
Our proposed model employs three consecutive differentiable transformations named as forward-kinematics, camera-projection and spatial-map transformation.
arXiv Detail & Related papers (2020-06-24T23:56:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.