Related papers: WonderHuman: Hallucinating Unseen Parts in Dynamic 3D Human Reconstruction

URL: http://arxiv.org/abs/2502.01045v1
Date: Mon, 03 Feb 2025 04:43:41 GMT
Title: WonderHuman: Hallucinating Unseen Parts in Dynamic 3D Human Reconstruction
Authors: Zilong Wang, Zhiyang Dou, Yuan Liu, Cheng Lin, Xiao Dong, Yunhui Guo, Chenxu Zhang, Xin Li, Wenping Wang, Xiaohu Guo
Abstract summary: We present WonderHuman to reconstruct dynamic human avatars from a monocular video for high-fidelity novel view synthesis. Our method achieves SOTA performance in producing photorealistic renderings from the given monocular video.
Score: 51.22641018932625
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we present WonderHuman to reconstruct dynamic human avatars from a monocular video for high-fidelity novel view synthesis. Previous dynamic human avatar reconstruction methods typically require the input video to have full coverage of the observed human body. However, in daily practice, one typically has access to limited viewpoints, such as monocular front-view videos, making it a cumbersome task for previous methods to reconstruct the unseen parts of the human avatar. To tackle the issue, we present WonderHuman, which leverages 2D generative diffusion model priors to achieve high-quality, photorealistic reconstructions of dynamic human avatars from monocular videos, including accurate rendering of unseen body parts. Our approach introduces a Dual-Space Optimization technique, applying Score Distillation Sampling (SDS) in both canonical and observation spaces to ensure visual consistency and enhance realism in dynamic human reconstruction. Additionally, we present a View Selection strategy and Pose Feature Injection to enforce the consistency between SDS predictions and observed data, ensuring pose-dependent effects and higher fidelity in the reconstructed avatar. In the experiments, our method achieves SOTA performance in producing photorealistic renderings from the given monocular video, particularly for those challenging unseen parts. The project page and source code can be found at https://wyiguanw.github.io/WonderHuman/.
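
For readers unfamiliar with Score Distillation Sampling, the block below sketches the standard SDS gradient as it is usually written in the text-to-3D literature (DreamFusion); it is not taken from the WonderHuman paper. Here g(θ) is a differentiable renderer of the avatar with parameters θ, ε̂_φ is the frozen 2D diffusion model's noise prediction, x_t is the rendered image noised to timestep t, y is the conditioning signal, and w(t) is a timestep weighting. The second line is only an assumption about how a dual-space objective might combine a canonical-space term and an observation-space term; the weights λ_can and λ_obs are hypothetical names introduced for illustration.

```latex
% Standard SDS gradient for a rendered image x = g(\theta)
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta)
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad x = g(\theta)

% Assumed form of a dual-space objective (illustrative only; weights are hypothetical)
\mathcal{L}_{\mathrm{dual}}
  = \lambda_{\mathrm{can}}\,\mathcal{L}_{\mathrm{SDS}}^{\mathrm{can}}
  + \lambda_{\mathrm{obs}}\,\mathcal{L}_{\mathrm{SDS}}^{\mathrm{obs}}
```

In this reading, the canonical-space term would be evaluated on renderings of the avatar in its rest pose and the observation-space term on renderings in the posed frames of the video, so that the diffusion prior constrains both; the paper itself should be consulted for the actual formulation.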

Related papers

HumanRAM: Feed-forward Human Reconstruction and Animation Model using Transformers [60.86393841247567]
HumanRAM is a novel feed-forward approach for generalizable human reconstruction and animation from monocular or sparse human images. Our approach integrates human reconstruction and animation into a unified framework by introducing explicit pose conditions. Experiments show that HumanRAM significantly surpasses previous methods in terms of reconstruction accuracy, animation fidelity, and generalization performance on real-world datasets.
arXiv Detail & Related papers (2025-06-03T17:50:05Z)
AdaHuman: Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion [56.12859795754579]
AdaHuman is a novel framework that generates high-fidelity animatable 3D avatars from a single in-the-wild image. AdaHuman incorporates two key innovations: a pose-conditioned 3D joint diffusion model and a compositional 3DGS refinement module.
arXiv Detail & Related papers (2025-05-30T17:59:54Z)
ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos [18.73641648585445]
Recent neural rendering advances have enabled holistic human-scene reconstruction but require pre-calibrated camera and human poses. We introduce a novel unified framework that simultaneously performs camera tracking, human pose estimation and human-scene reconstruction in an online fashion. Specifically, we design a human deformation module to faithfully reconstruct details and enhance generalizability to out-of-distribution poses.
arXiv Detail & Related papers (2025-04-17T17:59:02Z)
HumanGif: Single-View Human Diffusion with Generative Prior [25.516544735593087]
We propose HumanGif, a single-view human diffusion model with generative priors. Specifically, we formulate the single-view-based 3D human novel view and pose synthesis as a single-view-conditioned human diffusion process. We show that HumanGif achieves the best perceptual performance, with better generalizability for novel view and pose synthesis.
arXiv Detail & Related papers (2025-02-17T17:55:27Z)
Deformable 3D Gaussian Splatting for Animatable Human Avatars [50.61374254699761]
We propose a fully explicit approach to construct a digital avatar from as little as a single monocular sequence. ParDy-Human constitutes an explicit model for realistic dynamic human avatars which requires significantly fewer training views and images. Our avatar learning is free of additional annotations such as Splat masks and can be trained with variable backgrounds while inferring full-resolution images efficiently even on consumer hardware.
arXiv Detail & Related papers (2023-12-22T20:56:46Z)
Human Gaussian Splatting: Real-time Rendering of Animatable Avatars [8.719797382786464]
This work addresses the problem of real-time rendering of photorealistic human body avatars learned from multi-view videos. We propose an animatable human model based on 3D Gaussian Splatting, which has recently emerged as a very efficient alternative to neural radiance fields. Our method achieves a 1.5 dB PSNR improvement over the state-of-the-art on the THuman4 dataset while being able to render in real-time (80 fps at 512x512 resolution).
arXiv Detail & Related papers (2023-11-28T12:05:41Z)
SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion [35.73448283467723]
SiTH is a novel pipeline that integrates an image-conditioned diffusion model into a 3D mesh reconstruction workflow. We employ a powerful generative diffusion model to hallucinate unseen back-view appearance based on the input images. For the latter, we leverage skinned body meshes as guidance to recover full-body texture meshes from the input and back-view images.
arXiv Detail & Related papers (2023-11-27T14:22:07Z)
Humans in 4D: Reconstructing and Tracking Humans with Transformers [72.50856500760352]
We present an approach to reconstruct humans and track them over time. At the core of our approach, we propose a fully "transformerized" version of a network for human mesh recovery. This network, HMR 2.0, advances the state of the art and shows the capability to analyze unusual poses that have in the past been difficult to reconstruct from single images.
arXiv Detail & Related papers (2023-05-31T17:59:52Z)
Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition [40.46674919612935]
We present Vid2Avatar, a method to learn human avatars from monocular in-the-wild videos. Our method does not require any groundtruth supervision or priors extracted from large datasets of clothed human scans. It solves the tasks of scene decomposition and surface reconstruction directly in 3D by modeling both the human and the background in the scene jointly.
arXiv Detail & Related papers (2023-02-22T18:59:17Z)
AvatarGen: a 3D Generative Model for Animatable Human Avatars [108.11137221845352]
AvatarGen is the first method that enables not only non-rigid human generation with diverse appearance but also full control over poses and viewpoints. To model non-rigid dynamics, it introduces a deformation network to learn pose-dependent deformations in the canonical space. Our method can generate animatable human avatars with high-quality appearance and geometry modeling, significantly outperforming previous 3D GANs.
arXiv Detail & Related papers (2022-08-01T01:27:02Z)
Human Performance Capture from Monocular Video in the Wild [50.34917313325813]
We propose a method capable of capturing the dynamic 3D human shape from a monocular video featuring challenging body poses. Our method outperforms state-of-the-art methods on 3DPW, an in-the-wild human video dataset.
arXiv Detail & Related papers (2021-11-29T16:32:41Z)
Animatable Neural Radiance Fields from Monocular RGB Video [72.6101766407013]
We present animatable neural radiance fields for detailed human avatar creation from monocular videos. Our approach extends neural radiance fields to dynamic scenes with human movements by introducing explicit pose-guided deformation. In experiments we show that the proposed approach achieves 1) implicit human geometry and appearance reconstruction with high-quality details, 2) photo-realistic rendering of the human from arbitrary views, and 3) animation of the human with arbitrary poses.
arXiv Detail & Related papers (2021-06-25T13:32:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.