DevilSight: Augmenting Monocular Human Avatar Reconstruction through a Virtual Perspective
- URL: http://arxiv.org/abs/2509.00403v1
- Date: Sat, 30 Aug 2025 08:06:16 GMT
- Title: DevilSight: Augmenting Monocular Human Avatar Reconstruction through a Virtual Perspective
- Authors: Yushuo Chen, Ruizhi Shao, Youxin Pang, Hongwen Zhang, Xinyi Wu, Rihui Wu, Yebin Liu
- Abstract summary: We propose to leverage the advanced video generative model, Human4DiT, to generate human motion from an alternative perspective. To ensure consistent reproduction of human motion, we inject the physical identity into the model through video fine-tuning. For higher-resolution outputs with finer details, a patch-based denoising algorithm is employed.
- Score: 46.9104429232199
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel framework to reconstruct human avatars from monocular videos. Recent approaches have struggled either to capture fine-grained dynamic details from the input or to generate plausible details at novel viewpoints, limitations that stem mainly from the limited representational capacity of the avatar model and insufficient observational data. To overcome these challenges, we propose to leverage the advanced video generative model, Human4DiT, to generate human motion from an alternative perspective as an additional supervision signal. This approach not only enriches the details in previously unseen regions but also effectively regularizes the avatar representation to mitigate artifacts. Furthermore, we introduce two complementary strategies to enhance video generation: to ensure consistent reproduction of human motion, we inject the physical identity into the model through video fine-tuning; for higher-resolution outputs with finer details, a patch-based denoising algorithm is employed. Experimental results demonstrate that our method outperforms recent state-of-the-art approaches and validate the effectiveness of our proposed strategies.
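The abstract does not detail the patch-based denoising algorithm, but the generic overlapping-tile idea it alludes to can be sketched: run the denoiser on overlapping patches and average the overlapping predictions so seams are smoothed. This is a minimal sketch only; the callable name `denoise_step`, the patch size, and the uniform blending weights are assumptions, not the paper's actual procedure.

```python
import torch

def tiled_denoise_step(x, denoise_step, patch=64, stride=48):
    """One denoising step applied patch-wise with overlap blending.

    x:            (B, C, H, W) noisy latent/image at the current step
    denoise_step: callable mapping a (B, C, patch, patch) tile to its
                  denoised prediction (hypothetical stand-in for the
                  model's per-step update)
    Overlapping tiles are averaged with uniform weights so seams
    between patches are smoothed out.
    """
    B, C, H, W = x.shape
    assert H >= patch and W >= patch, "frame must be at least one patch"
    out = torch.zeros_like(x)
    weight = torch.zeros(1, 1, H, W, device=x.device)
    ys = list(range(0, H - patch + 1, stride))
    xs = list(range(0, W - patch + 1, stride))
    # make sure the last row/column of tiles reaches the border
    if ys[-1] != H - patch:
        ys.append(H - patch)
    if xs[-1] != W - patch:
        xs.append(W - patch)
    for y in ys:
        for x0 in xs:
            tile = x[:, :, y:y + patch, x0:x0 + patch]
            out[:, :, y:y + patch, x0:x0 + patch] += denoise_step(tile)
            weight[:, :, y:y + patch, x0:x0 + patch] += 1.0
    return out / weight.clamp(min=1.0)
```

In a full sampler this would stand in for the full-frame model call at each diffusion step, so high-resolution frames only ever pass through the model at the patch resolution it was trained on.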
Related papers
- AniGaussian: Animatable Gaussian Avatar with Pose-guided Deformation [51.61117351997808]
We introduce an innovative pose-guided deformation strategy that constrains the dynamic Gaussian avatar with SMPL pose guidance. We incorporate rigid-based priors from previous works to enhance the dynamic transform capabilities of the Gaussian model. Through extensive comparisons with existing methods, AniGaussian demonstrates superior performance in both qualitative results and quantitative metrics.
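The summary does not spell out the pose-guided deformation; a common baseline for driving canonical 3D Gaussians with an SMPL pose is linear blend skinning of the Gaussian centers. The sketch below assumes precomputed skinning weights and per-joint rigid transforms; all names and shapes are illustrative, not AniGaussian's actual formulation.

```python
import torch

def deform_gaussians(mu_canonical, skin_weights, joint_transforms):
    """Warp canonical 3D Gaussian centers with SMPL-style linear blend skinning.

    mu_canonical:     (N, 3)    Gaussian centers in the canonical pose
    skin_weights:     (N, J)    per-Gaussian skinning weights (rows sum to 1)
    joint_transforms: (J, 4, 4) rigid transform of each joint for the target pose
    Returns posed centers of shape (N, 3).
    """
    # Blend the per-joint transforms into one 4x4 matrix per Gaussian.
    T = torch.einsum('nj,jab->nab', skin_weights, joint_transforms)   # (N, 4, 4)
    mu_h = torch.cat([mu_canonical,
                      torch.ones_like(mu_canonical[:, :1])], dim=-1)  # (N, 4)
    posed = torch.einsum('nab,nb->na', T, mu_h)                       # (N, 4)
    return posed[:, :3]
```

In practice the Gaussian covariances (or rotation quaternions) would be rotated by the blended transform as well; only the centers are warped here for brevity.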
arXiv Detail & Related papers (2025-02-24T06:53:37Z)
- WonderHuman: Hallucinating Unseen Parts in Dynamic 3D Human Reconstruction [51.22641018932625]
We present WonderHuman to reconstruct dynamic human avatars from a monocular video for high-fidelity novel view synthesis. Our method achieves SOTA performance in producing photorealistic renderings from the given monocular video.
arXiv Detail & Related papers (2025-02-03T04:43:41Z)
- GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion [5.49003371165534]
Photorealistic 3D head avatar reconstruction from recordings is challenging due to limited observations. We introduce a multi-view head diffusion model, leveraging its priors to fill in missing regions and ensure view consistency. We evaluate our method on the NeRSemble dataset, showing that it outperforms previous state-of-the-art methods in novel view synthesis.
arXiv Detail & Related papers (2024-12-13T15:31:22Z)
- High Quality Human Image Animation using Regional Supervision and Motion Blur Condition [97.97432499053966]
First, we leverage regional supervision for detailed regions to enhance face and hand faithfulness.
Second, we model the motion blur explicitly to further improve the appearance quality.
Third, we explore novel training strategies for high-resolution human animation to improve the overall fidelity.
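Regional supervision of this kind is often implemented as a mask-weighted reconstruction loss that up-weights the face and hand regions. The sketch below is an assumed formulation, not the paper's; the mask inputs and the weights `w_face`/`w_hand` are placeholders.

```python
import torch
import torch.nn.functional as F

def regional_loss(pred, target, face_mask, hand_mask,
                  w_face=5.0, w_hand=5.0):
    """Reconstruction loss that up-weights face and hand regions.

    pred, target:         (B, C, H, W) generated and ground-truth frames
    face_mask, hand_mask: (B, 1, H, W) binary region masks
    The weight map is 1 everywhere and larger inside the masked regions,
    so the detailed areas dominate the gradient. The weight values are
    assumptions, not reported hyperparameters.
    """
    weight = 1.0 + w_face * face_mask + w_hand * hand_mask
    return (weight * F.l1_loss(pred, target, reduction='none')).mean()
```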
arXiv Detail & Related papers (2024-09-29T06:46:31Z)
- HR Human: Modeling Human Avatars with Triangular Mesh and High-Resolution Textures from Videos [52.23323966700072]
We present a framework for acquiring human avatars with high-resolution physically based material textures and a triangular mesh from monocular video.
Our method introduces a novel information fusion strategy that combines information from the monocular video to synthesize virtual multi-view images.
Experiments show that our approach outperforms previous representations in terms of fidelity, and the explicit triangular-mesh output supports deployment in common graphics pipelines.
arXiv Detail & Related papers (2024-05-18T11:49:09Z)
- Diffusion Priors for Dynamic View Synthesis from Monocular Videos [59.42406064983643]
Dynamic novel view synthesis aims to capture the temporal evolution of visual content within videos.
We first finetune a pretrained RGB-D diffusion model on the video frames using a customization technique.
We distill the knowledge from the finetuned model into a 4D representation encompassing both dynamic and static Neural Radiance Fields.
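Distilling a diffusion prior into a differentiable representation is typically done with a score-distillation-style objective: noise a rendered frame, ask the frozen diffusion model to predict the noise, and push the renderer along the difference. The sketch below shows that generic recipe; `diffusion_eps`, the schedule tensor, and the weighting are assumptions rather than this paper's exact loss.

```python
import torch

def sds_loss(render, diffusion_eps, alphas_cumprod, t, weight=1.0):
    """Score-distillation-style surrogate loss from a frozen diffusion prior.

    render:         (B, C, H, W) frame rendered from the 4D representation
    diffusion_eps:  callable (x_t, t) -> predicted noise (frozen prior)
    alphas_cumprod: (T,) cumulative noise schedule of the diffusion model
    t:              (B,) sampled integer timesteps
    The gradient of the returned scalar w.r.t. the renderer parameters
    matches the SDS update direction (eps_pred - eps), detached from the prior.
    """
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    eps = torch.randn_like(render)
    x_t = a.sqrt() * render + (1 - a).sqrt() * eps   # forward-noise the render
    eps_pred = diffusion_eps(x_t, t)
    grad = weight * (eps_pred - eps)                 # SDS gradient direction
    return (grad.detach() * render).sum()            # surrogate loss
```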
arXiv Detail & Related papers (2024-01-10T23:26:41Z)
- Thinking the Fusion Strategy of Multi-reference Face Reenactment [4.1509697008011175]
We show that a simple extension using multiple reference images significantly improves generation quality.
We show this by 1) conducting the reconstruction task on a publicly available dataset, 2) conducting facial motion transfer on our original dataset, which consists of head-movement video sequences from multiple people, and 3) using a newly proposed evaluation metric to validate that our method achieves better quantitative results.
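One simple way to fuse several reference images, consistent with the summary above, is soft attention over per-reference features scored against the driving frame. The sketch below is a hypothetical fusion operator, not the paper's reported architecture; all shapes and names are illustrative.

```python
import torch

def fuse_references(ref_feats, query):
    """Attention-weighted fusion of features from multiple reference images.

    ref_feats: (K, D) one feature vector per reference image
    query:     (D,)   driving-frame feature used to score each reference
    Returns a single (D,) fused feature via scaled-dot-product soft attention.
    """
    scores = ref_feats @ query / ref_feats.shape[-1] ** 0.5  # (K,)
    attn = torch.softmax(scores, dim=0)                      # reference weights
    return attn @ ref_feats                                  # (D,) fused feature
```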
arXiv Detail & Related papers (2022-02-22T09:17:26Z)