SOAR: Self-Occluded Avatar Recovery from a Single Video In the Wild
- URL: http://arxiv.org/abs/2410.23800v1
- Date: Thu, 31 Oct 2024 10:35:59 GMT
- Title: SOAR: Self-Occluded Avatar Recovery from a Single Video In the Wild
- Authors: Zhuoyang Pan, Angjoo Kanazawa, Hang Gao
- Abstract summary: Self-occlusion is common when capturing people in the wild, where performers do not follow predefined motion scripts.
We introduce Self-Occluded Avatar Recovery (SOAR), a method for complete human reconstruction from partial observations where parts of the body are entirely unobserved.
- Score: 30.728476070389707
- Abstract: Self-occlusion is common when capturing people in the wild, where performers do not follow predefined motion scripts. This challenges existing monocular human reconstruction systems that assume full-body visibility. We introduce Self-Occluded Avatar Recovery (SOAR), a method for complete human reconstruction from partial observations where parts of the body are entirely unobserved. SOAR leverages a structural normal prior and a generative diffusion prior to address this ill-posed reconstruction problem. For the structural normal prior, we model the human with a reposable surfel model with well-defined and easily readable shapes. For the generative diffusion prior, we perform an initial reconstruction and refine it using score distillation. On various benchmarks, we show that SOAR performs favorably against state-of-the-art reconstruction and generation methods, and on par with concurrent works. Additional video results and code are available at https://soar-avatar.github.io/.
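The score-distillation refinement is only named, not spelled out, in the abstract. For reference, below is a minimal sketch of a generic score distillation sampling (SDS) update in the DreamFusion style; the `renderer` and `unet` call signatures, the timestep range, and the guidance scale are assumptions for illustration, not SOAR's actual implementation.

```python
import torch

def sds_step(params, renderer, unet, alphas_cumprod, cam, text_emb,
             guidance_scale=100.0):
    # Render the current avatar estimate from a sampled camera;
    # gradients flow from the image back into `params`.
    img = renderer(params, cam)                          # (1, 3, H, W) in [-1, 1]

    # Sample a diffusion timestep and noise the rendering accordingly.
    t = torch.randint(20, 980, (1,), device=img.device)
    a_t = alphas_cumprod[t].view(1, 1, 1, 1)
    noise = torch.randn_like(img)
    x_t = a_t.sqrt() * img + (1.0 - a_t).sqrt() * noise

    # Query the frozen diffusion prior with classifier-free guidance.
    with torch.no_grad():
        eps_uncond = unet(x_t, t, None)
        eps_cond = unet(x_t, t, text_emb)
        eps_hat = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    # SDS: inject w(t) * (eps_hat - noise) as the gradient of the rendering,
    # skipping the U-Net Jacobian, and let autograd carry it to `params`.
    grad = (1.0 - a_t) * (eps_hat - noise)
    img.backward(gradient=grad)
```

In an SDS loop like this, an optimizer step on `params` after each call pulls the rendered views toward images the diffusion prior considers likely.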
Related papers
- GAS: Generative Avatar Synthesis from a Single Image [54.95198111659466]
We introduce a generalizable and unified framework to synthesize view-consistent and temporally coherent avatars from a single image.
Our approach bridges this gap by combining the reconstruction power of regression-based 3D human reconstruction with the generative capabilities of a diffusion model.
arXiv Detail & Related papers (2025-02-10T19:00:39Z)
- WonderHuman: Hallucinating Unseen Parts in Dynamic 3D Human Reconstruction [51.22641018932625]
We present WonderHuman to reconstruct dynamic human avatars from a monocular video for high-fidelity novel view synthesis.
Our method achieves SOTA performance in producing photorealistic renderings from the given monocular video.
arXiv Detail & Related papers (2025-02-03T04:43:41Z)
- Pragmatist: Multiview Conditional Diffusion Models for High-Fidelity 3D Reconstruction from Unposed Sparse Views [23.94629999419033]
Inferring 3D structures from sparse, unposed observations is challenging due to its unconstrained nature.
Recent methods propose to predict implicit representations directly from unposed inputs in a data-driven manner, achieving promising results.
We propose conditional novel view synthesis, aiming to generate complete observations from limited input views to facilitate reconstruction.
arXiv Detail & Related papers (2024-12-11T14:30:24Z)
- Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images [57.479339658504685]
"Divide and Fuse" strategy reconstructs human body parts independently before fusing them.
Human Part Parametric Models (HPPM) independently reconstruct the mesh from a few shape and global-location parameters.
A specially designed fusion module seamlessly integrates the reconstructed parts, even when only a few are visible.
arXiv Detail & Related papers (2024-07-12T21:29:11Z)
- Stratified Avatar Generation from Sparse Observations [10.291918304187769]
Estimating 3D full-body avatars from AR/VR devices is essential for creating immersive experiences.
In this paper, we are inspired by the inherent property of the kinematic tree defined in the Skinned Multi-Person Linear (SMPL) model.
We propose a stratified approach to decouple the conventional full-body avatar reconstruction pipeline into two stages.
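The stratified split itself is not detailed in the abstract; below is a minimal, hypothetical sketch of one natural two-stage decoupling over the standard 24-joint SMPL kinematic tree. The joint indices follow the usual SMPL ordering, and `upper_net` and `lower_net` are placeholder stage networks, not the paper's models.

```python
import torch

# Standard 24-joint SMPL ordering, split along the kinematic tree into the
# upper-body chain (spine, head, arms) and lower-body chain (pelvis, legs).
UPPER = [3, 6, 9, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
LOWER = [0, 1, 2, 4, 5, 7, 8, 10, 11]

def stratified_recover(obs, upper_net, lower_net):
    theta = torch.zeros(24, 3)                    # axis-angle rotation per joint
    theta[UPPER] = upper_net(obs)                 # stage 1: upper body from sparse AR/VR signals
    theta[LOWER] = lower_net(obs, theta[UPPER])   # stage 2: lower body, conditioned on stage 1
    return theta
```

Conditioning the second stage on the first lets the well-observed upper-body chain constrain the largely unobserved lower body.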
arXiv Detail & Related papers (2024-05-30T06:25:42Z)
- SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion [35.73448283467723]
SiTH is a novel pipeline that integrates an image-conditioned diffusion model into a 3D mesh reconstruction workflow.
We employ a powerful generative diffusion model to hallucinate unseen back-view appearance based on the input images.
We then leverage skinned body meshes as guidance to recover full-body textured meshes from the input and back-view images.
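Read as a pipeline, the two steps compose roughly as follows; this is a schematic sketch with placeholder callables (`diffusion`, `texturer`), not SiTH's actual API.

```python
def reconstruct_from_single_view(front_img, body_mesh, diffusion, texturer):
    # Step 1: hallucinate the unseen back view, conditioned on the input photo.
    back_img = diffusion(condition=front_img)

    # Step 2: use the fitted skinned body mesh as geometric guidance and
    # project front/back appearance onto it to get a textured full-body mesh.
    return texturer(body_mesh, front_img, back_img)
```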
arXiv Detail & Related papers (2023-11-27T14:22:07Z)
- Humans in 4D: Reconstructing and Tracking Humans with Transformers [72.50856500760352]
We present an approach to reconstruct humans and track them over time.
At the core of our approach, we propose a fully "transformerized" version of a network for human mesh recovery.
This network, HMR 2.0, advances the state of the art and shows the capability to analyze unusual poses that have in the past been difficult to reconstruct from single images.
arXiv Detail & Related papers (2023-05-31T17:59:52Z)
- RealFusion: 360° Reconstruction of Any Object from a Single Image [98.46318529630109]
We consider the problem of reconstructing a full 360° photographic model of an object from a single image.
We take an off-the-shelf conditional image generator based on diffusion and engineer a prompt that encourages it to "dream up" novel views of the object.
arXiv Detail & Related papers (2023-02-21T13:25:35Z)
- SelfRecon: Self Reconstruction Your Digital Avatar from Monocular Video [48.23424267130425]
SelfRecon recovers space-time coherent geometries from a monocular self-rotating human video.
Explicit methods require a predefined template mesh for a given sequence, yet such a template is hard to acquire for a specific subject.
Implicit methods support arbitrary topology and have high quality due to continuous geometric representation.
arXiv Detail & Related papers (2022-01-30T11:49:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.