SOAR: Self-Occluded Avatar Recovery from a Single Video In the Wild
- URL: http://arxiv.org/abs/2410.23800v1
- Date: Thu, 31 Oct 2024 10:35:59 GMT
- Title: SOAR: Self-Occluded Avatar Recovery from a Single Video In the Wild
- Authors: Zhuoyang Pan, Angjoo Kanazawa, Hang Gao
- Abstract summary: Self-occlusion is common when capturing people in the wild, where performers do not follow predefined motion scripts.
We introduce Self-Occluded Avatar Recovery (SOAR), a method for complete human reconstruction from partial observations where parts of the body are entirely unobserved.
- Score: 30.728476070389707
- Abstract: Self-occlusion is common when capturing people in the wild, where performers do not follow predefined motion scripts. This challenges existing monocular human reconstruction systems that assume full-body visibility. We introduce Self-Occluded Avatar Recovery (SOAR), a method for complete human reconstruction from partial observations where parts of the body are entirely unobserved. SOAR leverages a structural normal prior and a generative diffusion prior to address this ill-posed reconstruction problem. For the structural normal prior, we model the human with a reposable surfel model with well-defined and easily readable shapes. For the generative diffusion prior, we perform an initial reconstruction and refine it using score distillation. On various benchmarks, we show that SOAR performs favorably against state-of-the-art reconstruction and generation methods, and on par with concurrent works. Additional video results and code are available at https://soar-avatar.github.io/.
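The score-distillation refinement is only named, not spelled out, in the abstract. For reference, below is a minimal sketch of a generic score distillation sampling (SDS) update in the DreamFusion style; the `renderer` and `unet` call signatures, the timestep range, and the guidance scale are assumptions for illustration, not SOAR's actual implementation.

```python
import torch

def sds_step(params, renderer, unet, alphas_cumprod, cam, text_emb,
             guidance_scale=100.0):
    # Render the current avatar estimate from a sampled camera;
    # gradients flow from the image back into `params`.
    img = renderer(params, cam)                          # (1, 3, H, W) in [-1, 1]

    # Sample a diffusion timestep and noise the rendering accordingly.
    t = torch.randint(20, 980, (1,), device=img.device)
    a_t = alphas_cumprod[t].view(1, 1, 1, 1)
    noise = torch.randn_like(img)
    x_t = a_t.sqrt() * img + (1.0 - a_t).sqrt() * noise

    # Query the frozen diffusion prior with classifier-free guidance.
    with torch.no_grad():
        eps_uncond = unet(x_t, t, None)
        eps_cond = unet(x_t, t, text_emb)
        eps_hat = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    # SDS: inject w(t) * (eps_hat - noise) as the gradient of the rendering,
    # skipping the U-Net Jacobian, and let autograd carry it to `params`.
    grad = (1.0 - a_t) * (eps_hat - noise)
    img.backward(gradient=grad)
```

In an SDS loop like this, an optimizer step on `params` after each call pulls the rendered views toward images the diffusion prior considers likely.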
Related papers
- GAS: Generative Avatar Synthesis from a Single Image [54.95198111659466]
We introduce a generalizable and unified framework to synthesize view-consistent and temporally coherent avatars from a single image.
Our approach bridges this gap by combining the reconstruction power of regression-based 3D human reconstruction with the generative capabilities of a diffusion model.
arXiv Detail & Related papers (2025-02-10T19:00:39Z)
- WonderHuman: Hallucinating Unseen Parts in Dynamic 3D Human Reconstruction [51.22641018932625]
We present WonderHuman to reconstruct dynamic human avatars from a monocular video for high-fidelity novel view synthesis.
Our method achieves SOTA performance in producing photorealistic renderings from the given monocular video.
arXiv Detail & Related papers (2025-02-03T04:43:41Z)
- Pragmatist: Multiview Conditional Diffusion Models for High-Fidelity 3D Reconstruction from Unposed Sparse Views [23.94629999419033]
Inferring 3D structures from sparse, unposed observations is challenging due to its unconstrained nature.
Recent methods propose to predict implicit representations directly from unposed inputs in a data-driven manner, achieving promising results.
We propose conditional novel view synthesis, aiming to generate complete observations from limited input views to facilitate reconstruction.
arXiv Detail & Related papers (2024-12-11T14:30:24Z)
- Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images [57.479339658504685]
"Divide and Fuse" strategy reconstructs human body parts independently before fusing them.
Human Part Parametric Models (HPPM) independently reconstruct the mesh from a few shape and global-location parameters.
A specially designed fusion module seamlessly integrates the reconstructed parts, even when only a few are visible.
arXiv Detail & Related papers (2024-07-12T21:29:11Z)
- Stratified Avatar Generation from Sparse Observations [10.291918304187769]
Estimating 3D full-body avatars from AR/VR devices is essential for creating immersive experiences.
In this paper, we are inspired by the inherent property of the kinematic tree defined in the Skinned Multi-Person Linear (SMPL) model.
We propose a stratified approach to decouple the conventional full-body avatar reconstruction pipeline into two stages.
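The stratified split itself is not detailed in the abstract; below is a minimal, hypothetical sketch of one natural two-stage decoupling over the standard 24-joint SMPL kinematic tree. The joint indices follow the usual SMPL ordering, and `upper_net` and `lower_net` are placeholder stage networks, not the paper's models.

```python
import torch

# Standard 24-joint SMPL ordering, split along the kinematic tree into the
# upper-body chain (spine, head, arms) and lower-body chain (pelvis, legs).
UPPER = [3, 6, 9, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
LOWER = [0, 1, 2, 4, 5, 7, 8, 10, 11]

def stratified_recover(obs, upper_net, lower_net):
    theta = torch.zeros(24, 3)                    # axis-angle rotation per joint
    theta[UPPER] = upper_net(obs)                 # stage 1: upper body from sparse AR/VR signals
    theta[LOWER] = lower_net(obs, theta[UPPER])   # stage 2: lower body, conditioned on stage 1
    return theta
```

Conditioning the second stage on the first lets the well-observed upper-body chain constrain the largely unobserved lower body.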
arXiv Detail & Related papers (2024-05-30T06:25:42Z)
- SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion [35.73448283467723]
SiTH is a novel pipeline that integrates an image-conditioned diffusion model into a 3D mesh reconstruction workflow.
We employ a powerful generative diffusion model to hallucinate unseen back-view appearance based on the input images.
We then leverage skinned body meshes as guidance to recover full-body textured meshes from the input and back-view images.
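Read as a pipeline, the two steps compose roughly as follows; this is a schematic sketch with placeholder callables (`diffusion`, `texturer`), not SiTH's actual API.

```python
def reconstruct_from_single_view(front_img, body_mesh, diffusion, texturer):
    # Step 1: hallucinate the unseen back view, conditioned on the input photo.
    back_img = diffusion(condition=front_img)

    # Step 2: use the fitted skinned body mesh as geometric guidance and
    # project front/back appearance onto it to get a textured full-body mesh.
    return texturer(body_mesh, front_img, back_img)
```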
arXiv Detail & Related papers (2023-11-27T14:22:07Z)
- Humans in 4D: Reconstructing and Tracking Humans with Transformers [72.50856500760352]
We present an approach to reconstruct humans and track them over time.
At the core of our approach, we propose a fully "transformerized" version of a network for human mesh recovery.
This network, HMR 2.0, advances the state of the art and shows the capability to analyze unusual poses that have in the past been difficult to reconstruct from single images.
arXiv Detail & Related papers (2023-05-31T17:59:52Z)
- RealFusion: 360° Reconstruction of Any Object from a Single Image [98.46318529630109]
We consider the problem of reconstructing a full 360° photographic model of an object from a single image.
We take an off-the-shelf conditional image generator based on diffusion and engineer a prompt that encourages it to "dream up" novel views of the object.
arXiv Detail & Related papers (2023-02-21T13:25:35Z)
- SelfRecon: Self Reconstruction Your Digital Avatar from Monocular Video [48.23424267130425]
SelfRecon recovers space-time coherent geometries from a monocular self-rotating human video.
Explicit methods require a predefined template mesh for a given sequence, yet such a template is hard to acquire for a specific subject.
Implicit methods support arbitrary topology and have high quality due to continuous geometric representation.
arXiv Detail & Related papers (2022-01-30T11:49:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.