DreaMo: Articulated 3D Reconstruction From A Single Casual Video
- URL: http://arxiv.org/abs/2312.02617v2
- Date: Thu, 7 Dec 2023 15:52:38 GMT
- Title: DreaMo: Articulated 3D Reconstruction From A Single Casual Video
- Authors: Tao Tu, Ming-Feng Li, Chieh Hubert Lin, Yen-Chi Cheng, Min Sun,
Ming-Hsuan Yang
- Abstract summary: We study articulated 3D shape reconstruction from a single and casually captured internet video, where the subject's view coverage is incomplete.
DreaMo shows promising quality in novel-view rendering, detailed articulated shape reconstruction, and skeleton generation.
- Score: 59.87221439498147
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Articulated 3D reconstruction has valuable applications in various domains,
yet it remains costly and demands intensive work from domain experts. Recent
advancements in template-free learning methods show promising results with
monocular videos. Nevertheless, these approaches necessitate a comprehensive
coverage of all viewpoints of the subject in the input video, thus limiting
their applicability to casually captured videos from online sources. In this
work, we study articulated 3D shape reconstruction from a single and casually
captured internet video, where the subject's view coverage is incomplete. We
propose DreaMo that jointly performs shape reconstruction while solving the
challenging low-coverage regions with view-conditioned diffusion prior and
several tailored regularizations. In addition, we introduce a skeleton
generation strategy to create human-interpretable skeletons from the learned
neural bones and skinning weights. We conduct our study on a self-collected
internet video collection characterized by incomplete view coverage. DreaMo
shows promising quality in novel-view rendering, detailed articulated shape
reconstruction, and skeleton generation. Extensive qualitative and quantitative
studies validate the efficacy of each proposed component, and show that existing
methods are unable to recover the correct geometry due to the incomplete view
coverage.
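
The abstract leaves DreaMo's skeleton-generation strategy unspecified, but the general recipe of turning neural bones and skinning weights into an explicit skeleton can be sketched. In the toy example below (an illustration under assumptions, not DreaMo's actual algorithm), bones that influence the same surface points receive high affinity, and a spanning tree over those affinities links the bone centers into a connected, human-interpretable skeleton; `skeleton_from_skinning` and its inputs are hypothetical names.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def skeleton_from_skinning(bone_centers, skin_weights):
    """bone_centers: (K, 3); skin_weights: (V, K), rows summing to 1.
    Returns joint positions plus tree edges linking them."""
    affinity = skin_weights.T @ skin_weights        # (K, K) surface co-activation
    np.fill_diagonal(affinity, 0.0)                 # drop self-edges
    # Max spanning tree over affinity == min spanning tree over -affinity:
    # every bone stays connected while the skeleton remains cycle-free.
    tree = minimum_spanning_tree(-affinity).toarray()
    edges = [(int(i), int(j)) for i, j in np.transpose(np.nonzero(tree))]
    return {"joints": bone_centers, "edges": edges}

# Toy usage with random bones and weights.
rng = np.random.default_rng(0)
centers = rng.normal(size=(5, 3))
weights = rng.random((200, 5))
weights /= weights.sum(axis=1, keepdims=True)
print(skeleton_from_skinning(centers, weights)["edges"])
```

The spanning tree guarantees exactly K-1 joints and no cycles, which is one simple way to keep the resulting skeleton easy for a human to read and edit.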
Related papers
- COSMU: Complete 3D human shape from monocular unconstrained images [24.08612483445495]
We present a novel framework to reconstruct complete 3D human shapes from a given target image by leveraging monocular unconstrained images.
The objective of this work is to reproduce high-quality details in regions of the reconstructed human body that are not visible in the input target.
arXiv Detail & Related papers (2024-07-15T10:06:59Z) - MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild [32.6521941706907]
We present MultiPly, a novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos.
We first define a layered neural representation for the entire scene, composited by individual human and background models.
We learn the layered neural representation from videos via our layer-wise differentiable volume rendering.
arXiv Detail & Related papers (2024-06-03T17:59:57Z) - Part123: Part-aware 3D Reconstruction from a Single-view Image [54.589723979757515]
- Part123: Part-aware 3D Reconstruction from a Single-view Image [54.589723979757515]
Part123 is a novel framework for part-aware 3D reconstruction from a single-view image.
We introduce contrastive learning into a neural rendering framework to learn a part-aware feature space.
A clustering-based algorithm is also developed to automatically derive 3D part segmentation results from the reconstructed models.
arXiv Detail & Related papers (2024-05-27T07:10:21Z) - Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction [51.3632308129838]
- Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction [51.3632308129838]
We present Total-Decom, a novel method for decomposed 3D reconstruction with minimal human interaction.
Our approach seamlessly integrates the Segment Anything Model (SAM) with hybrid implicit-explicit neural surface representations and a mesh-based region-growing technique for accurate 3D object decomposition.
We extensively evaluate our method on benchmark datasets and demonstrate its potential for downstream applications, such as animation and scene editing.
arXiv Detail & Related papers (2024-03-28T11:12:33Z) - A Fusion of Variational Distribution Priors and Saliency Map Replay for Continual 3D Reconstruction [1.2289361708127877]
- A Fusion of Variational Distribution Priors and Saliency Map Replay for Continual 3D Reconstruction [1.2289361708127877]
Single-image 3D reconstruction is a research challenge focused on predicting 3D object shapes from single-view images.
This task requires significant data acquisition to predict both visible and occluded portions of the shape.
We propose a continual-learning-based 3D reconstruction method built on Variational Priors, so that the model can still reconstruct previously seen classes reasonably well even after training on new classes.
arXiv Detail & Related papers (2023-08-17T06:48:55Z) - State of the Art in Dense Monocular Non-Rigid 3D Reconstruction [100.9586977875698]
- State of the Art in Dense Monocular Non-Rigid 3D Reconstruction [100.9586977875698]
3D reconstruction of deformable (or non-rigid) scenes from a set of monocular 2D image observations is a long-standing and actively researched area of computer vision and graphics.
This survey focuses on state-of-the-art methods for dense non-rigid 3D reconstruction of various deformable objects and composite scenes from monocular videos or sets of monocular views.
arXiv Detail & Related papers (2022-10-27T17:59:53Z) - LASR: Learning Articulated Shape Reconstruction from a Monocular Video [97.92849567637819]
We introduce a template-free approach to learn 3D shapes from a single video.
Our method faithfully reconstructs nonrigid 3D structures from videos of humans, animals, and objects of unknown classes.
arXiv Detail & Related papers (2021-05-06T21:41:11Z) - Learning monocular 3D reconstruction of articulated categories from
motion [39.811816510186475]
Video self-supervision enforces consistency between consecutive 3D reconstructions via a motion-based cycle loss.
We introduce an interpretable model of 3D template deformations that controls a 3D surface through the displacement of a small number of local, learnable handles.
We obtain state-of-the-art reconstructions with diverse shapes, viewpoints and textures for multiple articulated object categories.
arXiv Detail & Related papers (2021-03-30T13:50:27Z) - Unsupervised Monocular Depth Reconstruction of Non-Rigid Scenes [87.91841050957714]
- Unsupervised Monocular Depth Reconstruction of Non-Rigid Scenes [87.91841050957714]
We present an unsupervised monocular framework for dense depth estimation of dynamic scenes.
We derive a training objective that aims to opportunistically preserve pairwise distances between reconstructed 3D points.
Our method provides promising results, demonstrating its capability of reconstructing 3D from challenging videos of non-rigid scenes.
arXiv Detail & Related papers (2020-12-31T16:02:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information shown and is not responsible for any consequences arising from its use.