Reconstructing Animatable Categories from Videos
- URL: http://arxiv.org/abs/2305.06351v1
- Date: Wed, 10 May 2023 17:56:21 GMT
- Title: Reconstructing Animatable Categories from Videos
- Authors: Gengshan Yang and Chaoyang Wang and N Dinesh Reddy and Deva Ramanan
- Abstract summary: Building animatable 3D models is challenging due to the need for 3D scans, laborious registration, and manual rigging.
We present RAC that builds category 3D models from monocular videos while disentangling variations over instances and motion over time.
We show that 3D models of humans, cats, and dogs can be learned from 50-100 internet videos.
- Score: 65.14948977749269
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Building animatable 3D models is challenging due to the need for 3D scans,
laborious registration, and manual rigging, which are difficult to scale to
arbitrary categories. Recently, differentiable rendering has provided a pathway to
obtain high-quality 3D models from monocular videos, but existing methods are limited to
rigid categories or single instances. We present RAC that builds category 3D
models from monocular videos while disentangling variations over instances and
motion over time. Three key ideas are introduced to solve this problem: (1)
specializing a skeleton to instances via optimization, (2) a method for latent
space regularization that encourages shared structure across a category while
maintaining instance details, and (3) using 3D background models to disentangle
objects from the background. We show that 3D models of humans, cats, and dogs
can be learned from 50-100 internet videos.
Related papers
- CAT3D: Create Anything in 3D with Multi-View Diffusion Models [87.80820708758317]
We present CAT3D, a method for creating anything in 3D by simulating the real-world multi-view capture process with a multi-view diffusion model.
CAT3D can create entire 3D scenes in as little as one minute, and outperforms existing methods for single image and few-view 3D scene creation.
(2024-05-16T17:59:05Z)
- Learning the 3D Fauna of the Web [70.01196719128912]
We develop 3D-Fauna, an approach that learns a pan-category deformable 3D animal model for more than 100 animal species jointly.
One crucial bottleneck of modeling animals is the limited availability of training data.
We show that prior category-specific attempts fail to generalize to rare species with limited training images.
(2024-01-04T18:32:48Z)
- DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields [68.94868475824575]
This paper introduces a novel approach capable of generating infinite, high-quality 3D-consistent 2D annotations alongside 3D point cloud segmentations.
We leverage the strong semantic prior within a 3D generative model to train a semantic decoder.
Once trained, the decoder efficiently generalizes across the latent space, enabling the generation of infinite data.
(2023-11-18T21:58:28Z)
- CAMM: Building Category-Agnostic and Animatable 3D Models from Monocular Videos [3.356334042188362]
We propose a novel reconstruction method that learns an animatable kinematic chain for any articulated object.
Our approach is on par with state-of-the-art 3D surface reconstruction methods on various articulated object categories.
(2023-04-14T06:07:54Z)
- BANMo: Building Animatable 3D Neural Models from Many Casual Videos [135.64291166057373]
We present BANMo, a method that requires neither a specialized sensor nor a pre-defined template shape.
BANMo builds high-fidelity, articulated 3D models from many casual monocular videos in a differentiable rendering framework.
On real and synthetic datasets, BANMo shows higher-fidelity 3D reconstructions than prior works for humans and animals.
(2021-12-23T18:30:31Z)
- DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension [71.71234436165255]
We contribute DensePose 3D, a method that learns 3D reconstructions of articulated objects in a weakly supervised fashion from 2D image annotations only.
Because it does not require 3D scans, DensePose 3D can be used for learning a wide range of articulated categories such as different animal species.
We show significant improvements compared to state-of-the-art non-rigid structure-from-motion baselines on both synthetic and real data on categories of humans and animals.
(2021-08-31T18:33:55Z)
- Learning monocular 3D reconstruction of articulated categories from motion [39.811816510186475]
Video self-supervision enforces consistency between consecutive 3D reconstructions through a motion-based cycle loss.
We introduce an interpretable model of 3D template deformations that controls a 3D surface through the displacement of a small number of local, learnable handles.
We obtain state-of-the-art reconstructions with diverse shapes, viewpoints and textures for multiple articulated object categories.
(2021-03-30T13:50:27Z)
- A Convolutional Architecture for 3D Model Embedding [1.3858051019755282]
We propose a deep learning architecture that takes 3D models as input.
We show that the embedding representation conveys semantic information that helps assess the similarity of 3D objects.
(2021-03-05T15:46:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.