Reconstructing Animatable Categories from Videos
- URL: http://arxiv.org/abs/2305.06351v1
- Date: Wed, 10 May 2023 17:56:21 GMT
- Title: Reconstructing Animatable Categories from Videos
- Authors: Gengshan Yang and Chaoyang Wang and N Dinesh Reddy and Deva Ramanan
- Abstract summary: Building animatable 3D models is challenging due to the need for 3D scans, laborious registration, and manual rigging.
We present RAC, which builds category-level 3D models from monocular videos while disentangling variation across instances from motion over time.
We show that 3D models of humans, cats, and dogs can be learned from 50-100 internet videos.
- Score: 65.14948977749269
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Building animatable 3D models is challenging due to the need for 3D scans,
laborious registration, and manual rigging, which are difficult to scale to
arbitrary categories. Recently, differentiable rendering provides a pathway to
obtain high-quality 3D models from monocular videos, but these are limited to
rigid categories or single instances. We present RAC, which builds category-level
3D models from monocular videos while disentangling variation across instances
from motion over time. Three key ideas are introduced to solve this problem: (1)
specializing a skeleton to instances via optimization, (2) a method for latent
space regularization that encourages shared structure across a category while
maintaining instance details, and (3) using 3D background models to disentangle
objects from the background. We show that 3D models of humans, cats, and dogs
can be learned from 50-100 internet videos.
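To make the instance/time disentanglement concrete, below is a minimal, hypothetical PyTorch-style sketch (not the released RAC code; the module names, layer sizes, and the SDF-plus-deformation split are assumptions) of how a per-video instance code and a per-frame time code might condition separate networks, so that morphology and motion are carried by different latents:

```python
import torch
import torch.nn as nn

class DisentangledShapeModel(nn.Module):
    """Illustrative sketch only: one latent per video captures instance
    morphology, one latent per frame captures time-varying articulation."""

    def __init__(self, num_videos, num_frames, code_dim=64):
        super().__init__()
        self.instance_codes = nn.Embedding(num_videos, code_dim)  # shape (per video)
        self.time_codes = nn.Embedding(num_frames, code_dim)      # motion (per frame)
        # Canonical signed-distance field conditioned on the instance code.
        self.canonical_sdf = nn.Sequential(
            nn.Linear(3 + code_dim, 128), nn.ReLU(), nn.Linear(128, 1))
        # Deformation field conditioned on the time code only, so motion
        # cannot absorb instance-specific shape differences.
        self.deform = nn.Sequential(
            nn.Linear(3 + code_dim, 128), nn.ReLU(), nn.Linear(128, 3))

    def forward(self, pts, video_id, frame_id):
        beta = self.instance_codes(video_id).expand(pts.shape[0], -1)
        theta = self.time_codes(frame_id).expand(pts.shape[0], -1)
        # Warp observed-space points to the canonical frame, then query
        # the instance-conditioned canonical shape.
        pts_canonical = pts + self.deform(torch.cat([pts, theta], dim=-1))
        return self.canonical_sdf(torch.cat([pts_canonical, beta], dim=-1))

model = DisentangledShapeModel(num_videos=100, num_frames=10000)
sdf = model(torch.rand(1024, 3), torch.tensor(3), torch.tensor(42))
print(sdf.shape)  # torch.Size([1024, 1])
```

In the paper itself, articulation is driven by a category skeleton specialized per instance and the scene is composed with a 3D background model; the sketch only illustrates the latent-code disentanglement idea.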
Related papers
- Wonderland: Navigating 3D Scenes from a Single Image [43.99037613068823]
We introduce a large-scale reconstruction model that uses latents from a video diffusion model to predict 3D Gaussian Splattings for the scenes.
We train the 3D reconstruction model to operate on the video latent space with a progressive training strategy, enabling the efficient generation of high-quality, wide-scope, and generic 3D scenes.
arXiv Detail & Related papers (2024-12-16T18:58:17Z)
- You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale [42.67300636733286]
We present See3D, a visual-conditional multi-view diffusion model trained on large-scale Internet videos for open-world 3D creation.
The model aims to Get 3D knowledge by solely Seeing the visual contents from the vast and rapidly growing video data.
Our numerical and visual comparisons on single and sparse reconstruction benchmarks show that See3D, trained on cost-effective and scalable video data, achieves notable zero-shot and open-world generation capabilities.
arXiv Detail & Related papers (2024-12-09T17:44:56Z)
- Generating 3D-Consistent Videos from Unposed Internet Photos [68.944029293283]
We train a scalable, 3D-aware video model without any 3D annotations such as camera parameters.
Our results suggest that we can scale up scene-level 3D learning using only 2D data such as videos and multiview internet photos.
arXiv Detail & Related papers (2024-11-20T18:58:31Z)
- CAT3D: Create Anything in 3D with Multi-View Diffusion Models [87.80820708758317]
We present CAT3D, a method for creating anything in 3D by simulating the real-world capture process with a multi-view diffusion model.
CAT3D can create entire 3D scenes in as little as one minute, and outperforms existing methods for single image and few-view 3D scene creation.
arXiv Detail & Related papers (2024-05-16T17:59:05Z)
- DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields [68.94868475824575]
This paper introduces a novel approach capable of generating infinite, high-quality 3D-consistent 2D annotations alongside 3D point cloud segmentations.
We leverage the strong semantic prior within a 3D generative model to train a semantic decoder.
Once trained, the decoder efficiently generalizes across the latent space, enabling the generation of infinite data.
arXiv Detail & Related papers (2023-11-18T21:58:28Z)
- CAMM: Building Category-Agnostic and Animatable 3D Models from Monocular Videos [3.356334042188362]
We propose a novel reconstruction method that learns an animatable kinematic chain for any articulated object.
Our approach is on par with state-of-the-art 3D surface reconstruction methods on various articulated object categories.
arXiv Detail & Related papers (2023-04-14T06:07:54Z)
- BANMo: Building Animatable 3D Neural Models from Many Casual Videos [135.64291166057373]
We present BANMo, a method that requires neither a specialized sensor nor a pre-defined template shape.
BANMo builds high-fidelity, articulated 3D models from many monocular casual videos in a differentiable rendering framework.
On real and synthetic datasets, BANMo shows higher-fidelity 3D reconstructions than prior works for humans and animals.
arXiv Detail & Related papers (2021-12-23T18:30:31Z)
- DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension [71.71234436165255]
We contribute DensePose 3D, a method that can learn such reconstructions in a weakly supervised fashion from 2D image annotations only.
Because it does not require 3D scans, DensePose 3D can be used for learning a wide range of articulated categories such as different animal species.
We show significant improvements compared to state-of-the-art non-rigid structure-from-motion baselines on both synthetic and real data on categories of humans and animals.
arXiv Detail & Related papers (2021-08-31T18:33:55Z)
- Learning monocular 3D reconstruction of articulated categories from motion [39.811816510186475]
Video self-supervision forces the consistency of consecutive 3D reconstructions by a motion-based cycle loss.
We introduce an interpretable model of 3D template deformations that controls a 3D surface through the displacement of a small number of local, learnable handles (see the sketch after this list).
We obtain state-of-the-art reconstructions with diverse shapes, viewpoints and textures for multiple articulated object categories.
arXiv Detail & Related papers (2021-03-30T13:50:27Z)
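As referenced in the last entry above, the following is a minimal, generic sketch of handle-based template deformation (an assumption-level illustration, not the authors' code): each vertex moves by a weighted blend of a few learnable handle displacements, so a handful of parameters controls the whole surface.

```python
import torch

def deform_with_handles(template_verts, handle_weights, handle_deltas):
    """Generic handle-based deformation (illustrative only).

    template_verts: (V, 3) canonical mesh vertices
    handle_weights: (V, H) influence of each handle on each vertex (rows sum to 1)
    handle_deltas:  (H, 3) learnable handle displacements for one frame
    """
    return template_verts + handle_weights @ handle_deltas

# Toy usage: 4 handles driving a 500-vertex template.
verts = torch.rand(500, 3)
weights = torch.softmax(torch.rand(500, 4), dim=-1)
deltas = torch.zeros(4, 3, requires_grad=True)  # optimized per frame
print(deform_with_handles(verts, weights, deltas).shape)  # torch.Size([500, 3])
```

A motion-based cycle loss, as described in that entry, would then penalize disagreement between consecutive per-frame deformations and the observed 2D motion.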
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.