DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to
the Third Dimension
- URL: http://arxiv.org/abs/2109.00033v1
- Date: Tue, 31 Aug 2021 18:33:55 GMT
- Title: DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to
the Third Dimension
- Authors: Roman Shapovalov, David Novotny, Benjamin Graham, Patrick Labatut,
Andrea Vedaldi
- Abstract summary: We contribute DensePose 3D, a method that learns monocular 3D reconstructions of articulated objects in a weakly supervised fashion from 2D image annotations only.
Because it does not require 3D scans, DensePose 3D can be used for learning a wide range of articulated categories such as different animal species.
We show significant improvements compared to state-of-the-art non-rigid structure-from-motion baselines on both synthetic and real data on categories of humans and animals.
- Score: 71.71234436165255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We tackle the problem of monocular 3D reconstruction of articulated objects
like humans and animals. We contribute DensePose 3D, a method that can learn
such reconstructions in a weakly supervised fashion from 2D image annotations
only. This is in stark contrast with previous deformable reconstruction methods
that use parametric models such as SMPL pre-trained on a large dataset of 3D
object scans. Because it does not require 3D scans, DensePose 3D can be used
for learning a wide range of articulated categories such as different animal
species. The method learns, in an end-to-end fashion, a soft partition of a
given category-specific 3D template mesh into rigid parts together with a
monocular reconstruction network that predicts the part motions such that they
reproject correctly onto 2D DensePose-like surface annotations of the object.
The decomposition of the object into parts is regularized by expressing part
assignments as a combination of the smooth eigenfunctions of the
Laplace-Beltrami operator. We show significant improvements compared to
state-of-the-art non-rigid structure-from-motion baselines on both synthetic
and real data on categories of humans and animals.
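The pipeline described in the abstract (a soft partition of a template mesh into rigid parts, smoothed by a low-frequency eigenbasis, with part motions supervised by 2D reprojection) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It makes several assumptions that the abstract does not state: a uniform-weight graph Laplacian stands in for the Laplace-Beltrami operator, the posed shape is a weighted blend of per-part rigid motions, and the camera is orthographic. All function names are hypothetical.

```python
import numpy as np

def graph_laplacian(num_vertices, faces):
    """Uniform-weight graph Laplacian L = D - A of a triangle mesh.
    Assumption: a simple discrete stand-in for the Laplace-Beltrami operator."""
    adjacency = np.zeros((num_vertices, num_vertices))
    for i, j, k in faces:
        for a, b in ((i, j), (j, k), (k, i)):
            adjacency[a, b] = adjacency[b, a] = 1.0
    return np.diag(adjacency.sum(axis=1)) - adjacency

def soft_part_assignment(laplacian, coeffs):
    """Per-vertex softmax over parts of a linear combination of the
    smoothest Laplacian eigenvectors. coeffs has shape (K, P)."""
    num_basis = coeffs.shape[0]
    _, eigvecs = np.linalg.eigh(laplacian)      # eigenvalues in ascending order
    basis = eigvecs[:, :num_basis]              # (V, K) low-frequency basis
    logits = basis @ coeffs                     # (V, P)
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def pose_template(vertices, weights, rotations, translations):
    """Blend per-part rigid motions: x'_v = sum_p w_vp (R_p x_v + t_p).
    Assumption: linear blending of the part transforms."""
    per_part = np.einsum("pij,vj->pvi", rotations, vertices)
    per_part = per_part + translations[:, None, :]
    return np.einsum("vp,pvi->vi", weights, per_part)

def reprojection_loss(posed, vertex_ids, points_2d):
    """Mean squared 2D error under an assumed orthographic camera.
    vertex_ids play the role of DensePose-like surface annotations."""
    projected = posed[vertex_ids, :2]
    return float(np.mean(np.sum((projected - points_2d) ** 2, axis=1)))

# Toy example: a tetrahedron template with two parts and three basis functions.
vertices = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
laplacian = graph_laplacian(len(vertices), faces)

rng = np.random.default_rng(0)
weights = soft_part_assignment(laplacian, rng.normal(size=(3, 2)))

# Identity part motions leave the template unchanged.
rotations = np.stack([np.eye(3), np.eye(3)])
translations = np.zeros((2, 3))
posed = pose_template(vertices, weights, rotations, translations)

vertex_ids = np.array([0, 2, 3])
loss = reprojection_loss(posed, vertex_ids, vertices[vertex_ids, :2])
```

In a learned setting, the coefficient matrix and the per-part rotations and translations would be optimized (the latter predicted by a network from the image) so that the reprojection loss decreases. Restricting the assignment logits to the first few eigenvectors is what keeps the partition smooth over the surface.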
Related papers
- Iterative Superquadric Recomposition of 3D Objects from Multiple Views [77.53142165205283]
We propose a framework, ISCO, to recompose an object using 3D superquadrics as semantic parts directly from 2D views.
Our framework iteratively adds new superquadrics wherever the reconstruction error is high.
It provides consistently more accurate 3D reconstructions, even from images in the wild.
arXiv Detail & Related papers (2023-09-05T10:21:37Z)
- MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices [78.20154723650333]
High-quality 3D ground-truth shapes are critical for 3D object reconstruction evaluation.
We introduce a novel multi-view RGBD dataset captured using a mobile device.
We obtain precise 3D ground-truth shape without relying on high-end 3D scanners.
arXiv Detail & Related papers (2023-03-03T14:02:50Z)
- 3D Surface Reconstruction in the Wild by Deforming Shape Priors from Synthetic Data [24.97027425606138]
Reconstructing the underlying 3D surface of an object from a single image is a challenging problem.
We present a new method for joint category-specific 3D reconstruction and object pose estimation from a single image.
Our approach achieves state-of-the-art reconstruction performance across several real-world datasets.
arXiv Detail & Related papers (2023-02-24T20:37:27Z)
- Common Pets in 3D: Dynamic New-View Synthesis of Real-Life Deformable Categories [80.30216777363057]
We introduce Common Pets in 3D (CoP3D), a collection of crowd-sourced videos showing around 4,200 distinct pets.
At test time, given a small number of video frames of an unseen object, Tracker-NeRF predicts the trajectories of its 3D points and generates new views.
Results on CoP3D reveal significantly better non-rigid new-view synthesis performance than existing baselines.
arXiv Detail & Related papers (2022-11-07T22:42:42Z)
- GaussiGAN: Controllable Image Synthesis with 3D Gaussians from Unposed Silhouettes [48.642181362172906]
We present an algorithm that learns a coarse 3D representation of objects from unposed multi-view 2D mask supervision.
In contrast to existing voxel-based methods for unposed object reconstruction, our approach learns to represent the generated shape and pose.
We show results on synthetic datasets with realistic lighting, and demonstrate object insertion with interactive posing.
arXiv Detail & Related papers (2021-06-24T17:47:58Z)
- Learning monocular 3D reconstruction of articulated categories from motion [39.811816510186475]
Video self-supervision forces the consistency of consecutive 3D reconstructions by a motion-based cycle loss.
We introduce an interpretable model of 3D template deformations that controls a 3D surface through the displacement of a small number of local, learnable handles.
We obtain state-of-the-art reconstructions with diverse shapes, viewpoints and textures for multiple articulated object categories.
arXiv Detail & Related papers (2021-03-30T13:50:27Z)
- Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction [79.98689027127855]
We propose a new representation of the 3D shape of common object categories that can be learned from a collection of 2D images of independent objects.
Our method builds in a novel way on concepts from parametric deformation models, non-parametric 3D reconstruction, and canonical embeddings.
It achieves state-of-the-art results in dense 3D reconstruction on public in-the-wild datasets of faces, cars, and birds.
arXiv Detail & Related papers (2020-08-28T15:44:05Z)
- AutoSweep: Recovering 3D Editable Objects from a Single Photograph [54.701098964773756]
We aim to recover 3D objects that have semantic parts and can be directly edited.
Our work makes an attempt towards recovering two types of primitive-shaped objects, namely, generalized cuboids and generalized cylinders.
Our algorithm can recover high quality 3D models and outperforms existing methods in both instance segmentation and 3D reconstruction.
arXiv Detail & Related papers (2020-05-27T12:16:24Z)
- CoReNet: Coherent 3D scene reconstruction from a single RGB image [43.74240268086773]
We build on advances in deep learning to reconstruct the shape of a single object given only one RGB image as input.
We propose three extensions: (1) ray-traced skip connections that propagate local 2D information to the output 3D volume in a physically correct manner; (2) a hybrid 3D volume representation that enables building translation equivariant models; and (3) a reconstruction loss tailored to capture overall object geometry.
We reconstruct all objects jointly in one pass, producing a coherent reconstruction, where all objects live in a single consistent 3D coordinate frame relative to the camera and they do not intersect in 3D space.
arXiv Detail & Related papers (2020-04-27T17:53:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.