Learning monocular 3D reconstruction of articulated categories from motion
- URL: http://arxiv.org/abs/2103.16352v1
- Date: Tue, 30 Mar 2021 13:50:27 GMT
- Title: Learning monocular 3D reconstruction of articulated categories from motion
- Authors: Filippos Kokkinos, Iasonas Kokkinos
- Abstract summary: Video self-supervision forces the consistency of consecutive 3D reconstructions by a motion-based cycle loss.
We introduce an interpretable model of 3D template deformations that controls a 3D surface through the displacement of a small number of local, learnable handles.
We obtain state-of-the-art reconstructions with diverse shapes, viewpoints and textures for multiple articulated object categories.
- Score: 39.811816510186475
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Monocular 3D reconstruction of articulated object categories is challenging
due to the lack of training data and the inherent ill-posedness of the problem.
In this work we use video self-supervision, forcing the consistency of
consecutive 3D reconstructions by a motion-based cycle loss. This largely
improves both optimization-based and learning-based 3D mesh reconstruction. We
further introduce an interpretable model of 3D template deformations that
controls a 3D surface through the displacement of a small number of local,
learnable handles. We formulate this operation as a structured layer relying on
mesh-Laplacian regularization and show that it can be trained in an end-to-end
manner. We finally introduce a per-sample numerical optimization approach that
jointly optimizes over mesh displacements and cameras within a video, boosting
accuracy both during training and as test-time post-processing. While relying
exclusively on a small set of videos collected per category for supervision, we
obtain state-of-the-art reconstructions with diverse shapes, viewpoints and
textures for multiple articulated object categories.
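The two mechanisms the abstract describes can be sketched in a few lines. The following is an illustrative toy, not the authors' implementation: the handle-based deformation is modeled as a skinning-style weighted mix of handle displacements, the mesh-Laplacian term as a quadratic smoothness energy, and the motion-based cycle loss as agreement between the 2D motion of the reconstruction and an observed optical flow under an assumed orthographic camera. All function names, the weight matrix, and the camera model are simplifying assumptions.

```python
import numpy as np

def deform_with_handles(vertices, handle_disp, weights):
    """Displace each vertex by a weighted mix of handle displacements.

    vertices: (V, 3) template mesh, handle_disp: (H, 3) learnable handle
    displacements, weights: (V, H) attachment weights with rows summing to 1.
    """
    return vertices + weights @ handle_disp

def laplacian_energy(vertices, laplacian):
    """Mesh-Laplacian regularizer: penalizes non-smooth deformations."""
    return float(np.sum((laplacian @ vertices) ** 2))

def cycle_loss(verts_t, verts_t1, flow_2d):
    """Motion-based cycle loss: the projected motion of the frame-t
    reconstruction should match the observed flow to frame t+1.
    Orthographic projection here simply drops the z coordinate."""
    moved = verts_t[:, :2] + flow_2d
    return float(np.mean((verts_t1[:, :2] - moved) ** 2))

# Toy example: a 4-vertex patch controlled by 2 handles.
V = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]])
W = np.array([[1.0, 0], [0.5, 0.5], [0.5, 0.5], [0, 1]])
disp = np.array([[0.0, 0, 0.2], [0, 0, -0.2]])
deformed = deform_with_handles(V, disp, W)

# Graph Laplacian of a 4-vertex path, used as a toy smoothness operator.
L = np.array([[1., -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 1]])
reg = laplacian_energy(deformed, L)

# If frame t+1 equals frame t shifted by the observed flow, the loss is 0.
flow = np.full((4, 2), 0.1)
V_next = V.copy()
V_next[:, :2] += flow
loss = cycle_loss(V, V_next, flow)
```

In the paper's setting the handle displacements and cameras are the quantities optimized per sample, with the Laplacian energy keeping the deformed surface smooth; this sketch only fixes the shapes and signs of those terms.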
Related papers
- Reconstructing Animatable Categories from Videos [65.14948977749269]
Building animatable 3D models is challenging due to the need for 3D scans, laborious registration, and manual rigging.
We present RAC that builds category 3D models from monocular videos while disentangling variations over instances and motion over time.
We show that 3D models of humans, cats, and dogs can be learned from 50-100 internet videos.
arXiv Detail & Related papers (2023-05-10T17:56:21Z)
- Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D Mesh Reconstruction is a fundamental computer vision task that aims at recovering 3D shapes from single-view RGB images.
This paper tackles Single-view 3D Mesh Reconstruction, to study the model generalization on unseen categories.
We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
arXiv Detail & Related papers (2022-08-04T14:13:35Z)
- DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension [71.71234436165255]
We contribute DensePose 3D, a method that can learn such reconstructions in a weakly supervised fashion from 2D image annotations only.
Because it does not require 3D scans, DensePose 3D can be used for learning a wide range of articulated categories such as different animal species.
We show significant improvements compared to state-of-the-art non-rigid structure-from-motion baselines on both synthetic and real data on categories of humans and animals.
arXiv Detail & Related papers (2021-08-31T18:33:55Z)
- Learning Compositional Shape Priors for Few-Shot 3D Reconstruction [36.40776735291117]
We show that complex encoder-decoder architectures exploit large amounts of per-category data.
We propose three ways to learn a class-specific global shape prior, directly from data.
Experiments on the popular ShapeNet dataset show that our method outperforms a zero-shot baseline by over 40%.
arXiv Detail & Related papers (2021-06-11T14:55:49Z)
- Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances.
We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction.
Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
arXiv Detail & Related papers (2021-03-30T17:57:01Z)
- Online Adaptation for Consistent Mesh Reconstruction in the Wild [147.22708151409765]
We pose video-based reconstruction as a self-supervised online adaptation problem applied to any incoming test video.
We demonstrate that our algorithm recovers temporally consistent and reliable 3D structures from videos of non-rigid objects including those of animals captured in the wild.
arXiv Detail & Related papers (2020-12-06T07:22:27Z)
- Few-Shot Single-View 3-D Object Reconstruction with Compositional Priors [30.262308825799167]
We show that complex encoder-decoder architectures perform similarly to nearest-neighbor baselines in standard benchmarks.
We propose three approaches that efficiently integrate a class prior into a 3D reconstruction model.
arXiv Detail & Related papers (2020-04-14T04:53:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.