Self-supervised Single-view 3D Reconstruction via Semantic Consistency
- URL: http://arxiv.org/abs/2003.06473v1
- Date: Fri, 13 Mar 2020 20:29:01 GMT
- Title: Self-supervised Single-view 3D Reconstruction via Semantic Consistency
- Authors: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Varun Jampani,
Ming-Hsuan Yang, Jan Kautz
- Abstract summary: We learn a self-supervised, single-view 3D reconstruction model that predicts the shape, texture and camera pose of a target object.
The proposed method requires no 3D supervision, manually annotated keypoints, multi-view images of an object, or a prior 3D template.
- Score: 142.71430568330172
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We learn a self-supervised, single-view 3D reconstruction model that predicts
the 3D mesh shape, texture and camera pose of a target object with a collection
of 2D images and silhouettes. The proposed method requires no 3D supervision,
manually annotated keypoints, multi-view images of an object, or a
prior 3D template. The key insight of our work is that objects can be
represented as a collection of deformable parts, and each part is semantically
coherent across different instances of the same category (e.g., wings on birds
and wheels on cars). Therefore, by leveraging part segmentations learned in a
self-supervised manner from a large collection of category-specific images, we can
effectively enforce semantic consistency between the reconstructed meshes and
the original images. This significantly reduces ambiguities during joint
prediction of shape and camera pose of an object, along with texture. To the
best of our knowledge, ours is the first attempt to solve the single-view
reconstruction problem without a category-specific template mesh or semantic
keypoints. Thus our model can easily generalize to various object categories
without such labels, e.g., horses, penguins, etc. Through a variety of
experiments on several categories of deformable and rigid objects, we
demonstrate that our unsupervised method performs comparably to, if not better than,
existing category-specific reconstruction methods learned with supervision.
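The semantic-consistency idea above can be pictured as a loss that compares a part map rendered from the reconstructed mesh (under the predicted camera pose) against the part segmentation predicted on the image. The sketch below is a minimal, illustrative version only: the function name, the list-based probability-map representation, and the toy example are assumptions, not the paper's implementation, which renders per-vertex part probabilities differentiably.

```python
import math

def semantic_consistency_loss(rendered, predicted, eps=1e-8):
    """Average per-pixel cross-entropy between part probabilities rendered
    from the reconstructed mesh and part probabilities predicted on the
    image by a self-supervised part segmenter.

    Both arguments are lists of per-pixel probability vectors over K
    semantic parts (e.g. head / wings / body for birds). The loss is low
    when the mesh, under the predicted camera pose, projects each part
    onto the image region the segmenter assigns to that part.
    """
    total = 0.0
    for r_vec, p_vec in zip(rendered, predicted):
        total -= sum(p * math.log(max(r, eps)) for p, r in zip(p_vec, r_vec))
    return total / len(rendered)

# Toy 4-pixel example: agreement gives zero loss, disagreement does not.
agree = [[1.0, 0.0, 0.0]] * 4
off = [[0.0, 1.0, 0.0]] * 4
print(semantic_consistency_loss(agree, agree))        # 0.0
print(semantic_consistency_loss(off, agree) > 1.0)    # True
```

Because the part labels are consistent across instances of a category, this signal constrains shape and camera pose jointly, which is what reduces the shape/pose ambiguity the abstract describes.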
Related papers
- EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild [79.71523320368388]
Our work aims to reconstruct hand-object interactions from a single-view image.
We first design a novel pipeline to estimate the underlying hand pose and object shape.
With the initial reconstruction, we employ a prior-guided optimization scheme.
arXiv Detail & Related papers (2024-11-21T16:33:35Z)
- SAOR: Single-View Articulated Object Reconstruction [17.2716639564414]
We introduce SAOR, a novel approach for estimating the 3D shape, texture, and viewpoint of an articulated object from a single image captured in the wild.
Unlike prior approaches that rely on pre-defined category-specific 3D templates or tailored 3D skeletons, SAOR learns to articulate shapes from single-view image collections with a skeleton-free part-based model without requiring any 3D object shape priors.
arXiv Detail & Related papers (2023-03-23T17:59:35Z)
- Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D Mesh Reconstruction is a fundamental computer vision task that aims at recovering 3D shapes from single-view RGB images.
This paper tackles single-view 3D mesh reconstruction with a focus on model generalization to unseen categories.
We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
arXiv Detail & Related papers (2022-08-04T14:13:35Z)
- Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency [59.427074701985795]
Single-view reconstruction typically relies on viewpoint annotations, silhouettes, the absence of background, multiple views of the same instance, a template shape, or symmetry.
We avoid all of these supervision signals and hypotheses by explicitly leveraging the consistency between images of different object instances.
Our main contributions are two approaches to leverage cross-instance consistency: (i) progressive conditioning, a training strategy to gradually specialize the model from category to instances in a curriculum learning fashion; (ii) swap reconstruction, a loss enforcing consistency between instances having similar shape or texture.
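The swap-reconstruction loss in contribution (ii) can be illustrated with a toy sketch: for a pair of instances judged similar, reconstruct one instance using the other's shape code and penalize any mismatch with the original image. Everything below (the function name, the linear toy decoder, and the list-based codes) is an illustrative assumption, not the paper's actual architecture.

```python
def swap_reconstruction_loss(decode, shape_codes, texture_codes, images, pairs):
    """Toy version of the swap-reconstruction idea: for each pair (i, j) of
    instances judged to have similar shape, decode instance i's texture code
    with instance j's shape code and penalize the squared error against
    instance i's image. If the swap changes nothing, the two shapes were
    genuinely interchangeable and the loss stays low."""
    total = 0.0
    for i, j in pairs:
        swapped = decode(shape_codes[j], texture_codes[i])
        total += sum((s - x) ** 2 for s, x in zip(swapped, images[i]))
    return total / len(pairs)

# Toy linear "decoder" over 2-pixel images, purely for illustration.
decode = lambda shape, texture: [a + b for a, b in zip(shape, texture)]
shape_codes = [[1.0, 2.0], [1.0, 2.0]]   # instances 0 and 1 share a shape
texture_codes = [[0.5, 0.5], [0.0, 0.0]]
images = [decode(shape_codes[0], texture_codes[0]),
          decode(shape_codes[1], texture_codes[1])]
print(swap_reconstruction_loss(decode, shape_codes, texture_codes,
                               images, [(0, 1)]))  # 0.0
```

The design point is that the penalty only binds when the paired instances really do share shape (or texture), which is why it is combined with progressive conditioning rather than applied to arbitrary pairs.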
arXiv Detail & Related papers (2022-04-21T17:47:35Z)
- Multi-Category Mesh Reconstruction From Image Collections [90.24365811344987]
We present an alternative approach that infers the textured mesh of objects by combining a series of deformable 3D models with a set of instance-specific deformations, poses, and textures.
Our method is trained with images of multiple object categories using only foreground masks and rough camera poses as supervision.
Experiments show that the proposed framework can distinguish between different object categories and learn category-specific shape priors in an unsupervised manner.
arXiv Detail & Related papers (2021-10-21T16:32:31Z)
- Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction [79.98689027127855]
We propose a new representation of the 3D shape of common object categories that can be learned from a collection of 2D images of independent objects.
Our method builds in a novel way on concepts from parametric deformation models, non-parametric 3D reconstruction, and canonical embeddings.
It achieves state-of-the-art results in dense 3D reconstruction on public in-the-wild datasets of faces, cars, and birds.
arXiv Detail & Related papers (2020-08-28T15:44:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.