Multi-Category Mesh Reconstruction From Image Collections
- URL: http://arxiv.org/abs/2110.11256v1
- Date: Thu, 21 Oct 2021 16:32:31 GMT
- Title: Multi-Category Mesh Reconstruction From Image Collections
- Authors: Alessandro Simoni, Stefano Pini, Roberto Vezzani, Rita Cucchiara
- Abstract summary: We present an alternative approach that infers the textured mesh of objects by combining a series of deformable 3D models with a set of instance-specific deformation, pose, and texture parameters.
Our method is trained with images of multiple object categories using only foreground masks and rough camera poses as supervision.
Experiments show that the proposed framework can distinguish between different object categories and learn category-specific shape priors in an unsupervised manner.
- Score: 90.24365811344987
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recently, learning frameworks have shown the capability of inferring the
accurate shape, pose, and texture of an object from a single RGB image.
However, current methods are trained on image collections of a single category
in order to exploit specific priors, and they often make use of
category-specific 3D templates. In this paper, we present an alternative
approach that infers the textured mesh of objects by combining a series of
deformable 3D models with a set of instance-specific deformation, pose, and
texture parameters. Unlike previous works, our method is trained with images of
multiple object categories using only foreground masks and rough camera poses
as supervision. Without specific 3D templates, the framework learns
category-level models which are deformed to recover the 3D shape of the
depicted object. The instance-specific deformations are predicted independently
for each vertex of the learned 3D mesh, enabling the dynamic subdivision of the
mesh during the training process. Experiments show that the proposed framework
can distinguish between different object categories and learn category-specific
shape priors in an unsupervised manner. Predicted shapes are smooth and
benefit from multiple subdivision steps during training, obtaining results
comparable to or better than the state of the art on two public datasets. Models
and code are publicly released.
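The abstract's two key mechanisms can be illustrated with a minimal sketch (not the authors' released code; function names, shapes, and the plain midpoint-subdivision scheme are illustrative assumptions): instance-specific offsets are applied independently to each vertex of a category-level template, and each triangle can be split into four so that offsets are later predicted at a finer resolution.

```python
import numpy as np

def deform(template_vertices, per_vertex_offsets):
    """Apply instance-specific offsets independently to each template vertex."""
    return template_vertices + per_vertex_offsets

def midpoint_subdivide(vertices, faces):
    """One midpoint-subdivision step: split each triangle into four.

    Shared edges receive a single midpoint vertex so the mesh stays watertight.
    """
    edge_midpoint = {}  # (i, j) with i < j -> index of the new midpoint vertex
    new_vertices = [np.asarray(v, dtype=float) for v in vertices]
    new_faces = []
    for a, b, c in faces:
        mids = []
        for i, j in ((a, b), (b, c), (c, a)):
            key = (min(i, j), max(i, j))
            if key not in edge_midpoint:
                edge_midpoint[key] = len(new_vertices)
                new_vertices.append((new_vertices[i] + new_vertices[j]) / 2)
            mids.append(edge_midpoint[key])
        m_ab, m_bc, m_ca = mids
        # One parent triangle becomes four children.
        new_faces += [(a, m_ab, m_ca), (m_ab, b, m_bc),
                      (m_ca, m_bc, c), (m_ab, m_bc, m_ca)]
    return np.array(new_vertices), np.array(new_faces)

# A single triangle: one subdivision step yields 6 vertices and 4 faces,
# after which per-vertex offsets can be predicted on the finer mesh.
V = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
F = np.array([[0, 1, 2]])
V2, F2 = midpoint_subdivide(V, F)
```

In the paper's setting the offsets come from a learned network rather than being given, but the subdivision step works the same way: it only refines where offsets can act, without changing the underlying category-level shape.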
Related papers
- SAOR: Single-View Articulated Object Reconstruction [17.2716639564414]
We introduce SAOR, a novel approach for estimating the 3D shape, texture, and viewpoint of an articulated object from a single image captured in the wild.
Unlike prior approaches that rely on pre-defined category-specific 3D templates or tailored 3D skeletons, SAOR learns to articulate shapes from single-view image collections with a skeleton-free part-based model without requiring any 3D object shape priors.
arXiv Detail & Related papers (2023-03-23T17:59:35Z) - CA$^2$T-Net: Category-Agnostic 3D Articulation Transfer from Single Image [41.70960551470232]
We present a neural network approach to transfer the motion from a single image of an articulated object to a rest-state (i.e., unarticulated) 3D model.
Our network learns to predict the object's pose, part segmentation, and corresponding motion parameters to reproduce the articulation shown in the input image.
arXiv Detail & Related papers (2023-01-05T18:57:12Z) - Topologically-Aware Deformation Fields for Single-View 3D Reconstruction [30.738926104317514]
We present a new framework for learning 3D object shapes and dense cross-object 3D correspondences from just an unaligned category-specific image collection.
The 3D shapes are generated implicitly as deformations to a category-specific signed distance field.
Our approach, dubbed TARS, achieves state-of-the-art reconstruction fidelity on several datasets.
arXiv Detail & Related papers (2022-05-12T17:59:59Z) - Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency [59.427074701985795]
Single-view reconstruction typically relies on viewpoint annotations, silhouettes, the absence of background, multiple views of the same instance, a template shape, or symmetry.
We avoid all of these supervisions and hypotheses by leveraging explicitly the consistency between images of different object instances.
Our main contributions are two approaches to leverage cross-instance consistency: (i) progressive conditioning, a training strategy to gradually specialize the model from category to instances in a curriculum learning fashion; (ii) swap reconstruction, a loss enforcing consistency between instances having similar shape or texture.
arXiv Detail & Related papers (2022-04-21T17:47:35Z) - Pixel2Mesh++: 3D Mesh Generation and Refinement from Multi-View Images [82.32776379815712]
We study the problem of shape generation in 3D mesh representation from a small number of color images with or without camera poses.
We further improve the shape quality by leveraging cross-view information with a graph convolutional network.
Our model is robust to the quality of the initial mesh and the error of camera pose, and can be combined with a differentiable renderer for test-time optimization.
arXiv Detail & Related papers (2022-04-21T03:42:31Z) - Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions [79.34847067293649]
We present a method that can recognize new objects and estimate their 3D pose in RGB images even under partial occlusions.
It relies on a small set of training objects to learn local object representations.
We are the first to show generalization without retraining on the LINEMOD and Occlusion-LINEMOD datasets.
arXiv Detail & Related papers (2022-03-31T17:50:35Z) - Template NeRF: Towards Modeling Dense Shape Correspondences from Category-Specific Object Images [4.662583832063716]
We present neural radiance fields (NeRF) with templates, dubbed template-NeRF, for modeling appearance and geometry.
We generate dense shape correspondences simultaneously among objects of the same category from only multi-view posed images.
The learned dense correspondences can be readily used for various image-based tasks such as keypoint detection, part segmentation, and texture transfer.
arXiv Detail & Related papers (2021-11-08T02:16:48Z) - Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction [79.98689027127855]
We propose a new representation of the 3D shape of common object categories that can be learned from a collection of 2D images of independent objects.
Our method builds in a novel way on concepts from parametric deformation models, non-parametric 3D reconstruction, and canonical embeddings.
It achieves state-of-the-art results in dense 3D reconstruction on public in-the-wild datasets of faces, cars, and birds.
arXiv Detail & Related papers (2020-08-28T15:44:05Z) - Self-supervised Single-view 3D Reconstruction via Semantic Consistency [142.71430568330172]
We learn a self-supervised, single-view 3D reconstruction model that predicts the shape, texture and camera pose of a target object.
The proposed method does not necessitate 3D supervision, manually annotated keypoints, multi-view images of an object or a prior 3D template.
arXiv Detail & Related papers (2020-03-13T20:29:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.