From Points to Multi-Object 3D Reconstruction
- URL: http://arxiv.org/abs/2012.11575v2
- Date: Wed, 7 Apr 2021 21:05:22 GMT
- Title: From Points to Multi-Object 3D Reconstruction
- Authors: Francis Engelmann, Konstantinos Rematas, Bastian Leibe, Vittorio
Ferrari
- Abstract summary: We propose a method to detect and reconstruct multiple 3D objects from a single RGB image.
A keypoint detector localizes objects as center points and directly predicts all object properties, including 9-DoF bounding boxes and 3D shapes.
The presented approach performs lightweight reconstruction in a single-stage, it is real-time capable, fully differentiable and end-to-end trainable.
- Score: 71.17445805257196
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a method to detect and reconstruct multiple 3D objects from a
single RGB image. The key idea is to optimize for detection, alignment and
shape jointly over all objects in the RGB image, while focusing on realistic
and physically plausible reconstructions. To this end, we propose a keypoint
detector that localizes objects as center points and directly predicts all
object properties, including 9-DoF bounding boxes and 3D shapes -- all in a
single forward pass. The proposed method formulates 3D shape reconstruction as
a shape selection problem, i.e. it selects among exemplar shapes from a given
database. This makes it agnostic to shape representations, which enables a
lightweight reconstruction of realistic and visually-pleasing shapes based on
CAD-models, while the training objective is formulated around point clouds and
voxel representations. A collision-loss promotes non-intersecting objects,
further increasing the reconstruction realism. Given the RGB image, the
presented approach performs lightweight reconstruction in a single-stage, it is
real-time capable, fully differentiable and end-to-end trainable. Our
experiments compare multiple approaches for 9-DoF bounding box estimation,
evaluate the novel shape-selection mechanism and compare to recent methods in
terms of 3D bounding box estimation and 3D shape reconstruction quality.
Related papers
- 3D Surface Reconstruction in the Wild by Deforming Shape Priors from
Synthetic Data [24.97027425606138]
Reconstructing the underlying 3D surface of an object from a single image is a challenging problem.
We present a new method for joint category-specific 3D reconstruction and object pose estimation from a single image.
Our approach achieves state-of-the-art reconstruction performance across several real-world datasets.
arXiv Detail & Related papers (2023-02-24T20:37:27Z) - ShAPO: Implicit Representations for Multi-Object Shape, Appearance, and
Pose Optimization [40.36229450208817]
We present ShAPO, a method for joint multi-object detection, 3D textured reconstruction, 6D object pose and size estimation.
Key to ShAPO is a single-shot pipeline to regress shape, appearance and pose latent codes along with the masks of each object instance.
Our method significantly out-performs all baselines on the NOCS dataset with an 8% absolute improvement in mAP for 6D pose estimation.
arXiv Detail & Related papers (2022-07-27T17:59:31Z) - Towards High-Fidelity Single-view Holistic Reconstruction of Indoor
Scenes [50.317223783035075]
We present a new framework to reconstruct holistic 3D indoor scenes from single-view images.
We propose an instance-aligned implicit function (InstPIFu) for detailed object reconstruction.
Our code and model will be made publicly available.
arXiv Detail & Related papers (2022-07-18T14:54:57Z) - Learning Canonical 3D Object Representation for Fine-Grained Recognition [77.33501114409036]
We propose a novel framework for fine-grained object recognition that learns to recover object variation in 3D space from a single image.
We represent an object as a composition of 3D shape and its appearance, while eliminating the effect of camera viewpoint.
By incorporating 3D shape and appearance jointly in a deep representation, our method learns the discriminative representation of the object.
arXiv Detail & Related papers (2021-08-10T12:19:34Z) - Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using
Deep Shape Priors [69.02332607843569]
PriSMONet is a novel approach for learning Multi-Object 3D scene decomposition and representations from single images.
A recurrent encoder regresses a latent representation of 3D shape, pose and texture of each object from an input RGB image.
We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
arXiv Detail & Related papers (2020-10-08T14:49:23Z) - Canonical 3D Deformer Maps: Unifying parametric and non-parametric
methods for dense weakly-supervised category reconstruction [79.98689027127855]
We propose a new representation of the 3D shape of common object categories that can be learned from a collection of 2D images of independent objects.
Our method builds in a novel way on concepts from parametric deformation models, non-parametric 3D reconstruction, and canonical embeddings.
It achieves state-of-the-art results in dense 3D reconstruction on public in-the-wild datasets of faces, cars, and birds.
arXiv Detail & Related papers (2020-08-28T15:44:05Z) - CoReNet: Coherent 3D scene reconstruction from a single RGB image [43.74240268086773]
We build on advances in deep learning to reconstruct the shape of a single object given only one RBG image as input.
We propose three extensions: (1) ray-traced skip connections that propagate local 2D information to the output 3D volume in a physically correct manner; (2) a hybrid 3D volume representation that enables building translation equivariant models; and (3) a reconstruction loss tailored to capture overall object geometry.
We reconstruct all objects jointly in one pass, producing a coherent reconstruction, where all objects live in a single consistent 3D coordinate frame relative to the camera and they do not intersect in 3D space.
arXiv Detail & Related papers (2020-04-27T17:53:07Z) - Implicit Functions in Feature Space for 3D Shape Reconstruction and
Completion [53.885984328273686]
Implicit Feature Networks (IF-Nets) deliver continuous outputs, can handle multiple topologies, and complete shapes for missing or sparse input data.
IF-Nets clearly outperform prior work in 3D object reconstruction in ShapeNet, and obtain significantly more accurate 3D human reconstructions.
arXiv Detail & Related papers (2020-03-03T11:14:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.