Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve
- URL: http://arxiv.org/abs/2007.13034v1
- Date: Sun, 26 Jul 2020 00:08:37 GMT
- Title: Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve
- Authors: Weicheng Kuo, Anelia Angelova, Tsung-Yi Lin, Angela Dai
- Abstract summary: We propose to leverage existing large-scale datasets of 3D models to understand the underlying 3D structure of objects seen in an image.
We present Mask2CAD, which jointly detects objects in real-world images and, for each detected object, optimizes for the most similar CAD model and its pose.
This produces a clean, lightweight representation of the objects in an image.
- Score: 54.054575408582565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object recognition has seen significant progress in the image domain, with
focus primarily on 2D perception. We propose to leverage existing large-scale
datasets of 3D models to understand the underlying 3D structure of objects seen
in an image by constructing a CAD-based representation of the objects and their
poses. We present Mask2CAD, which jointly detects objects in real-world images
and for each detected object, optimizes for the most similar CAD model and its
pose. We construct a joint embedding space between the detected regions of an
image corresponding to an object and 3D CAD models, enabling retrieval of CAD
models for an input RGB image. This produces a clean, lightweight
representation of the objects in an image; this CAD-based representation
ensures a valid, efficient shape representation for applications such as
content creation or interactive scenarios, and makes a step towards
understanding the transformation of real-world imagery to a synthetic domain.
Experiments on real-world images from Pix3D demonstrate the advantage of our
approach in comparison to state of the art. To facilitate future research, we
additionally propose a new image-to-3D baseline on ScanNet which features
larger shape diversity, real-world occlusions, and challenging image views.
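The retrieval step described in the abstract (looking up the most similar CAD model for each detected image region in a joint embedding space) can be illustrated with a minimal sketch. This is not the authors' implementation: the embeddings below are hand-written toy vectors rather than network outputs, cosine similarity is an assumed metric that the abstract does not specify, and all names (`cosine_similarity`, `retrieve_cad`, `chair_01`, `region_a`, and so on) are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_cad(region_embedding, cad_embeddings):
    """Return (model_id, similarity) for the CAD embedding closest to the region.

    Assumption: both kinds of embedding already live in the same space.
    """
    if not cad_embeddings:
        raise ValueError("cad_embeddings must not be empty")
    scored = (
        (model_id, cosine_similarity(region_embedding, emb))
        for model_id, emb in cad_embeddings.items()
    )
    return max(scored, key=lambda pair: pair[1])


# Hypothetical toy data: three CAD models and two detected image regions.
cad_embeddings = {
    "chair_01": [0.9, 0.1, 0.0],
    "table_07": [0.0, 1.0, 0.2],
    "sofa_03": [0.1, 0.2, 0.95],
}
detected_regions = {
    "region_a": [0.8, 0.2, 0.1],
    "region_b": [0.0, 0.3, 0.9],
}

# One nearest-neighbor lookup per detected region.
matches = {
    region_id: retrieve_cad(emb, cad_embeddings)
    for region_id, emb in detected_regions.items()
}

for region_id, (model_id, score) in matches.items():
    print(f"{region_id} -> {model_id} (similarity {score:.3f})")
```

With these toy vectors, `region_a` is matched to `chair_01` and `region_b` to `sofa_03`. In a real system the region and CAD embeddings would come from learned encoders, and pose would be predicted separately; neither is modeled here.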
Related papers
- ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models [65.22994156658918]
We present a method that learns to generate multi-view images in a single denoising process from real-world data.
We design an autoregressive generation that renders more 3D-consistent images at any viewpoint.
arXiv Detail & Related papers (2024-03-04T07:57:05Z)
- Sparse Multi-Object Render-and-Compare [33.97243145891282]
Reconstructing 3D shape and pose of static objects from a single image is an essential task for various industries.
Directly predicting 3D shapes produces unrealistic, overly smoothed or tessellated shapes.
Retrieving CAD models ensures realistic shapes but requires robust and accurate alignment.
arXiv Detail & Related papers (2023-10-17T12:01:32Z)
- Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval from a Single Image [58.953160501596805]
We propose a novel approach towards constructing a joint embedding space between 2D images and 3D CAD models in a patch-wise fashion.
Our approach is more robust than state of the art in real-world scenarios without any exact CAD matches.
arXiv Detail & Related papers (2021-08-20T20:58:52Z)
- Learning Canonical 3D Object Representation for Fine-Grained Recognition [77.33501114409036]
We propose a novel framework for fine-grained object recognition that learns to recover object variation in 3D space from a single image.
We represent an object as a composition of 3D shape and its appearance, while eliminating the effect of camera viewpoint.
By incorporating 3D shape and appearance jointly in a deep representation, our method learns the discriminative representation of the object.
arXiv Detail & Related papers (2021-08-10T12:19:34Z)
- Geometric Processing for Image-based 3D Object Modeling [2.6397379133308214]
This article introduces state-of-the-art methods for three major components of geometric processing: 1) geo-referencing; 2) dense image matching; 3) texture mapping.
The largely automated geometric processing of images in a 3D object reconstruction workflow is becoming a critical part of reality-based 3D modeling.
arXiv Detail & Related papers (2021-06-27T18:33:30Z)
- Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction [33.95791350070165]
Inferring 3D structure of a generic object from a 2D image is a long-standing objective of computer vision.
We take an alternative approach with semi-supervised learning. That is, for a 2D image of a generic object, we decompose it into latent representations of category, shape and albedo.
We show that the complete shape and albedo modeling enables us to leverage real 2D images in both modeling and model fitting.
arXiv Detail & Related papers (2021-04-02T02:39:29Z)
- GRF: Learning a General Radiance Field for 3D Representation and Rendering [4.709764624933227]
We present a simple yet powerful neural network that implicitly represents and renders 3D objects and scenes only from 2D observations.
The network models 3D geometries as a general radiance field, which takes a set of 2D images with camera poses and intrinsics as input.
Our method can generate high-quality and realistic novel views for novel objects, unseen categories and challenging real-world scenes.
arXiv Detail & Related papers (2020-10-09T14:21:43Z)
- Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors [69.02332607843569]
PriSMONet is a novel approach for learning Multi-Object 3D scene decomposition and representations from single images.
A recurrent encoder regresses a latent representation of 3D shape, pose and texture of each object from an input RGB image.
We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
arXiv Detail & Related papers (2020-10-08T14:49:23Z)
- Self-Supervised 2D Image to 3D Shape Translation with Disentangled Representations [92.89846887298852]
We present a framework to translate between 2D image views and 3D object shapes.
We propose SIST, a Self-supervised Image to Shape Translation framework.
arXiv Detail & Related papers (2020-03-22T22:44:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.