Sparse Multi-Object Render-and-Compare
- URL: http://arxiv.org/abs/2310.11184v1
- Date: Tue, 17 Oct 2023 12:01:32 GMT
- Title: Sparse Multi-Object Render-and-Compare
- Authors: Florian Langer, Ignas Budvytis, Roberto Cipolla
- Abstract summary: Reconstructing 3D shape and pose of static objects from a single image is an essential task for various industries.
Directly predicting 3D shapes produces unrealistic, overly smoothed or tessellated shapes.
Retrieving CAD models ensures realistic shapes but requires robust and accurate alignment.
- Score: 33.97243145891282
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reconstructing 3D shape and pose of static objects from a single image is an
essential task for various industries, including robotics, augmented reality,
and digital content creation. This can be done by directly predicting 3D shape
in various representations or by retrieving CAD models from a database and
predicting their alignments. Directly predicting 3D shapes often produces
unrealistic, overly smoothed or tessellated shapes. Retrieving CAD models
ensures realistic shapes but requires robust and accurate alignment. Learning
to directly predict CAD model poses from image features is challenging and
inaccurate. Works such as ROCA compute poses from predicted normalised object
coordinates, which can be more accurate but are susceptible to systematic
failure. SPARC demonstrates that following a "render-and-compare" approach,
where a network iteratively improves upon its own predictions, achieves accurate
alignments. Nevertheless, it performs individual CAD alignment for every object
detected in an image. This approach is slow when applied to many objects, as the
time complexity increases linearly with the number of objects, and it cannot
learn inter-object relations. By introducing a new network architecture,
Multi-SPARC, we learn to perform CAD model alignments for multiple detected
objects jointly. Compared to other single-view methods, we achieve
state-of-the-art performance on the challenging real-world dataset ScanNet. By
improving the instance alignment accuracy from 31.8% to 40.3%, we perform
similarly to state-of-the-art multi-view methods.
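The abstract describes render-and-compare only at a high level. The toy below is a rough illustration of the idea. It refines one object's pose by repeating three steps: project ("render") sparse 3D model points with the current estimate, compare them with observed 2D points, and update the estimate. It is a sketch under loud assumptions:

- Rotation is known, and only the translation is refined.
- The intrinsics `F`, `CX`, `CY` and all point data are hypothetical.
- 2D correspondences are given.
- A hand-written Gauss-Newton step stands in for the network-predicted update used by SPARC.

It also handles a single object, so it reflects neither Multi-SPARC's joint processing nor its inter-object reasoning.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Pix = Tuple[float, float]

# Hypothetical pinhole intrinsics (focal length and principal point, in pixels).
F, CX, CY = 500.0, 320.0, 240.0


def render(points: Sequence[Vec3], t: Sequence[float]) -> List[Pix]:
    """'Render' sparse model points: translate by t, then project with a pinhole camera."""
    pixels = []
    for X, Y, Z in points:
        x, y, z = X + t[0], Y + t[1], Z + t[2]
        pixels.append((F * x / z + CX, F * y / z + CY))
    return pixels


def solve3(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            factor = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= factor * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x


def refine(points: Sequence[Vec3], observed: Sequence[Pix],
           t0: Sequence[float], iters: int = 3) -> Vec3:
    """Render, compare against observations, update; repeat for a fixed number of iterations."""
    t = list(t0)
    for _ in range(iters):
        rendered = render(points, t)  # render with the current estimate
        JtJ = [[0.0] * 3 for _ in range(3)]
        Jtr = [0.0] * 3
        for (X, Y, Z), (ur, vr), (uo, vo) in zip(points, rendered, observed):
            x, y, z = X + t[0], Y + t[1], Z + t[2]
            # Compare: pixel residuals and their derivatives w.r.t. (tx, ty, tz).
            for J, r in (([F / z, 0.0, -F * x / z ** 2], ur - uo),
                         ([0.0, F / z, -F * y / z ** 2], vr - vo)):
                for i in range(3):
                    Jtr[i] += J[i] * r
                    for j in range(3):
                        JtJ[i][j] += J[i] * J[j]
        # Update: Gauss-Newton step (a stand-in for a network-predicted update).
        delta = solve3(JtJ, [-g for g in Jtr])
        t = [ti + di for ti, di in zip(t, delta)]
    return (t[0], t[1], t[2])


model = [(-0.5, -0.5, 0.0), (0.5, -0.5, 0.2), (0.5, 0.5, -0.3),
         (-0.5, 0.5, 0.1), (0.0, 0.0, 0.4)]
true_t = (0.1, -0.2, 3.0)
observed = render(model, true_t)  # synthetic "image" observations
estimate = refine(model, observed, (0.0, 0.0, 2.5), iters=10)
print([round(c, 6) for c in estimate])
```

Each iteration costs one render and one solve for the object being refined. Aligning N objects one at a time therefore repeats the whole loop N times, which is the linear scaling the abstract attributes to per-object alignment.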
Related papers
- ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance [76.7746870349809]
We present ComboVerse, a 3D generation framework that produces high-quality 3D assets with complex compositions by learning to combine multiple models.
Our proposed framework emphasizes spatial alignment of objects, compared with standard score distillation sampling.
Date: 2024-03-19T03:39:43Z
- DiffCAD: Weakly-Supervised Probabilistic CAD Model Retrieval and Alignment from an RGB Image [34.47379913018661]
We propose DiffCAD, the first weakly-supervised probabilistic approach to CAD retrieval and alignment from an RGB image.
We formulate this as a conditional generative task, leveraging diffusion to learn implicit probabilistic models capturing the shape, pose, and scale of CAD objects in an image.
Our approach is trained only on synthetic data, leveraging monocular depth and mask estimates to enable robust zero-shot adaptation to various real target domains.
Date: 2023-11-30T15:10:21Z
- MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
We present a 6D pose refiner based on a render&compare strategy which can be applied to novel objects.
We also introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
Date: 2022-12-13T19:30:03Z
- SPARC: Sparse Render-and-Compare for CAD model alignment in a single RGB image [21.77811443143683]
Estimating 3D shapes and poses of static objects from a single image has important applications for robotics, augmented reality and digital content creation.
We demonstrate that a sparse, iterative, render-and-compare approach is more accurate and robust than relying on normalised object coordinates.
Our alignment procedure converges after just 3 iterations, improving the state-of-the-art performance on the challenging real-world dataset ScanNet.
Date: 2022-10-03T16:02:10Z
- Pixel2Mesh++: 3D Mesh Generation and Refinement from Multi-View Images [82.32776379815712]
We study the problem of shape generation in 3D mesh representation from a small number of color images with or without camera poses.
We further improve the shape quality by leveraging cross-view information with a graph convolution network.
Our model is robust to the quality of the initial mesh and the error of camera pose, and can be combined with a differentiable function for test-time optimization.
Date: 2022-04-21T03:42:31Z
- Leveraging Geometry for Shape Estimation from a Single RGB Image [25.003116148843525]
We show how keypoint matches from an RGB image to a rendered CAD model allow for more precise object pose predictions.
We also show that keypoint matches cannot only be used to estimate the pose of an object, but also to modify the shape of the object itself.
The proposed geometric shape prediction improves the AP mesh over the state-of-the-art from 33.2 to 37.8 on seen objects and from 8.2 to 17.1 on unseen objects.
Date: 2021-11-10T10:17:56Z
- Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval from a Single Image [58.953160501596805]
We propose a novel approach towards constructing a joint embedding space between 2D images and 3D CAD models in a patch-wise fashion.
Our approach is more robust than the state of the art in real-world scenarios without any exact CAD matches.
Date: 2021-08-20T20:58:52Z
- RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection [138.2892824662943]
A promising solution is to make better use of the synthetic dataset, which consists of CAD object models, to boost the learning on real datasets.
Recent work on 3D pre-training fails when transferring features learned on synthetic objects to other real-world applications.
In this work, we put forward a new method called RandomRooms to accomplish this objective.
Date: 2021-08-17T17:56:12Z
- Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve [54.054575408582565]
We propose to leverage existing large-scale datasets of 3D models to understand the underlying 3D structure of objects seen in an image.
We present Mask2CAD, which jointly detects objects in real-world images and, for each detected object, optimizes for the most similar CAD model and its pose.
This produces a clean, lightweight representation of the objects in an image.
Date: 2020-07-26T00:08:37Z
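Mask2CAD and Patch2CAD both retrieve CAD models by comparing image features with CAD model features in a learned embedding space. The sketch below shows only the lookup step, under these assumptions:

- The embeddings have already been computed.
- The vectors and model names are hypothetical.
- Cosine similarity is the distance (an assumption, not taken from either paper).

Patch2CAD compares patch-level embeddings rather than a single vector per object, so this is a simplification. Under these assumptions, retrieval reduces to a nearest-neighbour search:

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query: Sequence[float], cad_db: Dict[str, List[float]], k: int = 1) -> List[str]:
    """Return the names of the k CAD embeddings most similar to the query embedding."""
    ranked = sorted(cad_db, key=lambda name: cosine(query, cad_db[name]), reverse=True)
    return ranked[:k]


# Hypothetical 3-D embeddings; learned embeddings are much higher-dimensional.
cad_db = {
    "chair_01": [1.0, 0.1, 0.0],
    "table_07": [0.0, 1.0, 0.2],
    "sofa_03": [0.1, 0.0, 1.0],
}
query = [0.9, 0.2, 0.1]  # hypothetical embedding of a detected object crop
print(retrieve(query, cad_db, k=2))  # ['chair_01', 'table_07']
```

Retrieval returns a realistic CAD shape, but the retrieved model still needs a pose. That alignment step is the one the render-and-compare methods above focus on.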