Leveraging Geometry for Shape Estimation from a Single RGB Image
- URL: http://arxiv.org/abs/2111.05615v1
- Date: Wed, 10 Nov 2021 10:17:56 GMT
- Title: Leveraging Geometry for Shape Estimation from a Single RGB Image
- Authors: Florian Langer, Ignas Budvytis, Roberto Cipolla
- Abstract summary: We show how keypoint matches from an RGB image to a rendered CAD model allow for more precise object pose predictions.
We also show that keypoint matches can not only be used to estimate the pose of an object, but also to modify the shape of the object itself.
The proposed geometric shape prediction improves the AP mesh over the state-of-the-art from 33.2 to 37.8 on seen objects and from 8.2 to 17.1 on unseen objects.
- Score: 25.003116148843525
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Predicting 3D shapes and poses of static objects from a single RGB image is
an important research area in modern computer vision. Its applications range
from augmented reality to robotics and digital content creation. Typically this
task is performed through direct object shape and pose predictions, which are
inaccurate. A promising research direction ensures meaningful shape predictions
by retrieving CAD models from large-scale databases and aligning them to the
objects observed in the image. However, existing work does not take the object
geometry into account, leading to inaccurate object pose predictions,
especially for unseen objects. In this work we demonstrate how cross-domain
keypoint matches from an RGB image to a rendered CAD model allow for more
precise object pose predictions compared to ones obtained through direct
predictions. We further show that keypoint matches can not only be used to
estimate the pose of an object, but also to modify the shape of the object
itself. This is important as the accuracy that can be achieved with object
retrieval alone is inherently limited to the available CAD models. Allowing
shape adaptation bridges the gap between the retrieved CAD model and the
observed shape. We demonstrate our approach on the challenging Pix3D dataset.
The proposed geometric shape prediction improves the AP mesh over the
state-of-the-art from 33.2 to 37.8 on seen objects and from 8.2 to 17.1 on
unseen objects. Furthermore, we demonstrate more accurate shape predictions
without closely matching CAD models when following the proposed shape
adaptation. Code is publicly available at
https://github.com/florianlanger/leveraging_geometry_for_shape_estimation .
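The core idea of the abstract — recovering a 6-DoF object pose from 2D keypoints in the RGB image matched to 3D points on a rendered CAD model — can be illustrated with a standard Direct Linear Transform (DLT) pose solver. This is a minimal textbook sketch, not the paper's actual solver (the authors likely use a more robust PnP/RANSAC formulation); the function name and synthetic data are illustrative only.

```python
import numpy as np

def pose_from_matches(pts3d, pts2d, K):
    """Recover object rotation R and translation t from n >= 6 noiseless
    2D-3D keypoint correspondences via the Direct Linear Transform.
    pts3d: (n, 3) CAD-model points; pts2d: (n, 2) pixel coordinates;
    K: 3x3 camera intrinsics."""
    n = len(pts3d)
    A = np.zeros((2 * n, 12))
    for i, ((X, Y, Z), (u, v)) in enumerate(zip(pts3d, pts2d)):
        # Each match contributes two linear constraints on the 3x4
        # projection matrix P (from u = p1.X / p3.X, v = p2.X / p3.X).
        A[2 * i]     = [X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u]
        A[2 * i + 1] = [0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v]
    # Least-squares solution: right singular vector of the smallest
    # singular value, reshaped into the projection matrix P = s * K [R | t].
    P = np.linalg.svd(A)[2][-1].reshape(3, 4)
    M = np.linalg.inv(K) @ P          # M = s * [R | t]
    s = np.cbrt(np.linalg.det(M[:, :3]))  # recover scale and sign (det R = 1)
    M /= s
    U, _, Vt = np.linalg.svd(M[:, :3])    # project onto the rotation group
    return U @ Vt, M[:, 3]
```

In practice the cross-domain matches are noisy and contaminated by outliers, so a robust estimator (e.g. RANSAC around a PnP solver) would replace this plain least-squares step.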
Related papers
- Sparse Multi-Object Render-and-Compare [33.97243145891282]
Reconstructing 3D shape and pose of static objects from a single image is an essential task for various industries.
Directly predicting 3D shapes produces unrealistic, overly smoothed or tessellated shapes.
Retrieving CAD models ensures realistic shapes but requires robust and accurate alignment.
arXiv Detail & Related papers (2023-10-17T12:01:32Z)
- ShapeShift: Superquadric-based Object Pose Estimation for Robotic Grasping [85.38689479346276]
Current techniques heavily rely on a reference 3D object, limiting their generalizability and making it expensive to expand to new object categories.
This paper proposes ShapeShift, a superquadric-based framework for object pose estimation that predicts the object's pose relative to a primitive shape which is fitted to the object.
arXiv Detail & Related papers (2023-04-10T20:55:41Z)
- OnePose++: Keypoint-Free One-Shot Object Pose Estimation without CAD Models [51.68715543630427]
OnePose relies on detecting repeatable image keypoints and is thus prone to failure on low-textured objects.
We propose a keypoint-free pose estimation pipeline to remove the need for repeatable keypoint detection.
A 2D-3D matching network directly establishes 2D-3D correspondences between the query image and the reconstructed point-cloud model.
arXiv Detail & Related papers (2023-01-18T17:47:13Z)
- Generative Category-Level Shape and Pose Estimation with Semantic Primitives [27.692997522812615]
We propose a novel framework for category-level object shape and pose estimation from a single RGB-D image.
To handle the intra-category variation, we adopt a semantic primitive representation that encodes diverse shapes into a unified latent space.
We show that the proposed method achieves SOTA pose estimation performance and better generalization on a real-world dataset.
arXiv Detail & Related papers (2022-10-03T17:51:54Z)
- SPARC: Sparse Render-and-Compare for CAD model alignment in a single RGB image [21.77811443143683]
Estimating 3D shapes and poses of static objects from a single image has important applications for robotics, augmented reality and digital content creation.
We demonstrate that a sparse, iterative, render-and-compare approach is more accurate and robust than relying on normalised object coordinates.
Our alignment procedure converges after just 3 iterations, improving the state-of-the-art performance on the challenging real-world dataset ScanNet.
arXiv Detail & Related papers (2022-10-03T16:02:10Z)
- Pixel2Mesh++: 3D Mesh Generation and Refinement from Multi-View Images [82.32776379815712]
We study the problem of shape generation in 3D mesh representation from a small number of color images with or without camera poses.
We further improve the shape quality by leveraging cross-view information with a graph convolution network.
Our model is robust to the quality of the initial mesh and the error of camera pose, and can be combined with a differentiable function for test-time optimization.
arXiv Detail & Related papers (2022-04-21T03:42:31Z)
- Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval from a Single Image [58.953160501596805]
We propose a novel approach towards constructing a joint embedding space between 2D images and 3D CAD models in a patch-wise fashion.
Our approach is more robust than the state of the art in real-world scenarios without any exact CAD matches.
arXiv Detail & Related papers (2021-08-20T20:58:52Z)
- From Points to Multi-Object 3D Reconstruction [71.17445805257196]
We propose a method to detect and reconstruct multiple 3D objects from a single RGB image.
A keypoint detector localizes objects as center points and directly predicts all object properties, including 9-DoF bounding boxes and 3D shapes.
The presented approach performs lightweight reconstruction in a single stage; it is real-time capable, fully differentiable and end-to-end trainable.
arXiv Detail & Related papers (2020-12-21T18:52:21Z)
- 3D Object Detection and Pose Estimation of Unseen Objects in Color Images with Local Surface Embeddings [35.769234123059086]
We present an approach for detecting and estimating the 3D poses of objects in images that requires only an untextured CAD model.
Our approach combines Deep Learning and 3D geometry: It relies on an embedding of local 3D geometry to match the CAD models to the input images.
We show that we can use Mask-RCNN in a class-agnostic way to detect the new objects without retraining and thus drastically limit the number of possible correspondences.
arXiv Detail & Related papers (2020-10-08T15:57:06Z)
- Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation [62.618227434286]
We present a novel learning approach to recover the 6D poses and sizes of unseen object instances from an RGB-D image.
We propose a deep network to reconstruct the 3D object model by explicitly modeling the deformation from a pre-learned categorical shape prior.
arXiv Detail & Related papers (2020-07-16T16:45:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.