SPARC: Sparse Render-and-Compare for CAD model alignment in a single RGB image
- URL: http://arxiv.org/abs/2210.01044v1
- Date: Mon, 3 Oct 2022 16:02:10 GMT
- Title: SPARC: Sparse Render-and-Compare for CAD model alignment in a single RGB image
- Authors: Florian Langer, Gwangbin Bae, Ignas Budvytis, Roberto Cipolla
- Abstract summary: Estimating 3D shapes and poses of static objects from a single image has important applications for robotics, augmented reality and digital content creation.
We demonstrate that a sparse, iterative, render-and-compare approach is more accurate and robust than relying on normalised object coordinates.
Our alignment procedure converges after just 3 iterations, improving the state-of-the-art performance on the challenging real-world dataset ScanNet.
- Score: 21.77811443143683
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimating 3D shapes and poses of static objects from a single image has
important applications for robotics, augmented reality and digital content
creation. Often this is done either through direct mesh prediction, which produces
unrealistic, overly tessellated shapes, or by formulating shape prediction as a
retrieval task followed by CAD model alignment. Directly predicting CAD model
poses from 2D image features is difficult and inaccurate. Some works, such as
ROCA, instead regress normalised object coordinates and use those for computing
poses. While this can produce more accurate pose estimates, predicting normalised
object coordinates is susceptible to systematic failure. Leveraging efficient
transformer architectures, we demonstrate that a sparse, iterative,
render-and-compare approach is more accurate and robust than relying on
normalised object coordinates. To this end, we combine 2D image information,
including sparse depth and surface normal values estimated directly from the
image, with 3D CAD model information in early fusion. In particular, we
reproject points sampled from the CAD model in an initial, random pose and
compute their depth and surface normal values. This combined information is the
input to a pose prediction network, SPARC-Net, which we train to predict a 9 DoF
CAD model pose update. The CAD model is then reprojected and the next pose
update is predicted. Our alignment procedure converges after just 3 iterations,
improving the state-of-the-art performance on the challenging real-world
dataset ScanNet from 25.0% to 31.8% instance alignment accuracy. Code will be
released at https://github.com/florianlanger/SPARC .
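
To make the loop described in the abstract concrete, below is a minimal PyTorch sketch of
a SPARC-style iteration. Everything in it is an illustrative assumption rather than the
released implementation: `PoseUpdateNet` is a toy stand-in for SPARC-Net, the token layout,
the axis-angle rotation update, and the multiplicative scale update are guesses at one
reasonable 9 DoF parameterisation, and the 2D tokens (which the real method samples from
predicted depth and normal maps) are replaced by random tensors.

```python
import torch
import torch.nn as nn

def project(points_cam, K):
    """Pinhole projection of camera-space 3D points to pixel coordinates."""
    uv = points_cam @ K.T                          # (n, 3)
    return uv[:, :2] / uv[:, 2:3].clamp(min=1e-6)  # (n, 2)

def so3_exp(w):
    """Rotation matrix from an axis-angle vector via the matrix exponential."""
    W = torch.zeros(3, 3)
    W[0, 1], W[0, 2] = -w[2], w[1]
    W[1, 0], W[1, 2] = w[2], -w[0]
    W[2, 0], W[2, 1] = -w[1], w[0]
    return torch.linalg.matrix_exp(W)

class PoseUpdateNet(nn.Module):
    """Toy stand-in for SPARC-Net: fuses per-point tokens into a 9 DoF update."""
    def __init__(self, d_tok=6, d_model=64):
        super().__init__()
        self.embed = nn.Linear(d_tok, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 9)          # 3 rotation, 3 translation, 3 scale

    def forward(self, tokens):                     # tokens: (1, n, d_tok)
        h = self.encoder(self.embed(tokens))
        return self.head(h.mean(dim=1))[0]         # (9,)

def refine_pose(tokens_2d, cad_pts, cad_nrm, K, net, n_iters=3):
    """Iterative render-and-compare: reproject CAD samples, predict a pose update."""
    R, t, s = torch.eye(3), torch.tensor([0., 0., 2.]), torch.ones(3)  # initial pose
    for _ in range(n_iters):                       # 3 iterations suffice per the paper
        pts_cam = (cad_pts * s) @ R.T + t          # CAD samples under the current pose
        nrm_cam = cad_nrm @ R.T
        uv = project(pts_cam, K)
        # Early fusion: one token per CAD sample (pixel location, depth, normal),
        # concatenated with sparse depth/normal tokens taken from the image.
        tokens_3d = torch.cat([uv, pts_cam[:, 2:3], nrm_cam], dim=1)   # (n, 6)
        tokens = torch.cat([tokens_3d, tokens_2d], dim=0).unsqueeze(0)
        delta = net(tokens)
        R = so3_exp(delta[:3]) @ R                 # compose rotation update
        t = t + delta[3:6]                         # additive translation update
        s = s * torch.exp(delta[6:9])              # multiplicative scale update
    return R, t, s

# Toy usage with random stand-ins for the CAD samples and image-derived tokens.
torch.manual_seed(0)
K = torch.tensor([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
cad_pts = torch.randn(500, 3) * 0.1
cad_nrm = nn.functional.normalize(torch.randn(500, 3), dim=1)
tokens_2d = torch.randn(200, 6)                    # sparse image depth/normal cues
with torch.no_grad():
    R, t, s = refine_pose(tokens_2d, cad_pts, cad_nrm, K, PoseUpdateNet().eval())
```

The structural points that survive this simplification are the ones the abstract emphasises:
the network sees a sparse set of fused 2D and 3D tokens rather than a dense rendering, each
forward pass yields a 9 DoF update that is composed with the current pose, and the
reproject-predict cycle is run for only a few iterations.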
Related papers
- CameraHMR: Aligning People with Perspective [54.05758012879385]
We address the challenge of accurate 3D human pose and shape estimation from monocular images.
Existing training datasets containing real images with pseudo ground truth (pGT) use SMPLify to fit SMPL to sparse 2D joint locations.
We make two contributions that improve pGT accuracy.
arXiv Detail & Related papers (2024-11-12T19:12:12Z)
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z)
- Personalized 3D Human Pose and Shape Refinement [19.082329060985455]
Regression-based methods have dominated the field of 3D human pose and shape estimation.
We propose to construct dense correspondences between initial human model estimates and the corresponding images.
We show that our approach not only consistently leads to better image-model alignment, but also to improved 3D accuracy.
arXiv Detail & Related papers (2024-03-18T10:13:53Z)
- DiffCAD: Weakly-Supervised Probabilistic CAD Model Retrieval and Alignment from an RGB Image [34.47379913018661]
We propose DiffCAD, the first weakly-supervised probabilistic approach to CAD retrieval and alignment from an RGB image.
We formulate this as a conditional generative task, leveraging diffusion to learn implicit probabilistic models capturing the shape, pose, and scale of CAD objects in an image.
Our approach is trained only on synthetic data, leveraging monocular depth and mask estimates to enable robust zero-shot adaptation to various real target domains.
arXiv Detail & Related papers (2023-11-30T15:10:21Z)
- GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence [64.77224422330737]
GigaPose is a fast, robust, and accurate method for CAD-based novel object pose estimation in RGB images.
Our approach samples templates in only a two-degrees-of-freedom space instead of the usual three.
It achieves state-of-the-art accuracy and can be seamlessly integrated with existing refinement methods.
arXiv Detail & Related papers (2023-11-23T18:55:03Z)
- Sparse Multi-Object Render-and-Compare [33.97243145891282]
Reconstructing 3D shape and pose of static objects from a single image is an essential task for various industries.
Directly predicting 3D shapes produces unrealistic, overly smoothed or tessellated shapes.
Retrieving CAD models ensures realistic shapes but requires robust and accurate alignment.
arXiv Detail & Related papers (2023-10-17T12:01:32Z)
- OnePose++: Keypoint-Free One-Shot Object Pose Estimation without CAD Models [51.68715543630427]
OnePose relies on detecting repeatable image keypoints and is thus prone to failure on low-textured objects.
We propose a keypoint-free pose estimation pipeline to remove the need for repeatable keypoint detection.
A 2D-3D matching network directly establishes 2D-3D correspondences between the query image and the reconstructed point-cloud model.
arXiv Detail & Related papers (2023-01-18T17:47:13Z)
- Pixel2Mesh++: 3D Mesh Generation and Refinement from Multi-View Images [82.32776379815712]
We study the problem of shape generation in 3D mesh representation from a small number of color images with or without camera poses.
We further improve the shape quality by leveraging cross-view information with a graph convolutional network.
Our model is robust to the quality of the initial mesh and the error of camera pose, and can be combined with a differentiable function for test-time optimization.
arXiv Detail & Related papers (2022-04-21T03:42:31Z)
- Leveraging Geometry for Shape Estimation from a Single RGB Image [25.003116148843525]
We show how keypoint matches from an RGB image to a rendered CAD model allow for more precise object pose predictions.
We also show that keypoint matches can not only be used to estimate the pose of an object, but also to modify the shape of the object itself.
The proposed geometric shape prediction improves the AP mesh over the state-of-the-art from 33.2 to 37.8 on seen objects and from 8.2 to 17.1 on unseen objects.
arXiv Detail & Related papers (2021-11-10T10:17:56Z)
- Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval from a Single Image [58.953160501596805]
We propose a novel approach towards constructing a joint embedding space between 2D images and 3D CAD models in a patch-wise fashion.
Our approach is more robust than the state of the art in real-world scenarios without any exact CAD matches.
arXiv Detail & Related papers (2021-08-20T20:58:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.