SPARC: Sparse Render-and-Compare for CAD model alignment in a single RGB image
- URL: http://arxiv.org/abs/2210.01044v1
- Date: Mon, 3 Oct 2022 16:02:10 GMT
- Title: SPARC: Sparse Render-and-Compare for CAD model alignment in a single RGB image
- Authors: Florian Langer, Gwangbin Bae, Ignas Budvytis, Roberto Cipolla
- Abstract summary: Estimating 3D shapes and poses of static objects from a single image has important applications for robotics, augmented reality and digital content creation.
We demonstrate that a sparse, iterative, render-and-compare approach is more accurate and robust than relying on normalised object coordinates.
Our alignment procedure converges after just 3 iterations, improving the state-of-the-art performance on the challenging real-world dataset ScanNet.
- Score: 21.77811443143683
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimating 3D shapes and poses of static objects from a single image has
important applications for robotics, augmented reality and digital content
creation. Often this is done either through direct mesh prediction, which produces
unrealistic, overly tessellated shapes, or by formulating shape prediction as a
retrieval task followed by CAD model alignment. Directly predicting CAD model
poses from 2D image features is difficult and inaccurate. Some works, such as
ROCA, instead regress normalised object coordinates and use those for computing
poses. While this can produce more accurate pose estimates, predicting normalised
object coordinates is susceptible to systematic failure. Leveraging efficient
transformer architectures, we demonstrate that a sparse, iterative,
render-and-compare approach is more accurate and robust than relying on
normalised object coordinates. To this end, we combine 2D image information,
including sparse depth and surface normal values estimated directly from the
image, with 3D CAD model information in early fusion. In particular, we
reproject points sampled from the CAD model in an initial, random pose and
compute their depth and surface normal values. This combined information is the
input to a pose prediction network, SPARC-Net, which we train to predict a 9 DoF
CAD model pose update. The CAD model is then reprojected and the next pose
update is predicted. Our alignment procedure converges after just 3 iterations,
improving the state-of-the-art performance on the challenging real-world
dataset ScanNet from 25.0% to 31.8% instance alignment accuracy. Code will be
released at https://github.com/florianlanger/SPARC .
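
To make the loop described in the abstract concrete, below is a minimal PyTorch sketch of
a SPARC-style iteration. Everything in it is an illustrative assumption rather than the
released implementation: `PoseUpdateNet` is a toy stand-in for SPARC-Net, the token layout,
the axis-angle rotation update, and the multiplicative scale update are guesses at one
reasonable 9 DoF parameterisation, and the 2D tokens (which the real method samples from
predicted depth and normal maps) are replaced by random tensors.

```python
import torch
import torch.nn as nn

def project(points_cam, K):
    """Pinhole projection of camera-space 3D points to pixel coordinates."""
    uv = points_cam @ K.T                          # (n, 3)
    return uv[:, :2] / uv[:, 2:3].clamp(min=1e-6)  # (n, 2)

def so3_exp(w):
    """Rotation matrix from an axis-angle vector via the matrix exponential."""
    W = torch.zeros(3, 3)
    W[0, 1], W[0, 2] = -w[2], w[1]
    W[1, 0], W[1, 2] = w[2], -w[0]
    W[2, 0], W[2, 1] = -w[1], w[0]
    return torch.linalg.matrix_exp(W)

class PoseUpdateNet(nn.Module):
    """Toy stand-in for SPARC-Net: fuses per-point tokens into a 9 DoF update."""
    def __init__(self, d_tok=6, d_model=64):
        super().__init__()
        self.embed = nn.Linear(d_tok, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 9)          # 3 rotation, 3 translation, 3 scale

    def forward(self, tokens):                     # tokens: (1, n, d_tok)
        h = self.encoder(self.embed(tokens))
        return self.head(h.mean(dim=1))[0]         # (9,)

def refine_pose(tokens_2d, cad_pts, cad_nrm, K, net, n_iters=3):
    """Iterative render-and-compare: reproject CAD samples, predict a pose update."""
    R, t, s = torch.eye(3), torch.tensor([0., 0., 2.]), torch.ones(3)  # initial pose
    for _ in range(n_iters):                       # 3 iterations suffice per the paper
        pts_cam = (cad_pts * s) @ R.T + t          # CAD samples under the current pose
        nrm_cam = cad_nrm @ R.T
        uv = project(pts_cam, K)
        # Early fusion: one token per CAD sample (pixel location, depth, normal),
        # concatenated with sparse depth/normal tokens taken from the image.
        tokens_3d = torch.cat([uv, pts_cam[:, 2:3], nrm_cam], dim=1)   # (n, 6)
        tokens = torch.cat([tokens_3d, tokens_2d], dim=0).unsqueeze(0)
        delta = net(tokens)
        R = so3_exp(delta[:3]) @ R                 # compose rotation update
        t = t + delta[3:6]                         # additive translation update
        s = s * torch.exp(delta[6:9])              # multiplicative scale update
    return R, t, s

# Toy usage with random stand-ins for the CAD samples and image-derived tokens.
torch.manual_seed(0)
K = torch.tensor([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
cad_pts = torch.randn(500, 3) * 0.1
cad_nrm = nn.functional.normalize(torch.randn(500, 3), dim=1)
tokens_2d = torch.randn(200, 6)                    # sparse image depth/normal cues
with torch.no_grad():
    R, t, s = refine_pose(tokens_2d, cad_pts, cad_nrm, K, PoseUpdateNet().eval())
```

The structural points that survive this simplification are the ones the abstract emphasises:
the network sees a sparse set of fused 2D and 3D tokens rather than a dense rendering, each
forward pass yields a 9 DoF update that is composed with the current pose, and the
reproject-predict cycle is run for only a few iterations.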
Related papers
- CameraHMR: Aligning People with Perspective [54.05758012879385]
We address the challenge of accurate 3D human pose and shape estimation from monocular images.
Existing training datasets containing real images with pseudo ground truth (pGT) use SMPLify to fit SMPL to sparse 2D joint locations.
We make two contributions that improve pGT accuracy.
arXiv Detail & Related papers (2024-11-12T19:12:12Z)
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z)
- Personalized 3D Human Pose and Shape Refinement [19.082329060985455]
Regression-based methods have dominated the field of 3D human pose and shape estimation.
We propose to construct dense correspondences between initial human model estimates and the corresponding images.
We show that our approach not only consistently leads to better image-model alignment, but also to improved 3D accuracy.
arXiv Detail & Related papers (2024-03-18T10:13:53Z)
- DiffCAD: Weakly-Supervised Probabilistic CAD Model Retrieval and Alignment from an RGB Image [34.47379913018661]
We propose DiffCAD, the first weakly-supervised probabilistic approach to CAD retrieval and alignment from an RGB image.
We formulate this as a conditional generative task, leveraging diffusion to learn implicit probabilistic models capturing the shape, pose, and scale of CAD objects in an image.
Our approach is trained only on synthetic data, leveraging monocular depth and mask estimates to enable robust zero-shot adaptation to various real target domains.
arXiv Detail & Related papers (2023-11-30T15:10:21Z)
- GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence [64.77224422330737]
GigaPose is a fast, robust, and accurate method for CAD-based novel object pose estimation in RGB images.
Our approach samples templates in only a two-degrees-of-freedom space instead of the usual three.
It achieves state-of-the-art accuracy and can be seamlessly integrated with existing refinement methods.
arXiv Detail & Related papers (2023-11-23T18:55:03Z)
- Sparse Multi-Object Render-and-Compare [33.97243145891282]
Reconstructing 3D shape and pose of static objects from a single image is an essential task for various industries.
Directly predicting 3D shapes produces unrealistic, overly smoothed or tessellated shapes.
Retrieving CAD models ensures realistic shapes but requires robust and accurate alignment.
arXiv Detail & Related papers (2023-10-17T12:01:32Z)
- OnePose++: Keypoint-Free One-Shot Object Pose Estimation without CAD Models [51.68715543630427]
OnePose relies on detecting repeatable image keypoints and is thus prone to failure on low-textured objects.
We propose a keypoint-free pose estimation pipeline to remove the need for repeatable keypoint detection.
A 2D-3D matching network directly establishes 2D-3D correspondences between the query image and the reconstructed point-cloud model.
arXiv Detail & Related papers (2023-01-18T17:47:13Z)
- Pixel2Mesh++: 3D Mesh Generation and Refinement from Multi-View Images [82.32776379815712]
We study the problem of shape generation in 3D mesh representation from a small number of color images with or without camera poses.
We further improve the shape quality by leveraging cross-view information with a graph convolutional network.
Our model is robust to the quality of the initial mesh and the error of camera pose, and can be combined with a differentiable function for test-time optimization.
arXiv Detail & Related papers (2022-04-21T03:42:31Z)
- Leveraging Geometry for Shape Estimation from a Single RGB Image [25.003116148843525]
We show how keypoint matches from an RGB image to a rendered CAD model allow for more precise object pose predictions.
We also show that keypoint matches can not only be used to estimate the pose of an object, but also to modify the shape of the object itself.
The proposed geometric shape prediction improves the AP mesh over the state-of-the-art from 33.2 to 37.8 on seen objects and from 8.2 to 17.1 on unseen objects.
arXiv Detail & Related papers (2021-11-10T10:17:56Z)
- Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval from a Single Image [58.953160501596805]
We propose a novel approach towards constructing a joint embedding space between 2D images and 3D CAD models in a patch-wise fashion.
Our approach is more robust than the state of the art in real-world scenarios without any exact CAD matches.
arXiv Detail & Related papers (2021-08-20T20:58:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.