Diff-DOPE: Differentiable Deep Object Pose Estimation
- URL: http://arxiv.org/abs/2310.00463v1
- Date: Sat, 30 Sep 2023 18:52:57 GMT
- Title: Diff-DOPE: Differentiable Deep Object Pose Estimation
- Authors: Jonathan Tremblay, Bowen Wen, Valts Blukis, Balakumar Sundaralingam,
Stephen Tyree, Stan Birchfield
- Abstract summary: We introduce Diff-DOPE, a 6-DoF pose refiner that takes as input an image, a 3D textured model of an object, and an initial pose of the object.
The method uses differentiable rendering to update the object pose to minimize the visual error between the image and the projection of the model.
We show that this simple, yet effective, idea is able to achieve state-of-the-art results on pose estimation datasets.
- Score: 29.703385848843414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Diff-DOPE, a 6-DoF pose refiner that takes as input an image, a
3D textured model of an object, and an initial pose of the object. The method
uses differentiable rendering to update the object pose to minimize the visual
error between the image and the projection of the model. We show that this
simple, yet effective, idea is able to achieve state-of-the-art results on pose
estimation datasets. Our approach is a departure from recent methods in which
the pose refiner is a deep neural network trained on a large synthetic dataset
to map inputs to refinement steps. Rather, our use of differentiable rendering
allows us to avoid training altogether. Our approach performs multiple gradient
descent optimizations in parallel with different random learning rates to avoid
local minima from symmetric objects, similar appearances, or wrong step size.
Various modalities can be used, e.g., RGB, depth, intensity edges, and object
segmentation masks. We present experiments examining the effect of various
choices, showing that the best results are found when the RGB image is
accompanied by an object mask and depth image to guide the optimization
process.
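The core loop described in the abstract is easy to picture in code. Below is a minimal PyTorch sketch of the idea, with one large simplification: a point-reprojection loss stands in for the paper's differentiable renderer, which actually rasterizes the textured 3D model and compares RGB, depth, and mask images. All function and variable names, and the learning-rate range, are our own illustrative choices, not the authors'.

```python
# Hedged sketch of Diff-DOPE-style refinement: many copies of the pose are
# optimized in parallel, each with a random learning rate, and the lowest-loss
# copy wins. A point-reprojection loss stands in for the paper's
# differentiable renderer (illustrative simplification).
import torch

def axis_angle_to_matrix(v: torch.Tensor) -> torch.Tensor:
    """Rodrigues' formula: (B, 3) axis-angle vectors -> (B, 3, 3) rotations."""
    theta = v.norm(dim=-1, keepdim=True).clamp(min=1e-8)
    k = v / theta
    K = torch.zeros(v.shape[0], 3, 3, device=v.device)
    K[:, 0, 1], K[:, 0, 2] = -k[:, 2], k[:, 1]
    K[:, 1, 0], K[:, 1, 2] = k[:, 2], -k[:, 0]
    K[:, 2, 0], K[:, 2, 1] = -k[:, 1], k[:, 0]
    I = torch.eye(3, device=v.device).expand_as(K)
    s = torch.sin(theta).unsqueeze(-1)
    c = torch.cos(theta).unsqueeze(-1)
    return I + s * K + (1.0 - c) * (K @ K)

def refine_pose(points3d, target_uv, K_cam, rot0, t0, n_par=16, iters=200):
    """points3d: (N, 3) model points; target_uv: (N, 2) observed projections."""
    rot = rot0.repeat(n_par, 1).clone().requires_grad_(True)   # (B, 3)
    t = t0.repeat(n_par, 1).clone().requires_grad_(True)       # (B, 3)
    lrs = 10.0 ** torch.empty(n_par).uniform_(-4, -2)          # random LR per copy
    for _ in range(iters):
        R = axis_angle_to_matrix(rot)                          # (B, 3, 3)
        cam = points3d @ R.transpose(1, 2) + t[:, None, :]     # (B, N, 3)
        proj = cam @ K_cam.T                                   # pinhole projection
        uv = proj[..., :2] / proj[..., 2:3].clamp(min=1e-6)
        loss = ((uv - target_uv) ** 2).mean(dim=(1, 2))        # (B,) per-copy loss
        g_rot, g_t = torch.autograd.grad(loss.sum(), [rot, t])
        with torch.no_grad():                                  # manual SGD step,
            rot -= lrs[:, None] * g_rot                        # one LR per copy
            t -= lrs[:, None] * g_t
    best = int(loss.argmin())
    return rot[best].detach(), t[best].detach()
```

Keeping only the lowest-loss copy at the end is what lets the randomized learning rates act as an escape hatch from the local minima the abstract mentions (symmetric objects, similar appearances, a wrong step size).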
Related papers
- SEMPose: A Single End-to-end Network for Multi-object Pose Estimation [13.131534219937533]
SEMPose is an end-to-end multi-object pose estimation network.
It requires no input other than an RGB image and runs at 32 FPS.
It accurately estimates the poses of multiple objects in real time, with inference time unaffected by the number of target objects.
arXiv Detail & Related papers (2024-11-21T10:37:54Z)
- RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images [13.051302134031808]
We introduce a novel method for calculating the 6DoF pose of an object using a single RGB-D image.
Unlike existing methods that either directly predict objects' poses or rely on sparse keypoints for pose recovery, our approach addresses this challenging task using dense correspondence.
arXiv Detail & Related papers (2024-05-14T10:10:45Z)
- DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses [59.51874686414509]
Current approaches approximate the continuous pose representation with a large number of discrete pose hypotheses.
We present a Deep Voxel Matching Network (DVMNet) that eliminates the need for pose hypotheses and computes the relative object pose in a single pass.
Our method delivers more accurate relative pose estimates for novel objects at a lower computational cost compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-03-20T15:41:32Z)
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- TransPoser: Transformer as an Optimizer for Joint Object Shape and Pose Estimation [25.395619346823715]
We propose a novel method for joint estimation of shape and pose of rigid objects from their sequentially observed RGB-D images.
We introduce Deep Directional Distance Function (DeepDDF), a neural network that directly outputs the depth image of an object given the camera viewpoint and viewing direction.
We formulate the joint estimation itself as a Transformer, which we refer to as TransPoser.
arXiv Detail & Related papers (2023-03-23T17:46:54Z)
- Lightweight Monocular Depth Estimation [4.19709743271943]
We create a lightweight machine-learning model, based on the U-Net structure used in image segmentation networks, to predict the depth value of each pixel given only a single RGB image as input.
The proposed method achieves relatively high accuracy and low root-mean-square error.
arXiv Detail & Related papers (2022-12-21T21:05:16Z)
- Semantic keypoint-based pose estimation from single RGB frames [64.80395521735463]
We present an approach to estimating the continuous 6-DoF pose of an object from a single RGB image.
The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model.
We show that our approach can accurately recover the 6-DoF object pose for both instance- and class-based scenarios.
arXiv Detail & Related papers (2022-04-12T15:03:51Z)
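The summary above leaves the final pose recovery implicit; a common way to close the loop from predicted 2D keypoints and 3D model keypoints to a 6-DoF pose is a PnP solve. The sketch below is our own toy illustration with fabricated correspondences (roughly consistent with R = I, t = (0, 0, 1)), not the paper's code:

```python
# Hedged sketch: recover a 6-DoF pose from semantic keypoints with PnP.
# The 2D/3D keypoints below are made up for illustration; in the paper they
# come from a convnet and a deformable shape model.
import numpy as np
import cv2

model_kps = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0],
                      [0.0, 0.1, 0.0], [0.0, 0.0, 0.1],
                      [0.1, 0.1, 0.0], [0.1, 0.0, 0.1]], dtype=np.float64)
image_kps = np.array([[320.0, 240.0], [400.0, 240.0],
                      [320.0, 320.0], [320.0, 240.0],
                      [400.0, 320.0], [393.0, 240.0]], dtype=np.float64)
K = np.array([[800.0, 0.0, 320.0],      # toy pinhole intrinsics
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# solvePnP returns the object-to-camera rotation (Rodrigues vector) and
# translation; with these toy correspondences it lands near R = I, t = (0, 0, 1).
ok, rvec, tvec = cv2.solvePnP(model_kps, image_kps, K, None,
                              flags=cv2.SOLVEPNP_ITERATIVE)
R, _ = cv2.Rodrigues(rvec)
print(ok, R, tvec.ravel())
```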
- ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation [76.31125154523056]
We present a discrete descriptor that can represent the object surface densely.
We also propose a coarse-to-fine training strategy, which enables fine-grained correspondence prediction.
arXiv Detail & Related papers (2022-03-17T16:16:24Z)
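One way to picture a dense, coarse-to-fine discrete surface code: split the vertex set in two at each level and append one bit, so that code prefixes name progressively finer surface regions. The toy sketch below is our reading of that idea, not ZebraPose's actual encoding pipeline:

```python
# Toy coarse-to-fine binary surface encoding (illustrative; details differ
# from ZebraPose). Each vertex gets one bit per level by splitting its
# current group along the group's principal axis.
import numpy as np

def encode_surface(vertices: np.ndarray, bits: int = 8) -> np.ndarray:
    codes = np.zeros((len(vertices), bits), dtype=np.uint8)
    groups = [np.arange(len(vertices))]
    for level in range(bits):
        next_groups = []
        for idx in groups:
            pts = vertices[idx] - vertices[idx].mean(0)
            # Split along the direction of largest variance (PCA axis).
            axis = np.linalg.svd(pts, full_matrices=False)[2][0]
            side = pts @ axis > 0
            codes[idx[side], level] = 1
            next_groups += [idx[side], idx[~side]]
        groups = [g for g in next_groups if len(g) > 1]
    return codes

verts = np.random.rand(1000, 3)          # stand-in for mesh vertices
codes = encode_surface(verts, bits=6)    # 6-bit codes -> 64 surface regions
```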
- Aug3D-RPN: Improving Monocular 3D Object Detection by Synthetic Images with Virtual Depth [64.29043589521308]
We propose a rendering module to augment the training data by synthesizing images with virtual-depths.
The rendering module takes as input the RGB image and its corresponding sparse depth image, and outputs a variety of photo-realistic synthetic images.
Besides, we introduce an auxiliary module to improve the detection model by jointly optimizing it through a depth estimation task.
arXiv Detail & Related papers (2021-07-28T11:00:47Z)
- Category Level Object Pose Estimation via Neural Analysis-by-Synthesis [64.14028598360741]
In this paper we combine a gradient-based fitting procedure with a parametric neural image synthesis module.
The image synthesis network is designed to efficiently span the pose configuration space.
We experimentally show that the method can recover orientation of objects with high accuracy from 2D images alone.
arXiv Detail & Related papers (2020-08-18T20:30:47Z)
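The render-and-compare pattern in this last paper is the same differentiable loop as Diff-DOPE, with a neural synthesis module in place of a classical renderer. Below is a self-contained toy where a frozen, randomly initialized MLP stands in for the paper's pose-conditioned synthesis network; everything here is our own construction:

```python
# Minimal analysis-by-synthesis loop (illustrative only): a frozen network
# plays the role of a pose-conditioned image synthesis module, and we recover
# the orientation by backpropagating an image loss into its input.
import torch
import torch.nn as nn

torch.manual_seed(0)
decoder = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 16 * 16))
for p in decoder.parameters():
    p.requires_grad_(False)               # synthesis module stays frozen

true_rot = torch.tensor([0.4, -0.2, 0.1])
target = decoder(true_rot)                # "observed" image, 16x16 flattened

rot = torch.zeros(3, requires_grad=True)  # initial orientation guess
opt = torch.optim.Adam([rot], lr=0.05)
for step in range(500):
    opt.zero_grad()
    loss = ((decoder(rot) - target) ** 2).mean()
    loss.backward()
    opt.step()

print(rot.detach(), loss.item())          # rot typically approaches true_rot
```

Because the toy objective is non-convex, convergence to the true orientation is not guaranteed, which is exactly the failure mode Diff-DOPE's parallel random learning rates are designed to mitigate.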