Multi-View Object Pose Refinement With Differentiable Renderer
- URL: http://arxiv.org/abs/2207.02811v1
- Date: Wed, 6 Jul 2022 17:02:22 GMT
- Title: Multi-View Object Pose Refinement With Differentiable Renderer
- Authors: Ivan Shugurov, Ivan Pavlov, Sergey Zakharov, Slobodan Ilic
- Abstract summary: This paper introduces a novel multi-view 6 DoF object pose refinement approach focusing on improving methods trained on synthetic data.
It is based on the DPOD detector, which produces dense 2D-3D correspondences between the model vertices and the image pixels in each frame.
We report excellent performance compared to state-of-the-art methods trained on synthetic and real data.
- Score: 22.040014384283378
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces a novel multi-view 6 DoF object pose refinement
approach focusing on improving methods trained on synthetic data. It is based
on the DPOD detector, which produces dense 2D-3D correspondences between the
model vertices and the image pixels in each frame. We use multiple frames
with known relative camera transformations, as this allows us to introduce
geometric constraints via an interpretable ICP-like loss function. The loss
function is implemented with a differentiable renderer and
is optimized iteratively. We also demonstrate that a full detection and
refinement pipeline, which is trained solely on synthetic data, can be used for
auto-labeling real data. We perform quantitative evaluation on the LineMOD,
Occlusion, HomebrewedDB and YCB-V datasets and report excellent performance
compared to state-of-the-art methods trained on synthetic and real data. We
demonstrate empirically that our approach requires only a few frames and is
robust to close camera locations and to noise in the extrinsic camera
calibration, making it easier to use in practice.
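The paper optimizes its ICP-like loss iteratively through a differentiable renderer. As a minimal sketch of the underlying idea only (not the authors' implementation), the snippet below refines a 6-DoF pose by repeatedly linearizing the rotation and solving a small least-squares problem over known 3D-3D correspondences; all function names and parameters are illustrative.

```python
import numpy as np

def skew(v):
    """Cross-product matrix so that skew(a) @ b == np.cross(a, b)."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def rodrigues(w):
    """Axis-angle vector -> rotation matrix (matrix exponential of skew(w))."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = skew(k)
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def refine_pose(X, Y, R, t, iters=10):
    """Iteratively refine (R, t) so that R @ X_i + t ~= Y_i.

    X: (N, 3) model points; Y: (N, 3) observed points with known
    correspondences. Each iteration linearizes the rotation around the
    current estimate and solves a least-squares problem for the update,
    in the spirit of an ICP-like loss with fixed correspondences."""
    for _ in range(iters):
        P = X @ R.T + t          # model points under the current pose
        r = P - Y                # per-point residuals
        # Jacobian blocks: d residual / d(w, dt) = [-skew(p) | I]
        A = np.zeros((3 * len(X), 6))
        for i, p in enumerate(P):
            A[3 * i:3 * i + 3, :3] = -skew(p)
            A[3 * i:3 * i + 3, 3:] = np.eye(3)
        z, *_ = np.linalg.lstsq(A, -r.reshape(-1), rcond=None)
        R = rodrigues(z[:3]) @ R   # left-multiply the incremental rotation
        t = t + z[3:]
    return R, t
```

With exact correspondences this Gauss-Newton-style loop converges in a handful of iterations, which mirrors the paper's observation that only a few frames and refinement steps are needed.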
Related papers
- FaVoR: Features via Voxel Rendering for Camera Relocalization [23.7893950095252]
Camera relocalization methods range from dense image alignment to direct camera pose regression from a query image.
We propose a novel approach that leverages a globally sparse yet locally dense 3D representation of 2D features.
By tracking and triangulating landmarks over a sequence of frames, we construct a sparse voxel map optimized to render image patch descriptors observed during tracking.
arXiv Detail & Related papers (2024-09-11T18:58:16Z)
- Diff-DOPE: Differentiable Deep Object Pose Estimation [29.703385848843414]
We introduce Diff-DOPE, a 6-DoF pose refiner that takes as input an image, a 3D textured model of an object, and an initial pose of the object.
The method uses differentiable rendering to update the object pose to minimize the visual error between the image and the projection of the model.
We show that this simple, yet effective, idea is able to achieve state-of-the-art results on pose estimation datasets.
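Diff-DOPE's update rule can be caricatured in a few lines: minimize an image-space error over the pose. The hedged sketch below replaces the differentiable renderer with a plain pinhole projection of model points, and the rendering gradient with finite differences plus a backtracking line search; everything here (names, focal length, step sizes) is illustrative, not the paper's code.

```python
import numpy as np

def rodrigues(w):
    """Axis-angle vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def project(X, pose, f=500.0):
    """Pinhole projection of (N, 3) model points under a 6-vector pose
    (3 rotation, 3 translation); principal point at the origin."""
    P = X @ rodrigues(pose[:3]).T + pose[3:]
    return f * P[:, :2] / P[:, 2:3]

def reproj_loss(pose, X, target_uv, f=500.0):
    """Mean squared pixel error, a stand-in for the rendered visual error."""
    return np.mean((project(X, pose, f) - target_uv) ** 2)

def refine(pose, X, target_uv, iters=100, eps=1e-6):
    """Gradient descent on the image-space loss with finite-difference
    gradients standing in for a differentiable renderer; the backtracking
    line search keeps every accepted step strictly decreasing."""
    cur = reproj_loss(pose, X, target_uv)
    for _ in range(iters):
        g = np.zeros(6)
        for j in range(6):           # forward-difference gradient
            d = np.zeros(6)
            d[j] = eps
            g[j] = (reproj_loss(pose + d, X, target_uv) - cur) / eps
        step = 1e-3
        while step > 1e-12:          # backtrack until the loss decreases
            cand = pose - step * g
            new = reproj_loss(cand, X, target_uv)
            if new < cur:
                pose, cur = cand, new
                break
            step *= 0.5
    return pose, cur
```

An actual differentiable renderer replaces both the point projection (with full image formation) and the numerical gradient (with analytic backpropagation), which is what makes the approach fast and robust in practice.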
arXiv Detail & Related papers (2023-09-30T18:52:57Z)
- Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives [70.32817882783608]
We present an approach that produces a simple, compact, and actionable 3D world representation by means of 3D primitives.
Unlike existing primitive decomposition methods that rely on 3D input data, our approach operates directly on images.
We show that the resulting textured primitives faithfully reconstruct the input images and accurately model the visible 3D points.
arXiv Detail & Related papers (2023-07-11T17:58:31Z)
- Semantic keypoint-based pose estimation from single RGB frames [64.80395521735463]
We present an approach to estimating the continuous 6-DoF pose of an object from a single RGB image.
The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model.
We show that our approach can accurately recover the 6-DoF object pose for both instance- and class-based scenarios.
arXiv Detail & Related papers (2022-04-12T15:03:51Z)
- RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose Optimization [46.144194562841435]
We propose a framework based on a recurrent neural network (RNN) for object pose refinement.
The problem is formulated as a non-linear least squares problem based on the estimated correspondence field.
The correspondence field estimation and pose refinement are conducted alternatively in each iteration to recover accurate object poses.
arXiv Detail & Related papers (2022-03-24T06:24:55Z)
- Camera Distortion-aware 3D Human Pose Estimation in Video with Optimization-based Meta-Learning [23.200130129530653]
Existing 3D human pose estimation algorithms trained on distortion-free datasets suffer a performance drop when applied to new scenarios with a specific camera distortion.
We propose a simple yet effective model for 3D human pose estimation in video that can quickly adapt to any distortion environment.
arXiv Detail & Related papers (2021-11-30T01:35:04Z)
- Differentiable Rendering with Perturbed Optimizers [85.66675707599782]
Reasoning about 3D scenes from their 2D image projections is one of the core problems in computer vision.
Our work highlights the link between some well-known differentiable formulations and randomly smoothed renderings.
We apply our method to 3D scene reconstruction and demonstrate its advantages on the tasks of 6D pose estimation and 3D mesh reconstruction.
arXiv Detail & Related papers (2021-10-18T08:56:23Z)
- Uncertainty-Aware Camera Pose Estimation from Points and Lines [101.03675842534415]
Perspective-n-Point-and-Line (PnPL) aims at fast, accurate and robust camera localization with respect to a 3D model from 2D-3D feature coordinates.
arXiv Detail & Related papers (2021-07-08T15:19:36Z)
- Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image Decomposition [67.9464567157846]
We propose an autoencoder for joint generation of realistic images from synthetic 3D models while simultaneously decomposing real images into their intrinsic shape and appearance properties.
Our experiments confirm that a joint treatment of rendering and decomposition is indeed beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-06-29T12:53:58Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality or accuracy of the listed information and is not responsible for any consequences of its use.