Linear-Covariance Loss for End-to-End Learning of 6D Pose Estimation
- URL: http://arxiv.org/abs/2303.11516v2
- Date: Sun, 8 Oct 2023 11:44:50 GMT
- Title: Linear-Covariance Loss for End-to-End Learning of 6D Pose Estimation
- Authors: Fulin Liu, Yinlin Hu, Mathieu Salzmann
- Abstract summary: Most modern image-based 6D object pose estimation methods learn to predict 2D-3D correspondences, from which the pose can be obtained using a PnP solver.
Here, we argue that supervising the pose obtained after a differentiable PnP step conflicts with the averaging nature of the PnP problem, leading to gradients that may encourage the network to degrade the accuracy of individual correspondences.
- Score: 64.12149365530624
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most modern image-based 6D object pose estimation methods learn to predict
2D-3D correspondences, from which the pose can be obtained using a PnP solver.
Because of the non-differentiable nature of common PnP solvers, these methods
are supervised via the individual correspondences. To address this, several
methods have designed differentiable PnP strategies, thus imposing supervision
on the pose obtained after the PnP step. Here, we argue that this conflicts
with the averaging nature of the PnP problem, leading to gradients that may
encourage the network to degrade the accuracy of individual correspondences. To
address this, we derive a loss function that exploits the ground truth pose
before solving the PnP problem. Specifically, we linearize the PnP solver
around the ground-truth pose and compute the covariance of the resulting pose
distribution. We then define our loss based on the diagonal covariance
elements, which entails considering the final pose estimate yet not suffering
from the PnP averaging issue. Our experiments show that our loss consistently
improves the pose estimation accuracy for both dense and sparse correspondence
based methods, achieving state-of-the-art results on both Linemod-Occluded and
YCB-Video.
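The contrast drawn in the abstract can be illustrated numerically. The following is a minimal NumPy sketch of one plausible reading of the idea, not the authors' implementation. It assumes normalized image coordinates (no camera intrinsics), a single Gauss-Newton linearization of the reprojection error at the ground-truth pose, and independent per-correspondence errors. All function names are hypothetical.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]x such that skew(v) @ w == np.cross(v, w)."""
    x, y, z = v
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def project(R, t, X):
    """Project object points X (N, 3) to normalized image coordinates (N, 2)."""
    Xc = X @ R.T + t
    return Xc[:, :2] / Xc[:, 2:3]

def pose_jacobian(R, t, X):
    """Jacobian of each projection w.r.t. a small camera-frame pose
    perturbation [rotation (3), translation (3)]; shape (N, 2, 6)."""
    Xc = X @ R.T + t
    J = np.zeros((len(X), 2, 6))
    for i, (x, y, z) in enumerate(Xc):
        d_proj = np.array([[1.0 / z, 0.0, -x / z**2],
                           [0.0, 1.0 / z, -y / z**2]])
        d_point = np.hstack([-skew(Xc[i]), np.eye(3)])
        J[i] = d_proj @ d_point
    return J

def linear_covariance_loss(R_gt, t_gt, X, x_pred):
    """Return (loss, delta) for predicted 2D points x_pred (N, 2).

    delta is the pose error of the linearized solver (sum of per-point
    contributions, where errors can cancel). loss is the sum of the diagonal
    of the pose covariance built from the same contributions (no cancelling).
    """
    J = pose_jacobian(R_gt, t_gt, X)                    # (N, 2, 6)
    r = x_pred - project(R_gt, t_gt, X)                 # (N, 2) residuals
    H = np.einsum('nij,nik->jk', J, J)                  # (6, 6) = J^T J
    A = np.einsum('jk,nik->nji', np.linalg.inv(H), J)   # (N, 6, 2) gains
    contrib = np.einsum('nji,ni->nj', A, r)             # (N, 6) pose shifts
    delta = contrib.sum(axis=0)                         # averaged pose error
    cov_diag = (contrib ** 2).sum(axis=0)               # diag of sum_i c_i c_i^T
    return cov_diag.sum(), delta
```

In this sketch, correspondence errors that cancel in the linearized solve give a pose error `delta` near zero, so a loss on the solved pose provides little signal, while the covariance-based value stays positive as long as any individual correspondence is off.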
Related papers
- FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation [30.710296843150832]
Estimating relative camera poses between images has been a central problem in computer vision.
We show how to combine the best of both methods; our approach yields results that are both precise and robust.
A comprehensive analysis supports our design choices and demonstrates that our method adapts flexibly to various feature extractors and correspondence estimators.
arXiv Detail & Related papers (2024-03-05T18:59:51Z)
- EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation [30.212903535850874]
Locating 3D objects from a single RGB image via Perspective-n-Point is a long-standing problem in computer vision.
EPro-PnP can enhance existing correspondence networks, closing the gap between PnP-based methods and the task-specific leaders on the LineMOD 6DoF pose estimation benchmark.
arXiv Detail & Related papers (2023-03-22T17:57:36Z)
- Direct Dense Pose Estimation [138.56533828316833]
Dense human pose estimation is the problem of learning dense correspondences between RGB images and the surfaces of human bodies.
Prior dense pose estimation methods are all based on the Mask R-CNN framework and operate in a top-down manner, first attempting to identify a bounding box for each person.
We propose a novel alternative method for solving the dense pose estimation problem, called Direct Dense Pose (DDP).
arXiv Detail & Related papers (2022-04-04T06:14:38Z)
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
- EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation [22.672080094222082]
Locating 3D objects from a single RGB image via Perspective-n-Points is a long-standing problem in computer vision.
Recent studies suggest interpreting PnP as a differentiable layer, so that 2D-3D point correspondences can be partly learned by backpropagating the gradient with respect to the object pose.
Yet learning the entire set of 2D-3D points from scratch fails to converge with existing approaches.
arXiv Detail & Related papers (2022-03-24T17:59:49Z)
- RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose Optimization [46.144194562841435]
We propose a framework based on a recurrent neural network (RNN) for object pose refinement.
The problem is formulated as a non-linear least squares problem based on the estimated correspondence field.
The correspondence field estimation and pose refinement are conducted alternately in each iteration to recover accurate object poses.
arXiv Detail & Related papers (2022-03-24T06:24:55Z)
- Poseur: Direct Human Pose Regression with Transformers [119.79232258661995]
We propose a direct, regression-based approach to 2D human pose estimation from single images.
Our framework is end-to-end differentiable, and naturally learns to exploit the dependencies between keypoints.
Ours is the first regression-based approach to perform favorably compared to the best heatmap-based pose estimation methods.
arXiv Detail & Related papers (2022-01-19T04:31:57Z)
- Uncertainty-Aware Camera Pose Estimation from Points and Lines [101.03675842534415]
Perspective-n-Point-and-Line (PnPL) aims at fast, accurate and robust camera localization with respect to a 3D model from 2D-3D feature coordinates.
arXiv Detail & Related papers (2021-07-08T15:19:36Z)
- Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry [62.29762409558553]
Epipolar constraints are at the core of feature matching and depth estimation in multi-person 3D human pose estimation methods.
Despite the satisfactory performance of this formulation in sparser crowd scenes, its effectiveness is frequently challenged under denser crowd circumstances.
In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation.
arXiv Detail & Related papers (2020-07-21T17:59:36Z)
- PnP-Net: A hybrid Perspective-n-Point Network [2.66512000865131]
We consider the robust Perspective-n-Point problem using a hybrid approach that combines deep learning with model-based algorithms.
We demonstrate the approach on both synthetic and real-world data, with low computational requirements.
arXiv Detail & Related papers (2020-03-10T10:43:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.