6DoF Object Pose Estimation via Differentiable Proxy Voting Loss
- URL: http://arxiv.org/abs/2002.03923v2
- Date: Mon, 4 May 2020 22:24:55 GMT
- Title: 6DoF Object Pose Estimation via Differentiable Proxy Voting Loss
- Authors: Xin Yu and Zheyu Zhuang and Piotr Koniusz and Hongdong Li
- Abstract summary: We develop a differentiable proxy voting loss (DPVL) which mimics the hypothesis selection in the voting procedure.
Experiments on widely used datasets, i.e., LINEMOD and Occlusion LINEMOD, show that our DPVL improves pose estimation performance significantly.
- Score: 113.72905482334767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating a 6DoF object pose from a single image is very challenging due to
occlusions or textureless appearances. Vector-field based keypoint voting has
demonstrated its effectiveness and superiority in tackling those issues.
However, direct regression of vector-fields neglects that the distances between
pixels and keypoints also affect the deviations of hypotheses dramatically. In
other words, small errors in direction vectors may generate severely deviated
hypotheses when pixels are far away from a keypoint. In this paper, we aim to
reduce such errors by incorporating the distances between pixels and keypoints
into our objective. To this end, we develop a simple yet effective
differentiable proxy voting loss (DPVL) which mimics the hypothesis selection
in the voting procedure. By exploiting our voting loss, we are able to train
our network in an end-to-end manner. Experiments on widely used datasets, i.e.,
LINEMOD and Occlusion LINEMOD, show that our DPVL improves pose estimation
performance significantly and speeds up the training convergence.
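The distance effect described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes the proxy for a voting hypothesis is the perpendicular distance from the ground-truth keypoint to the line through a pixel along its predicted direction, and it assumes a smooth L1 penalty on that distance. The function names (`proxy_voting_distance`, `smooth_l1`, `direction_with_error`) and the `beta` parameter are hypothetical, chosen only for this example.

```python
import numpy as np


def proxy_voting_distance(pixels, directions, keypoint, eps=1e-8):
    """Perpendicular distance from `keypoint` to each line pixel + t * direction.

    Assumption: this distance stands in for the deviation of a voting
    hypothesis; the abstract does not spell out the exact formula.
    """
    pixels = np.asarray(pixels, dtype=float)          # (N, 2)
    directions = np.asarray(directions, dtype=float)  # (N, 2)
    keypoint = np.asarray(keypoint, dtype=float)      # (2,)
    offset = keypoint - pixels
    # Magnitude of the 2D cross product = |direction| * point-to-line distance.
    cross = directions[:, 0] * offset[:, 1] - directions[:, 1] * offset[:, 0]
    norm = np.linalg.norm(directions, axis=1)
    return np.abs(cross) / (norm + eps)


def smooth_l1(x, beta=1.0):
    """Smooth L1 penalty (assumed here; `beta` is a hypothetical threshold)."""
    x = np.abs(x)
    return np.where(x < beta, 0.5 * x ** 2 / beta, x - 0.5 * beta)


def direction_with_error(pixel, keypoint, angle_error_rad):
    """Unit vector from pixel to keypoint, rotated by a fixed angular error."""
    d = keypoint - pixel
    d = d / np.linalg.norm(d)
    c, s = np.cos(angle_error_rad), np.sin(angle_error_rad)
    return np.array([c * d[0] - s * d[1], s * d[0] + c * d[1]])


keypoint = np.array([0.0, 0.0])
pixels = np.array([[10.0, 0.0],    # pixel close to the keypoint
                   [100.0, 0.0]])  # pixel far from the keypoint
err = np.deg2rad(2.0)              # same angular error for both pixels

directions = np.stack([direction_with_error(p, keypoint, err) for p in pixels])
distances = proxy_voting_distance(pixels, directions, keypoint)
loss = smooth_l1(distances).mean()

print(distances)  # roughly 0.35 and 3.49 pixels
print(loss)
```

With the same 2-degree direction error, the far pixel's line misses the keypoint by about ten times as much as the near pixel's line. A loss on direction vectors alone would weight both pixels equally, whereas a distance-aware term like the one sketched here penalizes the far pixel more.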
Related papers
- Reducing Semantic Ambiguity In Domain Adaptive Semantic Segmentation Via Probabilistic Prototypical Pixel Contrast [7.092718945468069]
Domain adaptation aims to reduce the model degradation on the target domain caused by the domain shift between the source and target domains.
Probabilistic prototypical pixel contrast (PPPC) is a universal adaptation framework that models each pixel embedding as a probability.
PPPC not only helps address ambiguity at the pixel level, yielding discriminative representations, but also achieves significant improvements in both synthetic-to-real and day-to-night adaptation tasks.
arXiv Detail & Related papers (2024-09-27T08:25:03Z)
- Equipping Diffusion Models with Differentiable Spatial Entropy for Low-Light Image Enhancement [7.302792947244082]
In this work, we propose a novel method that shifts the focus from a deterministic pixel-by-pixel comparison to a statistical perspective.
The core idea is to introduce spatial entropy into the loss function to measure the distribution difference between predictions and targets.
Specifically, we equip the entropy with diffusion models and aim for superior accuracy and enhanced perceptual quality over l1-based noise matching loss.
arXiv Detail & Related papers (2024-04-15T12:35:10Z)
- DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses [59.51874686414509]
Current approaches approximate the continuous pose representation with a large number of discrete pose hypotheses.
We present a Deep Voxel Matching Network (DVMNet) that eliminates the need for pose hypotheses and computes the relative object pose in a single pass.
Our method delivers more accurate relative pose estimates for novel objects at a lower computational cost compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-03-20T15:41:32Z)
- Diffusion-Based Particle-DETR for BEV Perception [94.88305708174796]
Bird-Eye-View (BEV) is one of the most widely-used scene representations for visual perception in Autonomous Vehicles (AVs).
Recent diffusion-based methods offer a promising approach to uncertainty modeling for visual perception but fail to effectively detect small objects in the large coverage of the BEV.
Here, we address this problem by combining the diffusion paradigm with current state-of-the-art 3D object detectors in BEV.
arXiv Detail & Related papers (2023-12-18T09:52:14Z)
- Adaptive Face Recognition Using Adversarial Information Network [57.29464116557734]
Face recognition models often degenerate when training data are different from testing data.
We propose a novel adversarial information network (AIN) to address it.
arXiv Detail & Related papers (2023-05-23T02:14:11Z)
- Multi-View Keypoints for Reliable 6D Object Pose Estimation [12.436320203635143]
We propose a novel multi-view approach to combine heatmap and keypoint estimates into a probability density map over 3D space.
We demonstrate an average pose estimation error of approximately 0.5mm and 2 degrees across a variety of difficult low-feature and reflective objects.
arXiv Detail & Related papers (2023-03-29T16:28:11Z)
- Linear-Covariance Loss for End-to-End Learning of 6D Pose Estimation [64.12149365530624]
Most modern image-based 6D object pose estimation methods learn to predict 2D-3D correspondences, from which the pose can be obtained using a solver.
Here, we argue that this conflicts with the averaging nature of the problem leading to gradients that may encourage the network to degrade accuracy.
arXiv Detail & Related papers (2023-03-21T00:32:31Z)
- ALIKE: Accurate and Lightweight Keypoint Detection and Descriptor Extraction [21.994171434960734]
We present a differentiable keypoint detection module, which outputs accurate sub-pixel keypoints.
The reprojection loss is then proposed to directly optimize these sub-pixel keypoints, and the dispersity peak loss is presented for accurate keypoints regularization.
A lightweight network is designed for keypoint detection and descriptor extraction, which can run at 95 frames per second for 640x480 images on a commercial GPU.
arXiv Detail & Related papers (2021-12-06T10:10:30Z)
- Delving into Localization Errors for Monocular 3D Object Detection [85.77319416168362]
Estimating 3D bounding boxes from monocular images is an essential component in autonomous driving.
In this work, we quantify the impact introduced by each sub-task and find that localization error is the vital factor restricting monocular 3D detection.
arXiv Detail & Related papers (2021-03-30T10:38:01Z)
- REDE: End-to-end Object 6D Pose Robust Estimation Using Differentiable Outliers Elimination [15.736699709454857]
We propose REDE, a novel end-to-end object pose estimator using RGB-D data.
We also propose a differentiable outliers elimination method that regresses the candidate result and the confidence simultaneously.
The experimental results on three benchmark datasets show that REDE slightly outperforms the state-of-the-art approaches.
arXiv Detail & Related papers (2020-10-24T06:45:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.