Related papers: 6DoF Object Pose Estimation via Differentiable Proxy Voting Loss

URL: http://arxiv.org/abs/2002.03923v2
Date: Mon, 4 May 2020 22:24:55 GMT
Title: 6DoF Object Pose Estimation via Differentiable Proxy Voting Loss
Authors: Xin Yu and Zheyu Zhuang and Piotr Koniusz and Hongdong Li
Abstract summary: We develop a differentiable proxy voting loss (DPVL) which mimics the hypothesis selection in the voting procedure. Experiments on widely used datasets, i.e., LINEMOD and Occlusion LINEMOD, show that our DPVL improves pose estimation performance significantly.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Estimating a 6DoF object pose from a single image is very challenging due to occlusions or textureless appearances. Vector-field based keypoint voting has demonstrated its effectiveness and superiority in tackling those issues. However, direct regression of vector fields neglects that the distances between pixels and keypoints also dramatically affect the deviations of hypotheses. In other words, small errors in direction vectors may generate severely deviated hypotheses when pixels are far away from a keypoint. In this paper, we aim to reduce such errors by incorporating the distances between pixels and keypoints into our objective. To this end, we develop a simple yet effective differentiable proxy voting loss (DPVL) which mimics the hypothesis selection in the voting procedure. By exploiting our voting loss, we are able to train our network in an end-to-end manner. Experiments on widely used datasets, i.e., LINEMOD and Occlusion LINEMOD, show that our DPVL improves pose estimation performance significantly and speeds up the training convergence.
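The abstract's central argument is geometric: if a pixel r pixels away from a keypoint predicts a direction that is off by an angle ε, its voting ray misses the keypoint by r·sin ε, so the same angular error hurts far pixels more. The sketch below illustrates this and a distance-aware proxy loss. It is a minimal sketch under our own assumptions, not the authors' implementation: we assume the proxy is the perpendicular distance from the ground-truth keypoint to the line through each pixel along its predicted direction, penalized with a smooth-L1 function. The names `hypothesis_offset`, `proxy_voting_loss`, and `beta` are hypothetical.

```python
import math


def hypothesis_offset(distance, angle_error_rad):
    """Perpendicular miss distance of a voting ray cast from a pixel that is
    `distance` pixels from the keypoint, with a direction error of `angle_error_rad`."""
    return distance * math.sin(abs(angle_error_rad))


def proxy_voting_loss(pixels, directions, keypoint, beta=1.0):
    """Hypothetical proxy loss (an assumption, not the paper's code): mean smooth-L1
    of the distance from the ground-truth keypoint to each line p + t * v, where p is
    a pixel and v its predicted (not necessarily unit) direction vector."""
    kx, ky = keypoint
    total = 0.0
    for (px, py), (vx, vy) in zip(pixels, directions):
        norm = math.hypot(vx, vy)
        # |cross(v, k - p)| / |v| is the distance from k to the line through p along v.
        d = abs(vx * (ky - py) - vy * (kx - px)) / norm
        # Smooth-L1: quadratic near zero, linear for large distances.
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pixels)


if __name__ == "__main__":
    one_degree = math.radians(1.0)
    # Same 1-degree direction error, pixel 10 px vs. 100 px from the keypoint.
    print(round(hypothesis_offset(10, one_degree), 3))   # 0.175
    print(round(hypothesis_offset(100, one_degree), 3))  # 1.745
    # Directions that point exactly at the keypoint give zero loss.
    print(proxy_voting_loss([(0, 0), (10, 0)], [(5, 5), (-5, 5)], (5, 5)))  # 0.0
```

Because the point-to-line distance grows with the pixel-to-keypoint distance for a fixed angular error, minimizing it penalizes direction errors at far-away pixels more heavily than a plain loss on the direction vectors would, which is the effect the abstract attributes to incorporating pixel-keypoint distances into the objective.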

Related papers

PointVDP: Learning View-Dependent Projection by Fireworks Rays for 3D Point Cloud Segmentation [66.00721801098574]
We propose view-dependent projection (VDP) to facilitate point cloud segmentation. VDP generates data-driven projections from 3D point distributions. We construct color regularization to optimize the framework.
arXiv Detail & Related papers (2025-07-09T07:44:00Z)
SEMPose: A Single End-to-end Network for Multi-object Pose Estimation [13.131534219937533]
SEMPose is an end-to-end multi-object pose estimation network. It can perform inference at 32 FPS without requiring inputs other than the RGB image. It can accurately estimate the poses of multiple objects in real time, with inference time unaffected by the number of target objects.
arXiv Detail & Related papers (2024-11-21T10:37:54Z)
Equipping Diffusion Models with Differentiable Spatial Entropy for Low-Light Image Enhancement [7.302792947244082]
In this work, we propose a novel method that shifts the focus from a deterministic pixel-by-pixel comparison to a statistical perspective. The core idea is to introduce spatial entropy into the loss function to measure the distribution difference between predictions and targets. Specifically, we equip diffusion models with the entropy and aim for superior accuracy and enhanced perceptual quality over an L1-based noise-matching loss.
arXiv Detail & Related papers (2024-04-15T12:35:10Z)
DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses [59.51874686414509]
Current approaches approximate the continuous pose representation with a large number of discrete pose hypotheses. We present a Deep Voxel Matching Network (DVMNet) that eliminates the need for pose hypotheses and computes the relative object pose in a single pass. Our method delivers more accurate relative pose estimates for novel objects at a lower computational cost compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-03-20T15:41:32Z)
Diffusion-Based Particle-DETR for BEV Perception [94.88305708174796]
Bird-Eye-View (BEV) is one of the most widely used scene representations for visual perception in Autonomous Vehicles (AVs). Recent diffusion-based methods offer a promising approach to uncertainty modeling for visual perception but fail to effectively detect small objects in the large coverage of the BEV. Here, we address this problem by combining the diffusion paradigm with current state-of-the-art 3D object detectors in BEV.
arXiv Detail & Related papers (2023-12-18T09:52:14Z)
Adaptive Face Recognition Using Adversarial Information Network [57.29464116557734]
Face recognition models often degenerate when training data are different from testing data. We propose a novel adversarial information network (AIN) to address it.
arXiv Detail & Related papers (2023-05-23T02:14:11Z)
Multi-View Keypoints for Reliable 6D Object Pose Estimation [12.436320203635143]
We propose a novel multi-view approach to combine heatmap and keypoint estimates into a probability density map over 3D space. We demonstrate an average pose estimation error of approximately 0.5mm and 2 degrees across a variety of difficult low-feature and reflective objects.
arXiv Detail & Related papers (2023-03-29T16:28:11Z)
Linear-Covariance Loss for End-to-End Learning of 6D Pose Estimation [64.12149365530624]
Most modern image-based 6D object pose estimation methods learn to predict 2D-3D correspondences, from which the pose can be obtained using a solver. Here, we argue that this conflicts with the averaging nature of the problem leading to gradients that may encourage the network to degrade accuracy.
arXiv Detail & Related papers (2023-03-21T00:32:31Z)
Direct Dense Pose Estimation [138.56533828316833]
Dense human pose estimation is the problem of learning dense correspondences between RGB images and the surfaces of human bodies. Prior dense pose estimation methods are all based on the Mask R-CNN framework and operate in a top-down manner, first attempting to identify a bounding box for each person. We propose a novel alternative method for solving the dense pose estimation problem, called Direct Dense Pose (DDP).
arXiv Detail & Related papers (2022-04-04T06:14:38Z)
ALIKE: Accurate and Lightweight Keypoint Detection and Descriptor Extraction [21.994171434960734]
We present a differentiable keypoint detection module, which outputs accurate sub-pixel keypoints. The reprojection loss is then proposed to directly optimize these sub-pixel keypoints, and the dispersity peak loss is presented for accurate keypoints regularization. A lightweight network is designed for keypoint detection and descriptor extraction, which can run at 95 frames per second for 640x480 images on a commercial GPU.
arXiv Detail & Related papers (2021-12-06T10:10:30Z)
Delving into Localization Errors for Monocular 3D Object Detection [85.77319416168362]
Estimating 3D bounding boxes from monocular images is an essential component in autonomous driving. In this work, we quantify the impact introduced by each sub-task and find that 'localization error' is the vital factor restricting monocular 3D detection.
arXiv Detail & Related papers (2021-03-30T10:38:01Z)
REDE: End-to-end Object 6D Pose Robust Estimation Using Differentiable Outliers Elimination [15.736699709454857]
We propose REDE, a novel end-to-end object pose estimator using RGB-D data. We also propose a differentiable outliers elimination method that regresses the candidate result and the confidence simultaneously. The experimental results on three benchmark datasets show that REDE slightly outperforms the state-of-the-art approaches.
arXiv Detail & Related papers (2020-10-24T06:45:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.