Related papers: RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery

RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery

URL: http://arxiv.org/abs/2309.10255v2
Date: Wed, 18 Oct 2023 08:21:34 GMT
Title: RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery
Authors: Jiaxin Wei, Xibin Song, Weizhe Liu, Laurent Kneip, Hongdong Li and Pan Ji
Abstract summary: We propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations. Specifically, we leverage a pre-trained monocular estimator to extract local geometric information. A separate branch is designed to directly recover the metric scale of the object based on category-level statistics.
Score: 72.13154206106259
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While showing promising results, recent RGB-D camera-based category-level object pose estimation methods have restricted applications due to the heavy reliance on depth sensors. RGB-only methods provide an alternative to this problem yet suffer from inherent scale ambiguity stemming from monocular observations. In this paper, we propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations. Specifically, we leverage a pre-trained monocular estimator to extract local geometric information, mainly facilitating the search for inlier 2D-3D correspondence. Meanwhile, a separate branch is designed to directly recover the metric scale of the object based on category-level statistics. Finally, we advocate using the RANSAC-P$n$P algorithm to robustly solve for 6D object pose. Extensive experiments have been conducted on both synthetic and real datasets, demonstrating the superior performance of our method over previous state-of-the-art RGB-based approaches, especially in terms of rotation accuracy. Code: https://github.com/goldoak/DMSR.

Related papers

Unified Category-Level Object Detection and Pose Estimation from RGB Images using 3D Prototypes [5.224479258519442]
We introduce a unified model that integrates detection and pose estimation into a single framework for RGB images.<n>Our approach achieves state-of-the-art results for RGB category-level pose estimation on REAL275.
arXiv Detail & Related papers (2025-08-04T07:57:39Z)
CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image [86.75098349480014]
This paper tackles category-level pose estimation of articulated objects in robotic manipulation tasks. We propose a single-stage Network, CAP-Net, for estimating the 6D poses and sizes of Categorical Articulated Parts. We introduce the RGBD-Art dataset, the largest RGB-D articulated dataset to date, featuring RGB images and depth noise simulated from real sensors.
arXiv Detail & Related papers (2025-04-15T14:30:26Z)
SEMPose: A Single End-to-end Network for Multi-object Pose Estimation [13.131534219937533]
SEMPose is an end-to-end multi-object pose estimation network. It can perform inference at 32 FPS without requiring inputs other than the RGB image. It can accurately estimate the poses of multiple objects in real time, with inference time unaffected by the number of target objects.
arXiv Detail & Related papers (2024-11-21T10:37:54Z)
LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation [43.549593231397644]
LaPose is a novel framework that models the object shape as the Laplacian mixture model for Pose estimation. By representing each point as a probabilistic distribution, we explicitly quantify the shape uncertainty. LaPose yields state-of-the-art performance in category-level object pose estimation.
arXiv Detail & Related papers (2024-09-24T04:20:18Z)
Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference [62.99706119370521]
Humans can easily deduce the relative pose of an unseen object, without label/training, given only a single query-reference image pair. We propose a novel 3D generalizable relative pose estimation method by elaborating (i) with a 2.5D shape from an RGB-D reference, (ii) with an off-the-shelf differentiable, and (iii) with semantic cues from a pretrained model like DINOv2.
arXiv Detail & Related papers (2024-06-26T16:01:10Z)
RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images [13.051302134031808]
We introduce a novel method for calculating the 6DoF pose of an object using a single RGB-D image. Unlike existing methods that either directly predict objects' poses or rely on sparse keypoints for pose recovery, our approach addresses this challenging task using dense correspondence.
arXiv Detail & Related papers (2024-05-14T10:10:45Z)
DVMNet++: Rethinking Relative Pose Estimation for Unseen Objects [59.51874686414509]
Existing approaches typically predict 3D translation utilizing the ground-truth object bounding box and approximate 3D rotation with a large number of discrete hypotheses. We present a Deep Voxel Matching Network (DVMNet++) that computes the relative object pose in a single pass. Our approach delivers more accurate relative pose estimates for novel objects at a lower computational cost compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-03-20T15:41:32Z)
MV-ROPE: Multi-view Constraints for Robust Category-level Object Pose and Size Estimation [23.615122326731115]
We propose a novel solution that makes use of RGB video streams. Our framework consists of three modules: a scale-aware monocular dense SLAM solution, a lightweight object pose predictor, and an object-level pose graph. Our experimental results demonstrate that when utilizing public dataset sequences with high-quality depth information, the proposed method exhibits comparable performance to state-of-the-art RGB-D methods.
arXiv Detail & Related papers (2023-08-17T08:29:54Z)
ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation [76.31125154523056]
We present a discrete descriptor, which can represent the object surface densely. We also propose a coarse to fine training strategy, which enables fine-grained correspondence prediction.
arXiv Detail & Related papers (2022-03-17T16:16:24Z)
Single-stage Keypoint-based Category-level Object Pose Estimation from an RGB Image [27.234658117816103]
We propose a single-stage, keypoint-based approach for category-level object pose estimation. The proposed network performs 2D object detection, detects 2D keypoints, estimates 6-DoF pose, and regresses relative bounding cuboid dimensions. We conduct extensive experiments on the challenging Objectron benchmark, outperforming state-of-the-art methods on the 3D IoU metric.
arXiv Detail & Related papers (2021-09-13T17:55:00Z)
FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism [49.89268018642999]
We propose a fast shape-based network (FS-Net) with efficient category-level feature extraction for 6D pose estimation. The proposed method achieves state-of-the-art performance in both category- and instance-level 6D object pose estimation.
arXiv Detail & Related papers (2021-03-12T03:07:24Z)
Spatial Attention Improves Iterative 6D Object Pose Estimation [52.365075652976735]
We propose a new method for 6D pose estimation refinement from RGB images. Our main insight is that after the initial pose estimate, it is important to pay attention to distinct spatial features of the object. We experimentally show that this approach learns to attend to salient spatial features and learns to ignore occluded parts of the object, leading to better pose estimation across datasets.
arXiv Detail & Related papers (2021-01-05T17:18:52Z)
Robust 6D Object Pose Estimation by Learning RGB-D Features [59.580366107770764]
We propose a novel discrete-continuous formulation for rotation regression to resolve this local-optimum problem. We uniformly sample rotation anchors in SO(3), and predict a constrained deviation from each anchor to the target, as well as uncertainty scores for selecting the best prediction. Experiments on two benchmarks: LINEMOD and YCB-Video, show that the proposed method outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2020-02-29T06:24:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.