Related papers: RCGNet: RGB-based Category-Level 6D Object Pose Estimation with Geometric Guidance

RCGNet: RGB-based Category-Level 6D Object Pose Estimation with Geometric Guidance

URL: http://arxiv.org/abs/2508.13623v1
Date: Tue, 19 Aug 2025 08:34:17 GMT
Title: RCGNet: RGB-based Category-Level 6D Object Pose Estimation with Geometric Guidance
Authors: Sheng Yu, Di-Hua Zhai, Yuanqing Xia,
Abstract summary: We propose a novel category-level object pose estimation approach that relies solely on RGB images.<n>This method enables accurate pose estimation in real-world scenarios without the need for depth data.
Score: 11.283796920965745
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While most current RGB-D-based category-level object pose estimation methods achieve strong performance, they face significant challenges in scenes lacking depth information. In this paper, we propose a novel category-level object pose estimation approach that relies solely on RGB images. This method enables accurate pose estimation in real-world scenarios without the need for depth data. Specifically, we design a transformer-based neural network for category-level object pose estimation, where the transformer is employed to predict and fuse the geometric features of the target object. To ensure that these predicted geometric features faithfully capture the object's geometry, we introduce a geometric feature-guided algorithm, which enhances the network's ability to effectively represent the object's geometric information. Finally, we utilize the RANSAC-PnP algorithm to compute the object's pose, addressing the challenges associated with variable object scales in pose estimation. Experimental results on benchmark datasets demonstrate that our approach is not only highly efficient but also achieves superior accuracy compared to previous RGB-based methods. These promising results offer a new perspective for advancing category-level object pose estimation using RGB images.

Related papers

CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image [86.75098349480014]
This paper tackles category-level pose estimation of articulated objects in robotic manipulation tasks.<n>We propose a single-stage Network, CAP-Net, for estimating the 6D poses and sizes of Categorical Articulated Parts.<n>We introduce the RGBD-Art dataset, the largest RGB-D articulated dataset to date, featuring RGB images and depth noise simulated from real sensors.
arXiv Detail & Related papers (2025-04-15T14:30:26Z)
GIVEPose: Gradual Intra-class Variation Elimination for RGB-based Category-Level Object Pose Estimation [61.46277064819665]
We introduce GIVEPose, a framework that implements Gradual Intra-class Variation Elimination for category-level object pose estimation.<n>GIVEPose significantly outperforms existing state-of-the-art RGB-based approaches.
arXiv Detail & Related papers (2025-03-19T11:07:01Z)
Active 6D Pose Estimation for Textureless Objects using Multi-View RGB Frames [10.859307261818362]
Estimating the 6D pose of textureless objects from RBG images is an important problem in robotics.<n>We propose a comprehensive active perception framework for estimating the 6D poses of textureless objects using only RGB images.
arXiv Detail & Related papers (2025-03-05T18:28:32Z)
RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery [72.13154206106259]
We propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations. Specifically, we leverage a pre-trained monocular estimator to extract local geometric information. A separate branch is designed to directly recover the metric scale of the object based on category-level statistics.
arXiv Detail & Related papers (2023-09-19T02:20:26Z)
MV-ROPE: Multi-view Constraints for Robust Category-level Object Pose and Size Estimation [23.615122326731115]
We propose a novel solution that makes use of RGB video streams. Our framework consists of three modules: a scale-aware monocular dense SLAM solution, a lightweight object pose predictor, and an object-level pose graph. Our experimental results demonstrate that when utilizing public dataset sequences with high-quality depth information, the proposed method exhibits comparable performance to state-of-the-art RGB-D methods.
arXiv Detail & Related papers (2023-08-17T08:29:54Z)
Generative Category-Level Shape and Pose Estimation with Semantic Primitives [27.692997522812615]
We propose a novel framework for category-level object shape and pose estimation from a single RGB-D image. To handle the intra-category variation, we adopt a semantic primitive representation that encodes diverse shapes into a unified latent space. We show that the proposed method achieves SOTA pose estimation performance and better generalization in the real-world dataset.
arXiv Detail & Related papers (2022-10-03T17:51:54Z)
Category-Agnostic 6D Pose Estimation with Conditional Neural Processes [19.387280883044482]
We present a novel meta-learning approach for 6D pose estimation on unknown objects. Our algorithm learns object representation in a category-agnostic way, which endows it with strong generalization capabilities across object categories.
arXiv Detail & Related papers (2022-06-14T20:46:09Z)
GPV-Pose: Category-level Object Pose Estimation via Geometry-guided Point-wise Voting [103.74918834553249]
GPV-Pose is a novel framework for robust category-level pose estimation. It harnesses geometric insights to enhance the learning of category-level pose-sensitive features. It produces superior results to state-of-the-art competitors on common public benchmarks.
arXiv Detail & Related papers (2022-03-15T13:58:50Z)
DONet: Learning Category-Level 6D Object Pose and Size Estimation from Depth Observation [53.55300278592281]
We propose a method of Category-level 6D Object Pose and Size Estimation (COPSE) from a single depth image. Our framework makes inferences based on the rich geometric information of the object in the depth channel alone. Our framework competes with state-of-the-art approaches that require labeled real-world images.
arXiv Detail & Related papers (2021-06-27T10:41:50Z)
Spatial Attention Improves Iterative 6D Object Pose Estimation [52.365075652976735]
We propose a new method for 6D pose estimation refinement from RGB images. Our main insight is that after the initial pose estimate, it is important to pay attention to distinct spatial features of the object. We experimentally show that this approach learns to attend to salient spatial features and learns to ignore occluded parts of the object, leading to better pose estimation across datasets.
arXiv Detail & Related papers (2021-01-05T17:18:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.