Related papers: Instance-Adaptive Keypoint Learning with Local-to-Global Geometric Aggregation for Category-Level Object Pose Estimation

URL: http://arxiv.org/abs/2504.15134v2
Date: Sun, 27 Apr 2025 02:01:27 GMT
Title: Instance-Adaptive Keypoint Learning with Local-to-Global Geometric Aggregation for Category-Level Object Pose Estimation
Authors: Xiao Zhang, Lu Zou, Tao Lu, Yuan Yao, Zhangjin Huang, Guoping Wang
Abstract summary: INKL-Pose is a novel category-level object pose estimation framework. It enables INstance-adaptive Keypoint Learning with local-to-global geometric aggregation. Experiments on CAMERA25, REAL275, and HouseCat6D demonstrate that INKL-Pose achieves state-of-the-art performance.
Score: 19.117822086210513
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Category-level object pose estimation aims to predict the 6D pose and size of previously unseen instances from predefined categories, requiring strong generalization across diverse object instances. Although many previous methods attempt to mitigate intra-class variations, they often struggle with instances exhibiting complex geometries or significant deviations from canonical shapes. To address this challenge, we propose INKL-Pose, a novel category-level object pose estimation framework that enables INstance-adaptive Keypoint Learning with local-to-global geometric aggregation. Specifically, our approach first predicts semantically consistent and geometrically informative keypoints through an Instance-Adaptive Keypoint Generator, then refines them with: (1) a Local Keypoint Feature Aggregator capturing fine-grained geometries, and (2) a Global Keypoint Feature Aggregator using bidirectional Mamba for structural consistency. To enable bidirectional modeling in Mamba, we introduce a Feature Sequence Flipping strategy that preserves spatial coherence while constructing backward feature sequences. Additionally, we design a surface loss and a separation loss to enforce uniform coverage and spatial diversity in keypoint distribution. The generated keypoints are finally mapped to a canonical space for regressing the object's 6D pose and size. Extensive experiments on CAMERA25, REAL275, and HouseCat6D demonstrate that INKL-Pose achieves state-of-the-art performance and significantly outperforms existing methods.
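Illustrative sketch (not from the paper): the abstract names a surface loss and a separation loss but gives no formulas. The Python below shows one plausible reading, assuming the surface term is the mean distance from each keypoint to its nearest observed point and the separation term is a hinge penalty on keypoint pairs closer than a chosen gap. The function names, the `min_gap` parameter, and both formulas are assumptions for illustration and may differ from the definitions used in INKL-Pose.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def surface_loss(keypoints: List[Point], cloud: List[Point]) -> float:
    """Assumed form: mean distance from each keypoint to its nearest cloud point."""
    nearest = [min(math.dist(k, p) for p in cloud) for k in keypoints]
    return sum(nearest) / len(keypoints)


def separation_loss(keypoints: List[Point], min_gap: float) -> float:
    """Assumed form: mean hinge penalty over keypoint pairs closer than min_gap."""
    pairs = [(a, b) for i, a in enumerate(keypoints) for b in keypoints[i + 1:]]
    if not pairs:
        return 0.0  # a single keypoint has nothing to be separated from
    penalties = [max(0.0, min_gap - math.dist(a, b)) for a, b in pairs]
    return sum(penalties) / len(pairs)


if __name__ == "__main__":
    # Toy observed point cloud and two candidate keypoint sets.
    cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    spread = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]      # on the surface, well separated
    collapsed = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]   # on the surface, but coincident
    print(surface_loss(spread, cloud), separation_loss(spread, 0.5))
    print(surface_loss(collapsed, cloud), separation_loss(collapsed, 0.5))
```

Under these assumptions, keypoints that lie on the observed points incur no surface penalty, while coincident keypoints are penalized by the separation term. This is the kind of trade-off the abstract describes as uniform coverage and spatial diversity.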

Related papers

Universal Features Guided Zero-Shot Category-Level Object Pose Estimation [52.29006019352873]
We propose a zero-shot method to achieve category-level 6-DOF object pose estimation. Our method exploits both 2D and 3D universal features of the input RGB-D image to establish semantic similarity-based correspondences. Our method outperforms previous methods on the REAL275 and Wild6D benchmarks for unseen categories.
arXiv Detail & Related papers (2025-01-06T08:10:13Z)
Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation [38.03793706479096]
Category-level 6D object pose estimation aims to estimate the rotation, translation and size of unseen instances within specific categories. We propose a novel Instance-Adaptive and Geometric-Aware Keypoint Learning method for category-level 6D object pose estimation (AG-Pose). The proposed AG-Pose outperforms state-of-the-art methods by a large margin without category-specific shape priors.
arXiv Detail & Related papers (2024-03-28T16:02:03Z)
SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation [79.12683101131368]
Category-level object pose estimation, aiming to predict the 6D pose and 3D size of objects from known categories, typically struggles with large intra-class shape variation. We present SecondPose, a novel approach integrating object-specific geometric features with semantic category priors from DINOv2.
arXiv Detail & Related papers (2023-11-18T17:14:07Z)
SOCS: Semantically-aware Object Coordinate Space for Category-Level 6D Object Pose Estimation under Large Shape Variations [12.348551686086255]
Most learning-based approaches to category-level 6D pose estimation are designed around normalized object coordinate space (NOCS). We propose Semantically-aware Object Coordinate Space (SOCS) built by warping-and-aligning the objects guided by a sparse set of keypoints with semantically meaningful correspondence.
arXiv Detail & Related papers (2023-03-18T06:34:16Z)
Generative Category-Level Shape and Pose Estimation with Semantic Primitives [27.692997522812615]
We propose a novel framework for category-level object shape and pose estimation from a single RGB-D image. To handle the intra-category variation, we adopt a semantic primitive representation that encodes diverse shapes into a unified latent space. We show that the proposed method achieves SOTA pose estimation performance and better generalization in the real-world dataset.
arXiv Detail & Related papers (2022-10-03T17:51:54Z)
Pose for Everything: Towards Category-Agnostic Pose Estimation [93.07415325374761]
Category-Agnostic Pose Estimation (CAPE) aims to create a pose estimation model capable of detecting the pose of any class of object given only a few samples with keypoint definition. A transformer-based Keypoint Interaction Module (KIM) is proposed to capture both the interactions among different keypoints and the relationship between the support and query images. We also introduce the Multi-category Pose (MP-100) dataset, which is a 2D pose dataset of 100 object categories containing over 20K instances and is well-designed for developing CAPE algorithms.
arXiv Detail & Related papers (2022-07-21T09:40:54Z)
On Hyperbolic Embeddings in 2D Object Detection [76.12912000278322]
We study whether a hyperbolic geometry better matches the underlying structure of the object classification space. We incorporate a hyperbolic classifier in two-stage, keypoint-based, and transformer-based object detection architectures. We observe categorical class hierarchies emerging in the structure of the classification space, resulting in lower classification errors and boosting the overall object detection performance.
arXiv Detail & Related papers (2022-03-15T16:43:40Z)
GPV-Pose: Category-level Object Pose Estimation via Geometry-guided Point-wise Voting [103.74918834553249]
GPV-Pose is a novel framework for robust category-level pose estimation. It harnesses geometric insights to enhance the learning of category-level pose-sensitive features. It produces superior results to state-of-the-art competitors on common public benchmarks.
arXiv Detail & Related papers (2022-03-15T13:58:50Z)
Single-stage Keypoint-based Category-level Object Pose Estimation from an RGB Image [27.234658117816103]
We propose a single-stage, keypoint-based approach for category-level object pose estimation. The proposed network performs 2D object detection, detects 2D keypoints, estimates 6-DoF pose, and regresses relative bounding cuboid dimensions. We conduct extensive experiments on the challenging Objectron benchmark, outperforming state-of-the-art methods on the 3D IoU metric.
arXiv Detail & Related papers (2021-09-13T17:55:00Z)
Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression [81.05772887221333]
We study the dense keypoint regression framework that was previously inferior to the keypoint detection and grouping framework. We present a simple yet effective approach, named disentangled keypoint regression (DEKR). We empirically show that the proposed direct regression method outperforms keypoint detection and grouping methods.
arXiv Detail & Related papers (2021-04-06T05:54:46Z)
Point-Set Anchors for Object Detection, Instance Segmentation and Pose Estimation [85.96410825961966]
We argue that the image features extracted at a central point contain limited information for predicting distant keypoints or bounding box boundaries. To facilitate inference, we propose to instead perform regression from a set of points placed at more advantageous positions. We apply this proposed framework, called Point-Set Anchors, to object detection, instance segmentation, and human pose estimation.
arXiv Detail & Related papers (2020-07-06T15:59:56Z)
Category-Level Articulated Object Pose Estimation [34.57672805595464]
We introduce Articulation-aware Normalized Coordinate Space Hierarchy (ANCSH). ANCSH is a canonical representation for different articulated objects in a given category. We develop a deep network based on PointNet++ that predicts ANCSH from a single depth point cloud.
arXiv Detail & Related papers (2019-12-26T18:34:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.