Instance-Adaptive Keypoint Learning with Local-to-Global Geometric Aggregation for Category-Level Object Pose Estimation
- URL: http://arxiv.org/abs/2504.15134v2
- Date: Sun, 27 Apr 2025 02:01:27 GMT
- Title: Instance-Adaptive Keypoint Learning with Local-to-Global Geometric Aggregation for Category-Level Object Pose Estimation
- Authors: Xiao Zhang, Lu Zou, Tao Lu, Yuan Yao, Zhangjin Huang, Guoping Wang
- Abstract summary: INKL-Pose is a novel category-level object pose estimation framework. It enables INstance-adaptive Keypoint Learning with local-to-global geometric aggregation. Experiments on CAMERA25, REAL275, and HouseCat6D demonstrate that INKL-Pose achieves state-of-the-art performance.
- Score: 19.117822086210513
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Category-level object pose estimation aims to predict the 6D pose and size of previously unseen instances from predefined categories, requiring strong generalization across diverse object instances. Although many previous methods attempt to mitigate intra-class variations, they often struggle with instances exhibiting complex geometries or significant deviations from canonical shapes. To address this challenge, we propose INKL-Pose, a novel category-level object pose estimation framework that enables INstance-adaptive Keypoint Learning with local-to-global geometric aggregation. Specifically, our approach first predicts semantically consistent and geometrically informative keypoints through an Instance-Adaptive Keypoint Generator, then refines them with: (1) a Local Keypoint Feature Aggregator capturing fine-grained geometries, and (2) a Global Keypoint Feature Aggregator using bidirectional Mamba for structural consistency. To enable bidirectional modeling in Mamba, we introduce a Feature Sequence Flipping strategy that preserves spatial coherence while constructing backward feature sequences. Additionally, we design a surface loss and a separation loss to enforce uniform coverage and spatial diversity in keypoint distribution. The generated keypoints are finally mapped to a canonical space for regressing the object's 6D pose and size. Extensive experiments on CAMERA25, REAL275, and HouseCat6D demonstrate that INKL-Pose achieves state-of-the-art performance and significantly outperforms existing methods.
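The abstract names a surface loss and a separation loss but gives no formulas. As a rough illustration only, the sketch below shows one plausible reading in plain Python: a surface term that penalizes keypoints lying far from the observed points, and a separation term that penalizes keypoint pairs closer than a threshold. The function names, the nearest-neighbor distance, the hinge form, and the `min_dist` parameter are all assumptions made for this sketch and are not taken from INKL-Pose.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def surface_loss(keypoints: Sequence[Point], surface_points: Sequence[Point]) -> float:
    """Mean distance from each keypoint to its nearest observed surface point.

    Assumed form: a one-sided nearest-neighbor distance (hypothetical, for illustration).
    """
    nearest = [min(math.dist(k, p) for p in surface_points) for k in keypoints]
    return sum(nearest) / len(nearest)


def separation_loss(keypoints: Sequence[Point], min_dist: float) -> float:
    """Mean hinge penalty over keypoint pairs that are closer than min_dist.

    Assumed form: max(0, min_dist - distance), averaged over all pairs (hypothetical).
    """
    pairs = list(combinations(keypoints, 2))
    if not pairs:
        return 0.0
    return sum(max(0.0, min_dist - math.dist(a, b)) for a, b in pairs) / len(pairs)


# Toy data: four observed points on a unit square in the z = 0 plane.
surface = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
spread = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]     # on the surface, far apart
clustered = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]  # nearly collapsed onto one corner

print(surface_loss(spread, surface), separation_loss(spread, 0.5))
print(surface_loss(clustered, surface), separation_loss(clustered, 0.5))
```

In this toy case the well-spread keypoints incur zero penalty from both terms, while the clustered pair is penalized by the separation term, which is the behavior the abstract attributes to the two losses (surface coverage and spatial diversity).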
Related papers
- Universal Features Guided Zero-Shot Category-Level Object Pose Estimation [52.29006019352873]
We propose a zero-shot method to achieve category-level 6-DOF object pose estimation. Our method exploits both 2D and 3D universal features of the input RGB-D image to establish semantic similarity-based correspondences. Our method outperforms previous methods on the REAL275 and Wild6D benchmarks for unseen categories.
arXiv Detail & Related papers (2025-01-06T08:10:13Z)
- Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation [38.03793706479096]
Category-level 6D object pose estimation aims to estimate the rotation, translation and size of unseen instances within specific categories.
We propose a novel Instance-Adaptive and Geometric-Aware Keypoint Learning method for category-level 6D object pose estimation (AG-Pose).
The proposed AG-Pose outperforms state-of-the-art methods by a large margin without category-specific shape priors.
arXiv Detail & Related papers (2024-03-28T16:02:03Z)
- SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation [79.12683101131368]
Category-level object pose estimation, aiming to predict the 6D pose and 3D size of objects from known categories, typically struggles with large intra-class shape variation.
We present SecondPose, a novel approach integrating object-specific geometric features with semantic category priors from DINOv2.
arXiv Detail & Related papers (2023-11-18T17:14:07Z)
- SOCS: Semantically-aware Object Coordinate Space for Category-Level 6D Object Pose Estimation under Large Shape Variations [12.348551686086255]
Most learning-based approaches to category-level 6D pose estimation are designed around normalized object coordinate space (NOCS).
We propose Semantically-aware Object Coordinate Space (SOCS) built by warping-and-aligning the objects guided by a sparse set of keypoints with semantically meaningful correspondence.
arXiv Detail & Related papers (2023-03-18T06:34:16Z)
- Generative Category-Level Shape and Pose Estimation with Semantic Primitives [27.692997522812615]
We propose a novel framework for category-level object shape and pose estimation from a single RGB-D image.
To handle the intra-category variation, we adopt a semantic primitive representation that encodes diverse shapes into a unified latent space.
We show that the proposed method achieves SOTA pose estimation performance and better generalization on the real-world dataset.
arXiv Detail & Related papers (2022-10-03T17:51:54Z)
- Pose for Everything: Towards Category-Agnostic Pose Estimation [93.07415325374761]
Category-Agnostic Pose Estimation (CAPE) aims to create a pose estimation model capable of detecting the pose of any class of object given only a few samples with keypoint definition.
A transformer-based Keypoint Interaction Module (KIM) is proposed to capture both the interactions among different keypoints and the relationship between the support and query images.
We also introduce the Multi-category Pose (MP-100) dataset, a 2D pose dataset of 100 object categories containing over 20K instances, designed for developing CAPE algorithms.
arXiv Detail & Related papers (2022-07-21T09:40:54Z)
- On Hyperbolic Embeddings in 2D Object Detection [76.12912000278322]
We study whether a hyperbolic geometry better matches the underlying structure of the object classification space.
We incorporate a hyperbolic classifier in two-stage, keypoint-based, and transformer-based object detection architectures.
We observe categorical class hierarchies emerging in the structure of the classification space, resulting in lower classification errors and boosting the overall object detection performance.
arXiv Detail & Related papers (2022-03-15T16:43:40Z)
- GPV-Pose: Category-level Object Pose Estimation via Geometry-guided Point-wise Voting [103.74918834553249]
GPV-Pose is a novel framework for robust category-level pose estimation.
It harnesses geometric insights to enhance the learning of category-level pose-sensitive features.
It produces superior results to state-of-the-art competitors on common public benchmarks.
arXiv Detail & Related papers (2022-03-15T13:58:50Z)
- Point-Set Anchors for Object Detection, Instance Segmentation and Pose Estimation [85.96410825961966]
We argue that the image features extracted at a central point contain limited information for predicting distant keypoints or bounding box boundaries.
To facilitate inference, we propose to instead perform regression from a set of points placed at more advantageous positions.
We apply this proposed framework, called Point-Set Anchors, to object detection, instance segmentation, and human pose estimation.
arXiv Detail & Related papers (2020-07-06T15:59:56Z)
- Category-Level Articulated Object Pose Estimation [34.57672805595464]
We introduce Articulation-aware Normalized Coordinate Space Hierarchy (ANCSH).
ANCSH is a canonical representation for different articulated objects in a given category.
We develop a deep network based on PointNet++ that predicts ANCSH from a single depth point cloud.
arXiv Detail & Related papers (2019-12-26T18:34:37Z)