GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation
- URL: http://arxiv.org/abs/2502.04293v1
- Date: Thu, 06 Feb 2025 18:35:13 GMT
- Title: GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation
- Authors: Weihang Li, Hongli Xu, Junwen Huang, Hyunjun Jung, Peter KT Yu, Nassir Navab, Benjamin Busam,
- Abstract summary: A key challenge in model-free category-level pose estimation is the extraction of contextual object features that generalize across varying instances within a specific category.<n>We present GCE-Pose, a method that enhances pose estimation for novel instances by integrating category-level global context prior.
- Score: 52.910282443646864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key challenge in model-free category-level pose estimation is the extraction of contextual object features that generalize across varying instances within a specific category. Recent approaches leverage foundational features to capture semantic and geometry cues from data. However, these approaches fail under partial visibility. We overcome this with a first-complete-then-aggregate strategy for feature extraction utilizing class priors. In this paper, we present GCE-Pose, a method that enhances pose estimation for novel instances by integrating category-level global context prior. GCE-Pose performs semantic shape reconstruction with a proposed Semantic Shape Reconstruction (SSR) module. Given an unseen partial RGB-D object instance, our SSR module reconstructs the instance's global geometry and semantics by deforming category-specific 3D semantic prototypes through a learned deep Linear Shape Model. We further introduce a Global Context Enhanced (GCE) feature fusion module that effectively fuses features from partial RGB-D observations and the reconstructed global context. Extensive experiments validate the impact of our global context prior and the effectiveness of the GCE fusion module, demonstrating that GCE-Pose significantly outperforms existing methods on challenging real-world datasets HouseCat6D and NOCS-REAL275. Our project page is available at https://colin-de.github.io/GCE-Pose/.
Related papers
- Instance-Adaptive Keypoint Learning with Local-to-Global Geometric Aggregation for Category-Level Object Pose Estimation [19.117822086210513]
INKL-Pose is a novel category-level object pose estimation framework.
It enables INstance-adaptive Keypoint Learning with local-to-global geometric aggregation.
Experiments on CAMERA25, REAL275, and HouseCat6D demonstrate that INKL-Pose achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-04-21T14:37:37Z) - Detection Based Part-level Articulated Object Reconstruction from Single RGBD Image [52.11275397911693]
We propose an end-to-end trainable, cross-category method for reconstructing multiple man-made articulated objects from a single RGBD image.
We depart from previous works that rely on learning instance-level latent space, focusing on man-made articulated objects with predefined part counts.
Our method successfully reconstructs variously structured multiple instances that previous works cannot handle, and outperforms prior works in shape reconstruction and kinematics estimation.
arXiv Detail & Related papers (2025-04-04T05:08:04Z) - GIVEPose: Gradual Intra-class Variation Elimination for RGB-based Category-Level Object Pose Estimation [61.46277064819665]
We introduce GIVEPose, a framework that implements Gradual Intra-class Variation Elimination for category-level object pose estimation.
GIVEPose significantly outperforms existing state-of-the-art RGB-based approaches.
arXiv Detail & Related papers (2025-03-19T11:07:01Z) - FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z) - DTF-Net: Category-Level Pose Estimation and Shape Reconstruction via
Deformable Template Field [29.42222066097076]
Estimating 6D poses and reconstructing 3D shapes of objects in open-world scenes from RGB-depth image pairs is challenging.
We propose the DTF-Net, a novel framework for pose estimation and shape reconstruction based on implicit neural fields of object categories.
arXiv Detail & Related papers (2023-08-04T10:35:40Z) - Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z) - Generative Category-Level Shape and Pose Estimation with Semantic
Primitives [27.692997522812615]
We propose a novel framework for category-level object shape and pose estimation from a single RGB-D image.
To handle the intra-category variation, we adopt a semantic primitive representation that encodes diverse shapes into a unified latent space.
We show that the proposed method achieves SOTA pose estimation performance and better generalization in the real-world dataset.
arXiv Detail & Related papers (2022-10-03T17:51:54Z) - DONet: Learning Category-Level 6D Object Pose and Size Estimation from
Depth Observation [53.55300278592281]
We propose a method of Category-level 6D Object Pose and Size Estimation (COPSE) from a single depth image.
Our framework makes inferences based on the rich geometric information of the object in the depth channel alone.
Our framework competes with state-of-the-art approaches that require labeled real-world images.
arXiv Detail & Related papers (2021-06-27T10:41:50Z) - Monocular Human Pose and Shape Reconstruction using Part Differentiable
Rendering [53.16864661460889]
Recent works succeed in regression-based methods which estimate parametric models directly through a deep neural network supervised by 3D ground truth.
In this paper, we introduce body segmentation as critical supervision.
To improve the reconstruction with part segmentation, we propose a part-level differentiable part that enables part-based models to be supervised by part segmentation.
arXiv Detail & Related papers (2020-03-24T14:25:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.