Related papers: Learning Shape-Independent Transformation via Spherical Representations for Category-Level Object Pose Estimation

URL: http://arxiv.org/abs/2503.13926v2
Date: Wed, 19 Mar 2025 11:29:13 GMT
Title: Learning Shape-Independent Transformation via Spherical Representations for Category-Level Object Pose Estimation
Authors: Huan Ren, Wenfei Yang, Xiang Liu, Shifeng Zhang, Tianzhu Zhang
Abstract summary: Category-level object pose estimation aims to determine the pose and size of novel objects in specific categories. Existing correspondence-based approaches typically adopt point-based representations to establish the correspondences between primitive observed points and normalized object coordinates. We introduce a novel architecture called SpherePose, which yields precise correspondence prediction through three core designs.
Score: 42.48001557547222
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Category-level object pose estimation aims to determine the pose and size of novel objects in specific categories. Existing correspondence-based approaches typically adopt point-based representations to establish the correspondences between primitive observed points and normalized object coordinates. However, due to the inherent shape-dependence of canonical coordinates, these methods suffer from semantic incoherence across diverse object shapes. To resolve this issue, we innovatively leverage the sphere as a shared proxy shape of objects to learn shape-independent transformation via spherical representations. Based on this insight, we introduce a novel architecture called SpherePose, which yields precise correspondence prediction through three core designs. Firstly, we endow the point-wise feature extraction with SO(3)-invariance, which facilitates robust mapping between camera coordinate space and object coordinate space regardless of rotation transformation. Secondly, the spherical attention mechanism is designed to propagate and integrate features among spherical anchors from a comprehensive perspective, thus mitigating the interference of noise and incomplete point clouds. Lastly, a hyperbolic correspondence loss function is designed to distinguish subtle distinctions, which can promote the precision of correspondence prediction. Experimental results on CAMERA25, REAL275 and HouseCat6D benchmarks demonstrate the superior performance of our method, verifying the effectiveness of spherical representations and architectural innovations.
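
As a rough illustration of how correspondence-based methods turn predicted correspondences into a pose and size, the sketch below fits a similarity transform between canonical (normalized) coordinates and observed camera-space points using the closed-form Umeyama alignment. The abstract does not say which solver SpherePose uses, so this is a generic example under that assumption; the function name `umeyama_alignment` and its argument names are hypothetical.

```python
import numpy as np


def umeyama_alignment(src, dst):
    """Least-squares similarity transform such that dst ~= s * R @ src + t.

    src: (N, 3) canonical coordinates (assumed to be predicted correspondences).
    dst: (N, 3) observed points in camera space.
    Returns scale s (float), rotation R (3x3), translation t (3,).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]

    # Centre both point sets.
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    src_c = src - mu_src
    dst_c = dst - mu_dst

    # Cross-covariance between observed and canonical points.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Guard against a reflection so that R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n      # variance of the canonical points
    s = float(np.trace(np.diag(D) @ S) / var_src)
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

Here `s` plays the role of the object scale, while `R` and `t` give the rotation and translation from object space to camera space. With noisy correspondences, a robust wrapper such as RANSAC is commonly placed around a solver of this kind; that step is omitted here.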

Related papers

Structure-Aware Correspondence Learning for Relative Pose Estimation [65.44234975976451]
Relative pose estimation provides a promising way for achieving object-agnostic pose estimation. Existing 3D correspondence-based methods suffer from small overlaps in visible regions and unreliable feature estimation for invisible regions. We propose a novel Structure-Aware Correspondence Learning method for Relative Pose Estimation, which consists of two key modules.
arXiv Detail & Related papers (2025-03-24T13:43:44Z)
Interior Object Geometry via Fitted Frames [18.564031163436553]
We describe a representation targeted for anatomic objects which is designed to enable strong locational correspondence within object populations. The method generates fitted frames on the boundary and in the interior of objects and produces alignment-free geometric features from them.
arXiv Detail & Related papers (2024-07-19T14:38:47Z)
SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation [79.12683101131368]
Category-level object pose estimation, aiming to predict the 6D pose and 3D size of objects from known categories, typically struggles with large intra-class shape variation. We present SecondPose, a novel approach integrating object-specific geometric features with semantic category priors from DINOv2.
arXiv Detail & Related papers (2023-11-18T17:14:07Z)
Loop Closure Detection Based on Object-level Spatial Layout and Semantic Consistency [14.694754836704819]
We present an object-based loop closure detection method based on the spatial layout and semantic consistency of the 3D scene graph. Experimental results demonstrate that our proposed data association approach can construct more accurate 3D semantic maps.
arXiv Detail & Related papers (2023-04-11T11:20:51Z)
Self-Supervised Category-Level Articulated Object Pose Estimation with Part-Level SE(3) Equivariance [33.10167928198986]
Category-level articulated object pose estimation aims to estimate a hierarchy of articulation-aware object poses of an unseen articulated object from a known category. We present a novel self-supervised strategy that solves this problem without any human labels.
arXiv Detail & Related papers (2023-02-28T03:02:11Z)
Generative Category-Level Shape and Pose Estimation with Semantic Primitives [27.692997522812615]
We propose a novel framework for category-level object shape and pose estimation from a single RGB-D image. To handle the intra-category variation, we adopt a semantic primitive representation that encodes diverse shapes into a unified latent space. We show that the proposed method achieves SOTA pose estimation performance and better generalization in the real-world dataset.
arXiv Detail & Related papers (2022-10-03T17:51:54Z)
RBP-Pose: Residual Bounding Box Projection for Category-Level Pose Estimation [103.74918834553247]
Category-level object pose estimation aims to predict the 6D pose as well as the 3D metric size of arbitrary objects from a known set of categories. Recent methods harness shape prior adaptation to map the observed point cloud into the canonical space and apply the Umeyama algorithm to recover the pose and size. We propose a novel geometry-guided Residual Object Bounding Box Projection network, RBP-Pose, that jointly predicts object pose and residual vectors.
arXiv Detail & Related papers (2022-07-30T14:45:20Z)
NeuroMorph: Unsupervised Shape Interpolation and Correspondence in One Go [109.88509362837475]
We present NeuroMorph, a new neural network architecture that takes as input two 3D shapes. NeuroMorph produces smooth and point-to-point correspondences between them. It works well for a large variety of input shapes, including non-isometric pairs from different object categories.
arXiv Detail & Related papers (2021-06-17T12:25:44Z)
3D Object Classification on Partial Point Clouds: A Practical Perspective [91.81377258830703]
A point cloud is a popular shape representation adopted in 3D object classification. This paper introduces a practical setting to classify partial point clouds of object instances under any poses. A novel algorithm in an alignment-classification manner is proposed in this paper.
arXiv Detail & Related papers (2020-12-18T04:00:56Z)
Cylindrical Convolutional Networks for Joint Object Detection and Viewpoint Estimation [76.21696417873311]
We introduce a learnable module, cylindrical convolutional networks (CCNs), that exploits a cylindrical representation of a convolutional kernel defined in the 3D space. CCNs extract a view-specific feature through a view-specific convolutional kernel to predict object category scores at each viewpoint. Our experiments demonstrate the effectiveness of the cylindrical convolutional networks on joint object detection and viewpoint estimation.
arXiv Detail & Related papers (2020-03-25T10:24:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.