Leveraging SE(3) Equivariance for Self-Supervised Category-Level Object
Pose Estimation
- URL: http://arxiv.org/abs/2111.00190v1
- Date: Sat, 30 Oct 2021 06:46:44 GMT
- Title: Leveraging SE(3) Equivariance for Self-Supervised Category-Level Object
Pose Estimation
- Authors: Xiaolong Li, Yijia Weng, Li Yi, Leonidas Guibas, A. Lynn Abbott,
Shuran Song, He Wang
- Abstract summary: Category-level object pose estimation aims to find 6D object poses of previously unseen object instances from known categories without access to object CAD models.
We propose for the first time a self-supervised learning framework to estimate category-level 6D object pose from single 3D point clouds.
- Score: 30.04752448942084
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Category-level object pose estimation aims to find 6D object poses of
previously unseen object instances from known categories without access to
object CAD models. To reduce the huge amount of pose annotations needed for
category-level learning, we propose for the first time a self-supervised
learning framework to estimate category-level 6D object pose from single 3D
point clouds.During training, our method assumes no ground-truth pose
annotations, no CAD models, and no multi-view supervision. The key to our
method is to disentangle shape and pose through an invariant shape
reconstruction module and an equivariant pose estimation module, empowered by
SE(3) equivariant point cloud networks. The invariant shape reconstruction
module learns to perform aligned reconstructions, yielding a category-level
reference frame without using any annotations. In addition, the equivariant
pose estimation module achieves category-level pose estimation accuracy that is
comparable to some fully supervised methods. Extensive experiments demonstrate
the effectiveness of our approach on both complete and partial depth point
clouds from the ModelNet40 benchmark, and on real depth point clouds from the
NOCS-REAL275 dataset. The project page with code and visualizations can be
found at: https://dragonlong.github.io/equi-pose.
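
The method's core mechanism, splitting a point cloud into an SE(3)-invariant shape code and an SE(3)-equivariant pose estimate, rests on two algebraic properties: an equivariant pose feature satisfies f(RX + t) = R f(X) + t, while an invariant shape feature satisfies g(RX + t) = g(X). The numpy toy below verifies both properties using hand-crafted stand-ins (a centroid/farthest-point frame and covariance eigenvalues); these are illustrative substitutes for the paper's learned SE(3)-equivariant network, not its actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_se3(rng):
    """Random rotation (QR of a Gaussian matrix, det fixed to +1) plus translation."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1
    return Q, rng.normal(size=3)

def equivariant_pose_feature(X):
    """Toy SE(3)-equivariant 'pose' feature: centroid plus the unit direction to
    the farthest point. A learned equivariant network generalizes this idea."""
    c = X.mean(axis=0)
    d = X[np.argmax(np.linalg.norm(X - c, axis=1))] - c
    return c, d / np.linalg.norm(d)

def invariant_shape_feature(X):
    """Toy SE(3)-invariant 'shape' feature: eigenvalues of the covariance of the
    centered cloud, which no rigid transform can change."""
    Xc = X - X.mean(axis=0)
    return np.linalg.eigvalsh(Xc.T @ Xc / len(X))

X = rng.normal(size=(128, 3))        # a toy point cloud
R, t = random_se3(rng)
Y = X @ R.T + t                      # the same cloud under a rigid transform

c_x, d_x = equivariant_pose_feature(X)
c_y, d_y = equivariant_pose_feature(Y)
assert np.allclose(c_y, R @ c_x + t)  # equivariance: feature moves with the pose
assert np.allclose(d_y, R @ d_x)
assert np.allclose(invariant_shape_feature(X), invariant_shape_feature(Y))  # invariance
print("equivariant pose feature and invariant shape feature verified")
```

Any network dropped into the pose branch must pass the first two checks, and any shape encoder must pass the third; this is what allows the reconstruction module to produce aligned, pose-free shapes without annotations.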
Related papers
- Diffusion-Driven Self-Supervised Learning for Shape Reconstruction and Pose Estimation [26.982199143972835]
We introduce a diffusion-driven self-supervised network for multi-object shape reconstruction and categorical pose estimation.
Our method significantly outperforms state-of-the-art self-supervised category-level baselines and even surpasses some fully-supervised instance-level and category-level methods.
arXiv Detail & Related papers (2024-03-19T13:43:27Z)
- FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z)
- OnePose++: Keypoint-Free One-Shot Object Pose Estimation without CAD Models [51.68715543630427]
OnePose relies on detecting repeatable image keypoints and is thus prone to failure on low-textured objects.
We propose a keypoint-free pose estimation pipeline to remove the need for repeatable keypoint detection.
A 2D-3D matching network directly establishes 2D-3D correspondences between the query image and the reconstructed point-cloud model; a minimal correspondence-to-pose sketch follows this list.
arXiv Detail & Related papers (2023-01-18T17:47:13Z)
- Generative Category-Level Shape and Pose Estimation with Semantic Primitives [27.692997522812615]
We propose a novel framework for category-level object shape and pose estimation from a single RGB-D image.
To handle the intra-category variation, we adopt a semantic primitive representation that encodes diverse shapes into a unified latent space.
We show that the proposed method achieves SOTA pose estimation performance and better generalization in the real-world dataset.
arXiv Detail & Related papers (2022-10-03T17:51:54Z)
- OnePose: One-Shot Object Pose Estimation without CAD Models [30.307122037051126]
OnePose does not rely on CAD models and can handle objects in arbitrary categories without instance- or category-specific network training.
OnePose draws the idea from visual localization and only requires a simple RGB video scan of the object to build a sparse SfM model of the object.
To mitigate the slow runtime of existing visual localization methods, we propose a new graph attention network that directly matches 2D interest points in the query image with the 3D points in the SfM model; a generic descriptor-matching sketch appears after this list.
arXiv Detail & Related papers (2022-05-24T17:59:21Z)
- DONet: Learning Category-Level 6D Object Pose and Size Estimation from Depth Observation [53.55300278592281]
We propose a method of Category-level 6D Object Pose and Size Estimation (COPSE) from a single depth image.
Our framework makes inferences based on the rich geometric information of the object in the depth channel alone.
Our framework competes with state-of-the-art approaches that require labeled real-world images.
arXiv Detail & Related papers (2021-06-27T10:41:50Z)
- CAPTRA: CAtegory-level Pose Tracking for Rigid and Articulated Objects from Point Clouds [97.63549045541296]
We propose a unified framework that can handle 9DoF pose tracking for novel rigid object instances and per-part pose tracking for articulated objects.
Our method achieves new state-of-the-art performance on category-level rigid object pose (NOCS-REAL275) and articulated object pose benchmarks (SAPIEN, BMVC) while running at 12 FPS, the fastest among the compared methods.
arXiv Detail & Related papers (2021-04-08T00:14:58Z)
- 3D Object Classification on Partial Point Clouds: A Practical Perspective [91.81377258830703]
A point cloud is a popular shape representation adopted in 3D object classification.
This paper introduces a practical setting: classifying partial point clouds of object instances under arbitrary poses.
A novel algorithm in an alignment-classification manner is proposed in this paper.
arXiv Detail & Related papers (2020-12-18T04:00:56Z)
- Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation [62.618227434286]
We present a novel learning approach to recover the 6D poses and sizes of unseen object instances from an RGB-D image.
We propose a deep network to reconstruct the 3D object model by explicitly modeling the deformation from a pre-learned categorical shape prior; a toy prior-deformation sketch appears after this list.
arXiv Detail & Related papers (2020-07-16T16:45:05Z)
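
For the OnePose++ entry above: once a matching network has produced 2D-3D correspondences, the standard way to turn them into a 6D pose is PnP inside a RANSAC loop. Below is a minimal OpenCV sketch on synthetic correspondences; the intrinsics, noise level, and thresholds are assumptions for illustration, and the paper's actual solver settings may differ:

```python
import numpy as np
import cv2

rng = np.random.default_rng(0)

# Synthetic stand-ins for a matching network's output: N model-frame 3D points
# and their corresponding (noisy) pixel locations under a known pose.
pts_3d = rng.uniform(-0.1, 0.1, size=(60, 3))
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])               # assumed pinhole intrinsics

R_gt, _ = cv2.Rodrigues(np.array([0.2, -0.1, 0.3]))  # ground-truth pose to recover
t_gt = np.array([0.02, -0.01, 0.5])
cam = pts_3d @ R_gt.T + t_gt                         # model frame -> camera frame
proj = cam @ K.T
pts_2d = proj[:, :2] / proj[:, 2:] + rng.normal(scale=0.5, size=(60, 2))

# Robust pose from correspondences: EPnP inside RANSAC, as is standard.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    pts_3d, pts_2d, K, distCoeffs=None,
    reprojectionError=3.0, flags=cv2.SOLVEPNP_EPNP)
R_est, _ = cv2.Rodrigues(rvec)
cos = np.clip((np.trace(R_est.T @ R_gt) - 1.0) / 2.0, -1.0, 1.0)
print(f"solved={ok}, inliers={len(inliers)}, "
      f"rotation error={np.degrees(np.arccos(cos)):.2f} deg")
```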
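For the OnePose entry: its graph attention network is a learned matcher, but the final scoring step can be illustrated with a generic dual-softmax over descriptor similarities, where a 2D-3D pair is kept only if it wins in both matching directions. All descriptors, the temperature, and the threshold below are made-up stand-ins, not OnePose's trained features:

```python
import numpy as np

rng = np.random.default_rng(1)

def unit(v):
    return v / np.linalg.norm(v, axis=1, keepdims=True)

# Hypothetical descriptors: 200 2D interest points and 500 SfM 3D points, 128-D.
# The first 200 3D descriptors are noisy copies of the 2D ones, so each 2D point
# i has a true match at 3D index i.
desc_2d = unit(rng.normal(size=(200, 128)))
desc_3d = rng.normal(size=(500, 128))
desc_3d[:200] = desc_2d + 0.05 * rng.normal(size=(200, 128))
desc_3d = unit(desc_3d)

def softmax(s, axis):
    e = np.exp(s - s.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Cosine similarity with a temperature, then dual-softmax: a pair scores high
# only if it wins the row-wise AND the column-wise competition.
sim = desc_2d @ desc_3d.T
scores = softmax(sim / 0.1, axis=1) * softmax(sim / 0.1, axis=0)

best = scores.argmax(axis=1)                      # best 3D point per 2D point
keep = scores[np.arange(len(best)), best] > 0.2   # tunable confidence threshold
correct = best[keep] == np.nonzero(keep)[0]
print(f"{keep.sum()} confident 2D-3D matches, {correct.mean():.0%} correct")
```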
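For the Shape Prior Deformation entry: reconstructing an instance by deforming a categorical prior can be sketched as a per-point offset field applied to the prior, with Chamfer distance measuring the fit. The prior, the "predicted" offsets, and the point counts here are hypothetical; the actual network regresses offsets from RGB-D features:

```python
import numpy as np

rng = np.random.default_rng(2)

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a (N,3) and b (M,3)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

# Hypothetical pre-learned categorical prior (a mean shape) and an observed
# instance: the prior anisotropically scaled, plus noise.
prior = rng.normal(size=(256, 3))
observed = prior * np.array([1.2, 0.9, 1.0]) + 0.02 * rng.normal(size=(256, 3))

# Stand-in for the deformation head: offsets nudging the prior toward the
# observation (a real model predicts these from learned features).
reconstruction = prior + 0.5 * (observed - prior)

print(f"chamfer(prior, observed)          = {chamfer(prior, observed):.4f}")
print(f"chamfer(reconstruction, observed) = {chamfer(reconstruction, observed):.4f}")
```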