Generative Category-Level Shape and Pose Estimation with Semantic
Primitives
- URL: http://arxiv.org/abs/2210.01112v1
- Date: Mon, 3 Oct 2022 17:51:54 GMT
- Title: Generative Category-Level Shape and Pose Estimation with Semantic
Primitives
- Authors: Guanglin Li, Yifeng Li, Zhichao Ye, Qihang Zhang, Tao Kong, Zhaopeng
Cui, Guofeng Zhang
- Abstract summary: We propose a novel framework for category-level object shape and pose estimation from a single RGB-D image.
To handle the intra-category variation, we adopt a semantic primitive representation that encodes diverse shapes into a unified latent space.
We show that the proposed method achieves SOTA pose estimation performance and better generalization on the real-world dataset.
- Score: 27.692997522812615
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Empowering autonomous agents with 3D understanding for daily objects is a
grand challenge in robotics applications. When exploring in an unknown
environment, existing methods for object pose estimation are still not
satisfactory due to the diversity of object shapes. In this paper, we propose a
novel framework for category-level object shape and pose estimation from a
single RGB-D image. To handle the intra-category variation, we adopt a semantic
primitive representation that encodes diverse shapes into a unified latent
space, which is key to establishing reliable correspondences between observed
point clouds and estimated shapes. Then, by using a SIM(3)-invariant shape
descriptor, we gracefully decouple the shape and pose of an object, thus
supporting latent shape optimization of target objects in arbitrary poses.
Extensive experiments show that the proposed method achieves SOTA pose
estimation performance and better generalization on the real-world dataset.
Code and video are available at https://zju3dv.github.io/gCasp
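
As a rough illustration of what SIM(3) invariance means in the abstract above, the sketch below builds a toy descriptor (sorted pairwise distances divided by the largest distance) and checks that it is unchanged when a point set is scaled, rotated, and translated. This is not the learned descriptor used in the paper; the function names `sim3_transform` and `toy_descriptor`, the sample points, and the restriction to a rotation about the z-axis are all assumptions made for the example.

```python
import math
from itertools import combinations


def sim3_transform(points, scale, yaw, translation):
    """Apply x -> scale * R_z(yaw) * x + t to each 3D point.

    Only a rotation about the z-axis is used here to keep the sketch short;
    a general SIM(3) element allows any 3D rotation.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [
        (scale * (c * x - s * y) + tx,
         scale * (s * x + c * y) + ty,
         scale * z + tz)
        for x, y, z in points
    ]


def toy_descriptor(points):
    """Hypothetical descriptor: sorted pairwise distances over the largest one.

    Rotation and translation preserve distances, and uniform scaling multiplies
    all of them by the same factor, so the ratios are unchanged.
    """
    dists = sorted(math.dist(p, q) for p, q in combinations(points, 2))
    largest = dists[-1]
    return [d / largest for d in dists]


# Four made-up points standing in for an object shape.
shape = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
moved = sim3_transform(shape, scale=2.5, yaw=0.7, translation=(4.0, -1.0, 0.5))

before = toy_descriptor(shape)
after = toy_descriptor(moved)

# Largest element-wise difference between the two descriptors.
max_gap = max(abs(a - b) for a, b in zip(before, after))
print(max_gap < 1e-9)
```

Because the descriptor does not change with the object's scale, orientation, or position, a shape code derived from it could in principle be optimized without first knowing the pose, which is the kind of decoupling the abstract describes.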
Related papers
- LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation [43.549593231397644]
LaPose is a novel framework that models the object shape as the Laplacian mixture model for Pose estimation.
By representing each point as a probabilistic distribution, we explicitly quantify the shape uncertainty.
LaPose yields state-of-the-art performance in category-level object pose estimation.
arXiv Detail & Related papers (2024-09-24T04:20:18Z)
- OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation [56.028185293563325]
This paper studies a new open-set problem, the open-vocabulary category-level object pose and size estimation.
We first introduce OO3D-9D, a large-scale photorealistic dataset for this task.
We then propose a framework built on pre-trained DinoV2 and text-to-image stable diffusion models.
arXiv Detail & Related papers (2024-03-19T03:09:24Z)
- FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z)
- ShapeShift: Superquadric-based Object Pose Estimation for Robotic Grasping [85.38689479346276]
Current techniques heavily rely on a reference 3D object, limiting their generalizability and making it expensive to expand to new object categories.
This paper proposes ShapeShift, a superquadric-based framework for object pose estimation that predicts the object's pose relative to a primitive shape which is fitted to the object.
arXiv Detail & Related papers (2023-04-10T20:55:41Z)
- MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
First, we present a 6D pose refiner based on a render&compare strategy which can be applied to novel objects.
Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
arXiv Detail & Related papers (2022-12-13T19:30:03Z)
- ShAPO: Implicit Representations for Multi-Object Shape, Appearance, and Pose Optimization [40.36229450208817]
We present ShAPO, a method for joint multi-object detection, 3D textured reconstruction, 6D object pose and size estimation.
Key to ShAPO is a single-shot pipeline to regress shape, appearance and pose latent codes along with the masks of each object instance.
Our method significantly outperforms all baselines on the NOCS dataset with an 8% absolute improvement in mAP for 6D pose estimation.
arXiv Detail & Related papers (2022-07-27T17:59:31Z)
- ELLIPSDF: Joint Object Pose and Shape Optimization with a Bi-level Ellipsoid and Signed Distance Function Description [9.734266860544663]
This paper proposes an expressive yet compact model for joint object pose and shape optimization.
It infers an object-level map from multi-view RGB-D camera observations.
Our approach is evaluated on the large-scale real-world ScanNet dataset and compared against state-of-the-art methods.
arXiv Detail & Related papers (2021-08-01T03:07:31Z)
- Disentangled Implicit Shape and Pose Learning for Scalable 6D Pose Estimation [44.8872454995923]
We present a novel approach for scalable 6D pose estimation, by self-supervised learning on synthetic data of multiple objects using a single autoencoder.
We test our method on two multi-object benchmarks with real data, T-LESS and NOCS REAL275, and show it outperforms existing RGB-based methods in terms of pose estimation accuracy and generalization.
arXiv Detail & Related papers (2021-07-27T01:55:30Z)
- From Points to Multi-Object 3D Reconstruction [71.17445805257196]
We propose a method to detect and reconstruct multiple 3D objects from a single RGB image.
A keypoint detector localizes objects as center points and directly predicts all object properties, including 9-DoF bounding boxes and 3D shapes.
The presented approach performs lightweight reconstruction in a single stage; it is real-time capable, fully differentiable, and end-to-end trainable.
arXiv Detail & Related papers (2020-12-21T18:52:21Z)
- Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation [62.618227434286]
We present a novel learning approach to recover the 6D poses and sizes of unseen object instances from an RGB-D image.
We propose a deep network to reconstruct the 3D object model by explicitly modeling the deformation from a pre-learned categorical shape prior.
arXiv Detail & Related papers (2020-07-16T16:45:05Z)