GenPose: Generative Category-level Object Pose Estimation via Diffusion Models
- URL: http://arxiv.org/abs/2306.10531v3
- Date: Mon, 25 Dec 2023 08:03:49 GMT
- Title: GenPose: Generative Category-level Object Pose Estimation via Diffusion Models
- Authors: Jiyao Zhang, Mingdong Wu and Hao Dong
- Abstract summary: We propose a novel solution by reframing category-level object pose estimation as conditional generative modeling.
Our approach achieves state-of-the-art performance on the REAL275 dataset, surpassing 50% and 60% on strict 5°2cm and 5°5cm metrics.
- Score: 5.1998359768382905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object pose estimation plays a vital role in embodied AI and computer vision,
enabling intelligent agents to comprehend and interact with their surroundings.
Despite the practicality of category-level pose estimation, current approaches
struggle with partially observed point clouds, a challenge known as the
multi-hypothesis issue. In this study, we propose a novel solution by reframing
category-level object pose estimation as conditional generative modeling,
departing from traditional point-to-point regression. Leveraging score-based
diffusion models, we estimate object poses by sampling candidates from the
diffusion model and aggregating them through a two-step process: filtering out
outliers via likelihood estimation and subsequently mean-pooling the remaining
candidates. To avoid the costly integration process when estimating the
likelihood, we introduce an alternative method that trains an energy-based
model from the original score-based model, enabling end-to-end likelihood
estimation. Our approach achieves state-of-the-art performance on the REAL275
dataset, surpassing 50% and 60% on strict 5°2cm and 5°5cm metrics,
respectively. Furthermore, our method demonstrates strong generalizability to
novel categories sharing similar symmetric properties without fine-tuning and
can readily adapt to object pose tracking tasks, yielding comparable results to
the current state-of-the-art baselines.
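The two-step aggregation described in the abstract (filter out low-likelihood candidates, then mean-pool the rest) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `keep_ratio` parameter and its default value are hypothetical, the energies are assumed to be higher for more likely poses, and rotations are mean-pooled with the chordal mean (element-wise average projected back onto SO(3) via an SVD), which is one standard choice that the paper may not use.

```python
import numpy as np


def project_to_so3(m: np.ndarray) -> np.ndarray:
    """Return the rotation matrix closest to the 3x3 matrix m in Frobenius norm."""
    u, _, vt = np.linalg.svd(m)
    # numpy sorts singular values in descending order; scaling the last
    # (smallest) singular direction by d removes a reflection so det(R) = +1.
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt


def aggregate_poses(rotations, translations, energies, keep_ratio=0.6):
    """Hypothetical helper: drop low-likelihood candidates, then mean-pool the rest.

    rotations:    (K, 3, 3) sampled rotation matrices
    translations: (K, 3) sampled translations
    energies:     (K,) likelihood scores, assumed higher = more likely
    keep_ratio:   fraction of candidates kept (illustrative value)
    """
    rotations = np.asarray(rotations, dtype=float)
    translations = np.asarray(translations, dtype=float)
    energies = np.asarray(energies, dtype=float)
    n_keep = max(1, int(round(keep_ratio * len(energies))))
    # Step 1 (outlier filtering): indices of the n_keep highest-scoring candidates.
    keep = np.argsort(energies)[::-1][:n_keep]
    # Step 2 (mean-pooling): average the survivors; the averaged rotation
    # matrix is generally not a rotation, so project it back onto SO(3).
    r_mean = project_to_so3(rotations[keep].mean(axis=0))
    t_mean = translations[keep].mean(axis=0)
    return r_mean, t_mean
```

With `keep_ratio=1.0` the filter is disabled and the function reduces to plain mean-pooling; lowering it trades robustness to outlier samples (for example, poses flipped by an object symmetry) against averaging over fewer candidates.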
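The "costly integration process" mentioned in the abstract can be made concrete with the standard likelihood formula for score-based diffusion models (Song et al., 2021). The block below is a sketch under assumptions: O denotes the observed point cloud, f and g are the drift and diffusion coefficients of the forward SDE, s_θ is the learned score, E_φ is the energy network, and ε is a small time close to zero; this notation is illustrative rather than taken from the paper. The exact log-likelihood requires integrating a divergence along the probability flow ODE, whereas an energy whose gradient is trained to match the score gives the log-density up to a constant C in a single forward pass.

```latex
\begin{aligned}
\tilde{f}_\theta(x, t \mid O) &= f(x, t) - \tfrac{1}{2}\, g(t)^2\, s_\theta(x, t \mid O) \\
\log p_0\big(x(0) \mid O\big) &= \log p_T\big(x(T)\big)
  + \int_0^T \nabla_x \cdot \tilde{f}_\theta\big(x(t), t \mid O\big)\, \mathrm{d}t \\
s_\theta(x, t \mid O) &:= \nabla_x E_\phi(x, t \mid O)
  \;\Longrightarrow\; E_\phi(x, \epsilon \mid O) \approx \log p_\epsilon(x \mid O) + C
\end{aligned}
```

Because C does not depend on x, it cancels when candidates sampled for the same observation are ranked against each other, which is all the outlier-filtering step needs.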
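The 5°2cm and 5°5cm metrics count a prediction as correct when its rotation error is below 5 degrees and its translation error is below 2 cm (respectively 5 cm). A minimal sketch of that check follows, assuming rotation matrices and translations in metres; the function names are hypothetical, and the sketch deliberately omits the special handling that REAL275 evaluation applies to symmetric categories (for example bottles, bowls and cans).

```python
import numpy as np


def rotation_error_deg(r_pred: np.ndarray, r_gt: np.ndarray) -> float:
    """Geodesic angle (degrees) between two 3x3 rotation matrices."""
    cos_angle = (np.trace(r_pred.T @ r_gt) - 1.0) / 2.0
    # Clip so floating-point noise cannot push the value outside arccos's domain.
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


def translation_error_cm(t_pred, t_gt) -> float:
    """Euclidean distance between translations, converted from metres to centimetres."""
    return float(np.linalg.norm(np.asarray(t_pred) - np.asarray(t_gt)) * 100.0)


def pose_correct(r_pred, t_pred, r_gt, t_gt, max_deg: float, max_cm: float) -> bool:
    """True when both errors fall below the thresholds (5 and 2 for 5°2cm)."""
    return (rotation_error_deg(r_pred, r_gt) < max_deg
            and translation_error_cm(t_pred, t_gt) < max_cm)


def accuracy(preds, gts, max_deg: float, max_cm: float) -> float:
    """Fraction of (R, t) predictions judged correct under the given thresholds."""
    hits = [pose_correct(rp, tp, rg, tg, max_deg, max_cm)
            for (rp, tp), (rg, tg) in zip(preds, gts)]
    return float(np.mean(hits))
```

Under this sketch, "surpassing 50% on 5°2cm" corresponds to `accuracy(preds, gts, 5.0, 2.0) > 0.5` over the evaluated instances.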
Related papers
- OP-Align: Object-level and Part-level Alignment for Self-supervised Category-level Articulated Object Pose Estimation [7.022004731560844]
Category-level articulated object pose estimation focuses on the pose estimation of unknown articulated objects within known categories.
We propose a novel self-supervised approach that leverages a single-frame point cloud to solve this task.
Our model consistently generates a reconstruction with a canonical pose and joint state for the entire input object.
(2024-08-29T14:10:14Z)
- DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal Category-level Pose Estimation [20.676510832922016]
We propose a probabilistic model that relies on diffusion to estimate dense canonical maps crucial for recovering partial object shapes.
We introduce critical components to enhance performance by leveraging the strength of the diffusion models with multi-modal input representations.
Despite being trained solely on our generated synthetic data, our approach achieves state-of-the-art performance and unprecedented generalization qualities.
(2024-02-20T01:48:33Z)
- FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
(2023-12-13T18:28:09Z)
- Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
(2023-11-02T16:45:25Z)
- TTA-COPE: Test-Time Adaptation for Category-Level Object Pose Estimation [86.80589902825196]
We propose a method of test-time adaptation for category-level object pose estimation called TTA-COPE.
We design a pose ensemble approach with a self-training loss using pose-aware confidence.
Our approach processes the test data in a sequential, online manner, and it does not require access to the source domain at runtime.
(2023-03-29T14:34:54Z)
- CATRE: Iterative Point Clouds Alignment for Category-level Object Pose Refinement [52.41884119329864]
CATRE, a category-level object pose and size refiner, iteratively enhances pose estimates from point clouds to produce accurate results.
Our approach remarkably outperforms state-of-the-art methods on the REAL275, CAMERA25, and LM benchmarks at speeds of up to 85.32 Hz.
(2022-07-17T05:55:00Z)
- Distributional Depth-Based Estimation of Object Articulation Models [21.046351215949525]
We propose a method that efficiently learns distributions over articulation model parameters directly from depth images.
Our core contributions include a novel representation for distributions over rigid body transformations.
We introduce a novel deep learning based approach, DUST-net, that performs category-independent articulation model estimation.
(2021-08-12T17:44:51Z)
- Attentional Prototype Inference for Few-Shot Segmentation [128.45753577331422]
We propose attentional prototype inference (API), a probabilistic latent variable framework for few-shot segmentation.
We define a global latent variable to represent the prototype of each object category, which we model as a probabilistic distribution.
We conduct extensive experiments on four benchmarks, where our proposal obtains at least competitive and often better performance than state-of-the-art prototype-based methods.
(2021-05-14T06:58:44Z)
- Deep Keypoint-Based Camera Pose Estimation with Geometric Constraints [80.60538408386016]
Estimating relative camera poses from consecutive frames is a fundamental problem in visual odometry.
We propose an end-to-end trainable framework consisting of learnable modules for detection, feature extraction, matching and outlier rejection.
(2020-07-29T21:41:31Z)