Semantic keypoint-based pose estimation from single RGB frames
- URL: http://arxiv.org/abs/2204.05864v1
- Date: Tue, 12 Apr 2022 15:03:51 GMT
- Title: Semantic keypoint-based pose estimation from single RGB frames
- Authors: Karl Schmeckpeper, Philip R. Osteen, Yufu Wang, Georgios Pavlakos,
Kenneth Chaney, Wyatt Jordan, Xiaowei Zhou, Konstantinos G. Derpanis, and
Kostas Daniilidis
- Abstract summary: We present an approach to estimating the continuous 6-DoF pose of an object from a single RGB image.
The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model.
We show that our approach can accurately recover the 6-DoF object pose for both instance- and class-based scenarios.
- Score: 64.80395521735463
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents an approach to estimating the continuous 6-DoF pose of an
object from a single RGB image. The approach combines semantic keypoints
predicted by a convolutional network (convnet) with a deformable shape model.
Unlike prior investigators, we are agnostic to whether the object is textured
or textureless, as the convnet learns the optimal representation from the
available training-image data. Furthermore, the approach can be applied to
instance- and class-based pose recovery. Additionally, we accompany our main
pipeline with a technique for semi-automatic data generation from unlabeled
videos. This procedure allows us to train the learnable components of our
method with minimal manual intervention in the labeling process. Empirically,
we show that our approach can accurately recover the 6-DoF object pose for both
instance- and class-based scenarios even against a cluttered background. We
apply our approach both to several existing large-scale datasets, including
PASCAL3D+, LineMOD-Occluded, YCB-Video, and TUD-Light, and, using our labeling
pipeline, to a new dataset with novel object classes that we introduce here.
Extensive empirical evaluations show that our approach is able to provide pose
estimation results comparable to the state of the art.
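The abstract names the two ingredients (detected semantic keypoints and a deformable shape model) but does not give the formulation. The sketch below is one common way such a model is written, offered as an illustration rather than as the authors' method. It assumes a linear shape basis (mean shape plus weighted deformations), a weak-perspective camera, and a confidence-weighted reprojection error. All function names, the z-axis-only rotation helper, and the toy values are hypothetical.

```python
import math

def compose_shape(mean_shape, basis_shapes, coeffs):
    """Deformable shape (assumed linear): mean keypoints plus weighted basis deformations."""
    shape = [list(p) for p in mean_shape]
    for c, basis in zip(coeffs, basis_shapes):
        for point, delta in zip(shape, basis):
            for k in range(3):
                point[k] += c * delta[k]
    return shape

def rotation_z(theta):
    """Rotation about the z-axis only; a full pose would use a general 3x3 rotation."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def project_weak_perspective(points3d, R, scale, t):
    """Rotate each 3D keypoint, drop depth, then scale and shift in the image plane."""
    projected = []
    for X in points3d:
        x = sum(R[0][k] * X[k] for k in range(3))
        y = sum(R[1][k] * X[k] for k in range(3))
        projected.append((scale * x + t[0], scale * y + t[1]))
    return projected

def weighted_reprojection_error(detected, projected, confidences):
    """Sum of squared 2D distances, each weighted by a keypoint confidence (assumed given)."""
    return sum(
        w * ((u - pu) ** 2 + (v - pv) ** 2)
        for (u, v), (pu, pv), w in zip(detected, projected, confidences)
    )

# Toy example: three keypoints, one deformation basis (hypothetical values).
mean_shape = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
basis_shapes = [[(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
shape = compose_shape(mean_shape, basis_shapes, coeffs=[0.5])
projected = project_weak_perspective(shape, rotation_z(math.pi / 2), scale=2.0, t=(10.0, 20.0))
error = weighted_reprojection_error(projected, projected, confidences=[1.0, 0.8, 0.5])
```

Under these assumptions, fitting would mean searching over the rotation, scale, translation, and shape coefficients for the values that minimize this error against the keypoints detected by the network. The optimizer itself is omitted here.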
Related papers
- Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference [62.99706119370521]
Humans can easily deduce the relative pose of an unseen object, without label/training, given only a single query-reference image pair.
We propose a novel 3D generalizable relative pose estimation method by elaborating (i) with a 2.5D shape from an RGB-D reference, (ii) with an off-the-shelf differentiable renderer, and (iii) with semantic cues from a pretrained model like DINOv2.
arXiv Detail & Related papers (2024-06-26T16:01:10Z)
- FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z)
- FoundPose: Unseen Object Pose Estimation with Foundation Features [11.32559845631345]
FoundPose is a model-based method for 6D pose estimation of unseen objects from a single RGB image.
The method can quickly onboard new objects using their 3D models without requiring any object- or task-specific training.
arXiv Detail & Related papers (2023-11-30T18:52:29Z)
- Diff-DOPE: Differentiable Deep Object Pose Estimation [29.703385848843414]
We introduce Diff-DOPE, a 6-DoF pose refiner that takes as input an image, a 3D textured model of an object, and an initial pose of the object.
The method uses differentiable rendering to update the object pose to minimize the visual error between the image and the projection of the model.
We show that this simple, yet effective, idea is able to achieve state-of-the-art results on pose estimation datasets.
arXiv Detail & Related papers (2023-09-30T18:52:57Z)
- MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
First, we present a 6D pose refiner based on a render&compare strategy which can be applied to novel objects.
Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
arXiv Detail & Related papers (2022-12-13T19:30:03Z)
- CPPF++: Uncertainty-Aware Sim2Real Object Pose Estimation by Vote Aggregation [67.12857074801731]
We introduce a novel method, CPPF++, designed for sim-to-real pose estimation.
To address the challenge posed by vote collision, we propose a novel approach that involves modeling the voting uncertainty.
We incorporate several innovative modules, including noisy pair filtering, online alignment optimization, and a feature ensemble.
arXiv Detail & Related papers (2022-11-24T03:27:00Z)
- RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild [73.1276968007689]
We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object.
We show that our approach outperforms state-of-the-art SfM and SLAM methods given sparse images on both seen and unseen categories.
arXiv Detail & Related papers (2022-08-11T17:59:59Z)
- Category-Agnostic 6D Pose Estimation with Conditional Neural Processes [19.387280883044482]
We present a novel meta-learning approach for 6D pose estimation on unknown objects.
Our algorithm learns object representation in a category-agnostic way, which endows it with strong generalization capabilities across object categories.
arXiv Detail & Related papers (2022-06-14T20:46:09Z)
- A Hybrid Approach for 6DoF Pose Estimation [4.200736775540874]
We propose a method for 6DoF pose estimation using a state-of-the-art deep learning based instance detector.
We additionally use automatic method selection, which chooses the instance detector and training set that achieve the highest performance on the validation set.
This hybrid approach leverages the best of learning and classic approaches, using CNNs to filter highly unstructured data and cut through the clutter, and a local geometric approach with proven convergence for robust pose estimation.
arXiv Detail & Related papers (2020-11-11T09:58:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.