Learning a Category-level Object Pose Estimator without Pose Annotations
- URL: http://arxiv.org/abs/2404.05626v1
- Date: Mon, 8 Apr 2024 15:59:29 GMT
- Title: Learning a Category-level Object Pose Estimator without Pose Annotations
- Authors: Fengrui Tian, Yaoyao Liu, Adam Kortylewski, Yueqi Duan, Shaoyi Du, Alan Yuille, Angtian Wang
- Abstract summary: We propose to learn a category-level 3D object pose estimator without pose annotations.
Instead of using manually annotated images, we leverage diffusion models to generate a set of images under controlled pose differences.
We show that our method has the capability of category-level object pose estimation from a single shot setting.
- Score: 37.03715008347576
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D object pose estimation is a challenging task. Previous works always require thousands of object images with annotated poses for learning the 3D pose correspondence, which is laborious and time-consuming for labeling. In this paper, we propose to learn a category-level 3D object pose estimator without pose annotations. Instead of using manually annotated images, we leverage diffusion models (e.g., Zero-1-to-3) to generate a set of images under controlled pose differences and propose to learn our object pose estimator with those images. Directly using the original diffusion model leads to images with noisy poses and artifacts. To tackle this issue, firstly, we exploit an image encoder, which is learned from a specially designed contrastive pose learning, to filter the unreasonable details and extract image feature maps. Additionally, we propose a novel learning strategy that allows the model to learn object poses from those generated image sets without knowing the alignment of their canonical poses. Experimental results show that our method has the capability of category-level object pose estimation from a single shot setting (as pose definition), while significantly outperforming other state-of-the-art methods on the few-shot category-level object pose estimation benchmarks.
Related papers
- ContraNeRF: 3D-Aware Generative Model via Contrastive Learning with Unsupervised Implicit Pose Embedding [40.36882490080341]
We propose a novel 3D-aware GAN optimization technique through contrastive learning with implicit pose embeddings.
We make the discriminator estimate a high-dimensional implicit pose embedding from a given image and perform contrastive learning on the pose embedding.
Because it does not look up or estimate camera poses, the proposed approach can be applied to datasets where the canonical camera pose is ill-defined.
arXiv Detail & Related papers (2023-04-27T07:53:13Z)
- PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching [51.142988196855484]
We propose PoseMatcher, an accurate model free one-shot object pose estimator.
We create a new training pipeline for object to image matching based on a three-view system.
To enable PoseMatcher to attend to distinct input modalities, an image and a pointcloud, we introduce IO-Layer.
arXiv Detail & Related papers (2023-04-03T21:14:59Z)
- NOPE: Novel Object Pose Estimation from a Single Image [67.11073133072527]
We propose an approach that takes a single image of a new object as input and predicts the relative pose of this object in new images without prior knowledge of the object's 3D model.
We achieve this by training a model to directly predict discriminative embeddings for viewpoints surrounding the object.
This prediction is done using a simple U-Net architecture with attention and conditioned on the desired pose, which yields extremely fast inference.
arXiv Detail & Related papers (2023-03-23T18:55:43Z)
- Few-View Object Reconstruction with Unknown Categories and Camera Poses [80.0820650171476]
This work explores reconstructing general real-world objects from a few images without known camera poses or object categories.
The crux of our work is solving two fundamental 3D vision problems -- shape reconstruction and pose estimation.
Our method FORGE predicts 3D features from each view and leverages them in conjunction with the input images to establish cross-view correspondence.
arXiv Detail & Related papers (2022-12-08T18:59:02Z)
- Unseen Object 6D Pose Estimation: A Benchmark and Baselines [62.8809734237213]
We propose a new task that enables and facilitates algorithms to estimate the 6D pose of novel objects during testing.
We collect a dataset with both real and synthetic images and up to 48 unseen objects in the test set.
By training an end-to-end 3D correspondences network, our method finds corresponding points between an unseen object and a partial view RGBD image accurately and efficiently.
arXiv Detail & Related papers (2022-06-23T16:29:53Z)
- PoseContrast: Class-Agnostic Object Viewpoint Estimation in the Wild with Pose-Aware Contrastive Learning [23.608940131120637]
We consider the challenging problem of class-agnostic 3D object pose estimation, with no 3D shape knowledge.
The idea is to leverage features learned on seen classes to estimate the pose for classes that are unseen, yet that share similar geometries and canonical frames with seen classes.
We report state-of-the-art results, including against methods that use additional shape information, and also when we use detected bounding boxes.
arXiv Detail & Related papers (2021-05-12T13:21:24Z)
- Neural Object Learning for 6D Pose Estimation Using a Few Cluttered Images [30.240630713652035]
Recent methods for 6D pose estimation of objects assume either textured 3D models or real images that cover the entire range of target poses.
This paper proposes a method, Neural Object Learning (NOL), that creates synthetic images of objects in arbitrary poses by combining only a few observations from cluttered images.
arXiv Detail & Related papers (2020-05-07T19:33:06Z)
- Self-supervised Single-view 3D Reconstruction via Semantic Consistency [142.71430568330172]
We learn a self-supervised, single-view 3D reconstruction model that predicts the shape, texture and camera pose of a target object.
The proposed method does not necessitate 3D supervision, manually annotated keypoints, multi-view images of an object or a prior 3D template.
arXiv Detail & Related papers (2020-03-13T20:29:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.