PoseContrast: Class-Agnostic Object Viewpoint Estimation in the Wild
with Pose-Aware Contrastive Learning
- URL: http://arxiv.org/abs/2105.05643v1
- Date: Wed, 12 May 2021 13:21:24 GMT
- Title: PoseContrast: Class-Agnostic Object Viewpoint Estimation in the Wild
with Pose-Aware Contrastive Learning
- Authors: Yang Xiao, Yuming Du, Renaud Marlet
- Abstract summary: We consider the challenging problem of class-agnostic 3D object pose estimation, with no 3D shape knowledge.
The idea is to leverage features learned on seen classes to estimate the pose for classes that are unseen, yet that share similar geometries and canonical frames with seen classes.
We report state-of-the-art results, including against methods that use additional shape information, and also when we use detected bounding boxes.
- Score: 23.608940131120637
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by the need to estimate the pose (viewpoint) of arbitrary objects
in the wild, a setting covered only by scarce and small datasets, we consider
the challenging problem of class-agnostic 3D object pose estimation, with no 3D
shape knowledge. The idea is to leverage features learned on seen classes to
estimate the pose for classes that are unseen, yet that share similar
geometries and canonical frames with seen classes. For this, we train a direct
pose estimator in a class-agnostic way by sharing weights across all object
classes, and we introduce a contrastive learning method that has three main
ingredients: (i) the use of pre-trained, self-supervised, contrast-based
features; (ii) pose-aware data augmentations; (iii) a pose-aware contrastive
loss. We experimented on Pascal3D+ and ObjectNet3D, as well as Pix3D in a
cross-dataset fashion, with both seen and unseen classes. We report
state-of-the-art results, including against methods that use additional shape
information, and also when we use detected bounding boxes.
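To make the three ingredients more concrete, here is a minimal PyTorch-style sketch of a pose-aware contrastive loss. It is an illustration under simplifying assumptions (a single viewpoint angle per sample, negatives re-weighted by angular distance), not the authors' exact formulation, which should be taken from the paper and its released code.
```python
# Illustrative sketch only (not the official PoseContrast code).
# Assumption: each embedding comes with a single viewpoint angle in radians;
# PoseContrast itself handles full 3D viewpoints (azimuth/elevation/in-plane).
import math
import torch
import torch.nn.functional as F

def pose_aware_contrastive_loss(z, angles, temperature=0.1):
    """z: (N, D) image embeddings; angles: (N,) viewpoint angles in radians."""
    z = F.normalize(z, dim=1)                        # cosine-similarity space
    sim = torch.exp(z @ z.t() / temperature)         # (N, N) pairwise similarities

    # Angular distance between every pair of viewpoints, in [0, pi].
    diff = angles.unsqueeze(0) - angles.unsqueeze(1)
    ang_dist = torch.atan2(torch.sin(diff), torch.cos(diff)).abs()

    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, 0.0)                  # drop self-pairs

    # Positive: the most pose-similar other sample (in practice, a
    # pose-preserving augmentation of the anchor would play this role).
    pos_idx = ang_dist.masked_fill(eye, float("inf")).argmin(dim=1)
    pos = sim[torch.arange(len(z), device=z.device), pos_idx]

    # Pose-aware weighting: pose-distant pairs act as strong negatives,
    # while pose-similar pairs are barely pushed apart.
    weights = (ang_dist / math.pi).masked_fill(eye, 0.0)
    denom = pos + (weights * sim).sum(dim=1)

    return -(pos / denom).log().mean()
```
The pose-aware augmentations interact with such a loss in a simple way: appearance-only transforms (e.g. color jitter or blur) leave the pose label unchanged, whereas pose-changing transforms such as a horizontal flip or an in-plane rotation also update the pose label accordingly.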
Related papers
- Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos [15.532504015622159]
Category-level 3D pose estimation is a fundamentally important problem in computer vision and robotics.
We tackle the problem of learning to estimate the category-level 3D pose only from casually taken object-centric videos.
arXiv Detail & Related papers (2024-07-05T09:43:05Z)
- Learning a Category-level Object Pose Estimator without Pose Annotations [37.03715008347576]
We propose to learn a category-level 3D object pose estimator without pose annotations.
Instead of using manually annotated images, we leverage diffusion models to generate a set of images under controlled pose differences.
We show that our method has the capability of category-level object pose estimation from a single shot setting.
arXiv Detail & Related papers (2024-04-08T15:59:29Z)
- LocaliseBot: Multi-view 3D object localisation with differentiable rendering for robot grasping [9.690844449175948]
We focus on object pose estimation.
Our approach relies on three pieces of information: multiple views of the object, the camera's parameters at those viewpoints, and 3D CAD models of objects.
We show that the estimated object pose results in 99.65% grasp accuracy with the ground truth grasp candidates.
arXiv Detail & Related papers (2023-11-14T14:27:53Z)
- Understanding Pose and Appearance Disentanglement in 3D Human Pose Estimation [72.50214227616728]
Several methods have been proposed to learn image representations in a self-supervised fashion so as to disentangle appearance information from pose information.
We study disentanglement from the perspective of the self-supervised network, via diverse image synthesis experiments.
We design an adversarial strategy that generates natural appearance changes of the subject, to which a disentangled network should be robust.
arXiv Detail & Related papers (2023-09-20T22:22:21Z)
- MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
First, we present a 6D pose refiner based on a render & compare strategy that can be applied to novel objects.
Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
arXiv Detail & Related papers (2022-12-13T19:30:03Z)
- 3D-Augmented Contrastive Knowledge Distillation for Image-based Object Pose Estimation [4.415086501328683]
We address the problem in a new setting in which 3D shape is exploited during training while testing remains purely image-based.
We propose a novel contrastive knowledge distillation framework that effectively transfers 3D-augmented image representation from a multi-modal model to an image-based model.
Experiments show that we outperform existing category-agnostic image-based methods by a large margin, achieving state-of-the-art results.
arXiv Detail & Related papers (2022-06-02T16:46:18Z)
- End-to-End Learning of Multi-category 3D Pose and Shape Estimation [128.881857704338]
We propose an end-to-end method that simultaneously detects 2D keypoints from an image and lifts them to 3D.
The proposed method learns both 2D detection and 3D lifting only from 2D keypoints annotations.
In addition to being end-to-end in image to 3D learning, our method also handles objects from multiple categories using a single neural network.
arXiv Detail & Related papers (2021-12-19T17:10:40Z)
- DONet: Learning Category-Level 6D Object Pose and Size Estimation from Depth Observation [53.55300278592281]
We propose a method of Category-level 6D Object Pose and Size Estimation (COPSE) from a single depth image.
Our framework makes inferences based on the rich geometric information of the object in the depth channel alone.
Our framework competes with state-of-the-art approaches that require labeled real-world images.
arXiv Detail & Related papers (2021-06-27T10:41:50Z)
- 3D Registration for Self-Occluded Objects in Context [66.41922513553367]
We introduce the first deep learning framework capable of effectively handling this scenario.
Our method consists of an instance segmentation module followed by a pose estimation one.
It allows us to perform 3D registration in a one-shot manner, without requiring an expensive iterative procedure.
arXiv Detail & Related papers (2020-11-23T08:05:28Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)