3D-Augmented Contrastive Knowledge Distillation for Image-based Object
Pose Estimation
- URL: http://arxiv.org/abs/2206.02531v1
- Date: Thu, 2 Jun 2022 16:46:18 GMT
- Title: 3D-Augmented Contrastive Knowledge Distillation for Image-based Object
Pose Estimation
- Authors: Zhidan Liu, Zhen Xing, Xiangdong Zhou, Yijiang Chen, Guichun Zhou
- Abstract summary: We address the problem in a new setting: 3D shape is exploited during training, while testing remains purely image-based.
We propose a novel contrastive knowledge distillation framework that effectively transfers 3D-augmented image representation from a multi-modal model to an image-based model.
We report state-of-the-art results, outperforming existing category-agnostic image-based methods by a large margin.
- Score: 4.415086501328683
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image-based object pose estimation is appealing because in real
applications the 3D shape of an object is often unavailable or hard to
capture, whereas photos are easy to take. Although this is an advantage to
some extent, leaving shape information unexplored in a 3D vision learning
problem is like a "flaw in jade". In this paper, we address the problem in a
reasonable new setting: 3D shape is exploited during training, while testing
remains purely image-based.
We enhance the performance of image-based methods for category-agnostic object
pose estimation by exploiting 3D knowledge learned by a multi-modal method.
Specifically, we propose a novel contrastive knowledge distillation framework
that effectively transfers 3D-augmented image representation from a multi-modal
model to an image-based model. We integrate contrastive learning into the
two-stage training procedure of knowledge distillation, which formulates an
advanced solution to combine these two approaches for cross-modal tasks. We
achieve state-of-the-art results, outperforming existing category-agnostic
image-based methods by a large margin (up to +5% improvement on the
ObjectNet3D dataset), demonstrating the effectiveness of our method.
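The contrastive distillation idea described in the abstract can be sketched as follows. This is a hypothetical illustration, not the authors' code: the function name, embedding shapes, and the InfoNCE-style formulation are assumptions. Each image-based student embedding is paired with the 3D-augmented teacher embedding of the same sample as its positive, while the other teacher embeddings in the batch serve as negatives.

```python
import numpy as np

def contrastive_distillation_loss(student, teacher, temperature=0.1):
    """InfoNCE-style sketch: pull each student embedding toward the teacher
    embedding of the same sample; push it away from other samples' teachers.
    (Hypothetical illustration; not the paper's actual implementation.)"""
    # L2-normalize so dot products become cosine similarities.
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    logits = s @ t.T / temperature                 # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    # Row-wise log-softmax; the diagonal entries are the positive pairs.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

# Toy batch: 4 samples with 8-dim embeddings.
rng = np.random.default_rng(0)
teacher_emb = rng.normal(size=(4, 8))
student_emb = teacher_emb + 0.05 * rng.normal(size=(4, 8))  # nearly aligned
print(contrastive_distillation_loss(student_emb, teacher_emb))
```

A well-distilled student yields a small loss because every diagonal similarity dominates its row; in the paper's two-stage procedure such a contrastive term would complement the usual distillation objective rather than replace it.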
Related papers
- Learning 3D-Aware GANs from Unposed Images with Template Feature Field [33.32761749864555]
This work targets learning 3D-aware GANs from unposed images.
We propose to perform on-the-fly pose estimation of training images with a learned template feature field (TeFF).
(arXiv 2024-04-08)
- RiCS: A 2D Self-Occlusion Map for Harmonizing Volumetric Objects [68.85305626324694]
Ray-marching in Camera Space (RiCS) is a new method to represent the self-occlusions of foreground objects in 3D into a 2D self-occlusion map.
We show that our representation map not only allows us to enhance the image quality but also to model temporally coherent complex shadow effects.
(arXiv 2022-05-14)
- End-to-End Learning of Multi-category 3D Pose and Shape Estimation [128.881857704338]
We propose an end-to-end method that simultaneously detects 2D keypoints from an image and lifts them to 3D.
The proposed method learns both 2D detection and 3D lifting only from 2D keypoints annotations.
In addition to being end-to-end in image to 3D learning, our method also handles objects from multiple categories using a single neural network.
(arXiv 2021-12-19)
- Learning Stereopsis from Geometric Synthesis for 6D Object Pose Estimation [11.999630902627864]
Current monocular-based 6D object pose estimation methods generally achieve less competitive results than RGBD-based methods.
This paper proposes a 3D geometric volume based pose estimation method with a short baseline two-view setting.
Experiments show that our method outperforms state-of-the-art monocular-based methods and is robust across different objects and scenes.
(arXiv 2021-09-25)
- Learning Canonical 3D Object Representation for Fine-Grained Recognition [77.33501114409036]
We propose a novel framework for fine-grained object recognition that learns to recover object variation in 3D space from a single image.
We represent an object as a composition of 3D shape and its appearance, while eliminating the effect of camera viewpoint.
By incorporating 3D shape and appearance jointly in a deep representation, our method learns the discriminative representation of the object.
(arXiv 2021-08-10)
- PoseContrast: Class-Agnostic Object Viewpoint Estimation in the Wild with Pose-Aware Contrastive Learning [23.608940131120637]
We consider the challenging problem of class-agnostic 3D object pose estimation, with no 3D shape knowledge.
The idea is to leverage features learned on seen classes to estimate the pose for classes that are unseen, yet that share similar geometries and canonical frames with seen classes.
We report state-of-the-art results, including against methods that use additional shape information, and also when we use detected bounding boxes.
(arXiv 2021-05-12)
- Using Shape to Categorize: Low-Shot Learning with an Explicit Shape Bias [22.863686803150625]
We investigate how reasoning about 3D shape can be used to improve low-shot learning methods' generalization performance.
We propose a new way to improve existing low-shot learning approaches by learning a discriminative embedding space using 3D object shape.
We also develop Toys4K, a new 3D object dataset with the largest number of object categories, which can also support low-shot learning.
(arXiv 2021-01-18)
- 3D Registration for Self-Occluded Objects in Context [66.41922513553367]
We introduce the first deep learning framework capable of effectively handling this scenario.
Our method consists of an instance segmentation module followed by a pose estimation one.
It allows us to perform 3D registration in a one-shot manner, without requiring an expensive iterative procedure.
(arXiv 2020-11-23)
- 3D Human Shape and Pose from a Single Low-Resolution Image with Self-Supervised Learning [105.49950571267715]
Existing deep learning methods for 3D human shape and pose estimation rely on relatively high-resolution input images.
We propose RSC-Net, which consists of a Resolution-aware network, a Self-supervision loss, and a Contrastive learning scheme.
We show that both these new training losses provide robustness when learning 3D shape and pose in a weakly-supervised manner.
(arXiv 2020-07-27)
- Learning Pose-invariant 3D Object Reconstruction from Single-view Images [61.98279201609436]
In this paper, we explore a more realistic setup of learning 3D shape from only single-view images.
The major difficulty lies in insufficient constraints that can be provided by single view images.
We propose an effective adversarial domain confusion method to learn pose-disentangled compact shape space.
(arXiv 2020-04-03)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality or accuracy of this information and is not responsible for any consequences of its use.