Object Pose Estimation using Mid-level Visual Representations
- URL: http://arxiv.org/abs/2203.01449v1
- Date: Wed, 2 Mar 2022 22:49:17 GMT
- Title: Object Pose Estimation using Mid-level Visual Representations
- Authors: Negar Nejatishahidin, Pooya Fayyazsanavi, Jana Kosecka
- Abstract summary: This work proposes a novel pose estimation model for object categories that can be effectively transferred to previously unseen environments.
Deep convolutional neural network (CNN) models for pose estimation are typically trained and evaluated on datasets curated for object detection, pose estimation, or 3D reconstruction.
We show that the approach is favorable when it comes to generalization and transfer to novel environments.
- Score: 5.220940151628735
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work proposes a novel pose estimation model for object categories that
can be effectively transferred to previously unseen environments. Deep
convolutional neural network (CNN) models for pose estimation are typically trained
and evaluated on datasets specifically curated for object detection, pose
estimation, or 3D reconstruction, which requires large amounts of training
data. In this work, we propose a model for pose estimation that can be trained
with a small amount of data and is built on top of generic mid-level
representations \cite{taskonomy2018} (e.g. surface normal estimation and
re-shading). These representations are trained on a large dataset without
requiring pose and object annotations. The predictions are then refined
with a small CNN that exploits object masks and silhouette
retrieval. The presented approach achieves superior performance on the Pix3D
dataset \cite{pix3d} and shows nearly 35\% improvement over the existing models
when only 25\% of the training data is available. We show that the approach is
favorable when it comes to generalization and transfer to novel environments.
Towards this end, we introduce a new pose estimation benchmark for commonly
encountered furniture categories on the challenging Active Vision Dataset
\cite{Ammirato2017ADF} and evaluate the models trained on the Pix3D dataset on this benchmark.
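The abstract says the initial predictions are refined by a small CNN that exploits object masks and silhouette retrieval, but it does not say how retrieval is scored. As an illustration only, the sketch below assumes a hypothetical bank of binary silhouettes rendered at known poses and retrieves the pose whose silhouette has the highest intersection-over-union (IoU) with the query object mask. The names `mask_iou` and `retrieve_pose` and the toy pose labels are invented for this example; this is not the authors' code.

```python
import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks with the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 0.0


def retrieve_pose(query_mask: np.ndarray, silhouette_bank: np.ndarray, poses):
    """Return (pose, score) for the bank silhouette that best matches the query.

    Assumption (not from the paper): silhouette_bank is an (N, H, W) boolean
    array of silhouettes rendered at known poses, and poses holds the N
    corresponding pose labels, e.g. (azimuth_deg, elevation_deg) tuples.
    """
    scores = [mask_iou(query_mask, s) for s in silhouette_bank]
    best = int(np.argmax(scores))
    return poses[best], scores[best]


if __name__ == "__main__":
    h = w = 8
    # Hypothetical two-entry bank: a square on the left and one on the right.
    left = np.zeros((h, w), dtype=bool)
    left[2:6, 0:4] = True
    right = np.zeros((h, w), dtype=bool)
    right[2:6, 4:8] = True
    bank = np.stack([left, right])
    poses = [(90.0, 10.0), (270.0, 10.0)]  # made-up (azimuth, elevation) labels
    # Query mask overlapping mostly with the right-hand silhouette.
    query = np.zeros((h, w), dtype=bool)
    query[2:6, 3:8] = True
    print(retrieve_pose(query, bank, poses))
```

Exhaustive IoU over the bank costs O(N·H·W); a practical system might use downsampled masks or learned embeddings instead, but the brute-force version keeps the matching criterion explicit.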
Related papers
- OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation [56.028185293563325]
This paper studies a new open-set problem: open-vocabulary category-level object pose and size estimation.
We first introduce OO3D-9D, a large-scale photorealistic dataset for this task.
We then propose a framework built on pre-trained DinoV2 and text-to-image stable diffusion models.
(2024-03-19T03:09:24Z)
- GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence [5.500735640045456]
Category-level pose estimation is a challenging task with many potential applications in computer vision and robotics.
We propose to utilize both geometric and semantic features obtained from a pre-trained foundation model.
This requires significantly less data to train than prior methods since the semantic features are robust to object texture and appearance.
(2023-11-23T02:35:38Z)
- MFOS: Model-Free & One-Shot Object Pose Estimation [10.009454818723025]
We introduce a novel approach that can estimate, in a single forward pass, the pose of objects never seen during training, given minimal input.
We conduct extensive experiments and report state-of-the-art one-shot performance on the challenging LINEMOD benchmark.
(2023-10-03T09:12:07Z)
- NOPE: Novel Object Pose Estimation from a Single Image [67.11073133072527]
We propose an approach that takes a single image of a new object as input and predicts the relative pose of this object in new images without prior knowledge of the object's 3D model.
We achieve this by training a model to directly predict discriminative embeddings for viewpoints surrounding the object.
This prediction is done using a simple U-Net architecture with attention and conditioned on the desired pose, which yields extremely fast inference.
(2023-03-23T18:55:43Z)
- MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
First, we present a 6D pose refiner based on a render-and-compare strategy that can be applied to novel objects.
Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
(2022-12-13T19:30:03Z)
- Zero-Shot Category-Level Object Pose Estimation [24.822189326540105]
We tackle the problem of estimating the pose of novel object categories in a zero-shot manner.
This extends much of the existing literature by removing the need for pose-labelled datasets or category-specific CAD models.
Our method provides a six-fold improvement in average rotation accuracy at 30 degrees.
(2022-04-07T17:58:39Z)
- Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances.
We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction.
Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
(2021-03-30T17:57:01Z)
- Towards General Purpose Geometry-Preserving Single-View Depth Estimation [1.9573380763700712]
Single-view depth estimation (SVDE) plays a crucial role in scene understanding for AR applications, 3D modeling, and robotics.
Recent works have shown that a successful solution strongly relies on the diversity and volume of training data.
Our work shows that a model trained on this data along with conventional datasets can gain accuracy while predicting correct scene geometry.
(2020-09-25T20:06:13Z)
- Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation [62.618227434286]
We present a novel learning approach to recover the 6D poses and sizes of unseen object instances from an RGB-D image.
We propose a deep network to reconstruct the 3D object model by explicitly modeling the deformation from a pre-learned categorical shape prior.
(2020-07-16T16:45:05Z)
- Cascaded deep monocular 3D human pose estimation with evolutionary training data [76.3478675752847]
Deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation.
This paper proposes a novel data augmentation method that scales to massive amounts of training data.
Our method synthesizes unseen 3D human skeletons based on a hierarchical human representation and synthesizing heuristics inspired by prior knowledge.
(2020-06-14T03:09:52Z)
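Several entries above report rotation accuracy at an angular threshold; the zero-shot category-level entry, for example, cites "average rotation accuracy at 30 degrees". A common way to compute such a metric is the fraction of test instances whose geodesic rotation error, arccos((trace(R_pred^T R_gt) - 1) / 2), falls below the threshold. The sketch below assumes that definition with a strict "<" comparison; individual papers may differ in details such as per-category averaging, and the helper names `rot_z`, `geodesic_error_deg`, and `accuracy_at` are hypothetical.

```python
import numpy as np


def rot_z(deg: float) -> np.ndarray:
    """3x3 rotation about the z-axis by `deg` degrees (toy helper for the example)."""
    t = np.deg2rad(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def geodesic_error_deg(r_pred: np.ndarray, r_gt: np.ndarray) -> float:
    """Angle in degrees of the relative rotation r_pred^T @ r_gt."""
    cos = (np.trace(r_pred.T @ r_gt) - 1.0) / 2.0
    cos = np.clip(cos, -1.0, 1.0)  # guard against round-off just outside [-1, 1]
    return float(np.degrees(np.arccos(cos)))


def accuracy_at(preds, gts, threshold_deg: float = 30.0) -> float:
    """Fraction of (pred, gt) pairs with geodesic error below threshold_deg."""
    errors = [geodesic_error_deg(p, g) for p, g in zip(preds, gts)]
    if not errors:
        raise ValueError("need at least one prediction")
    return sum(e < threshold_deg for e in errors) / len(errors)


if __name__ == "__main__":
    gts = [rot_z(0.0)] * 3
    preds = [rot_z(10.0), rot_z(25.0), rot_z(45.0)]  # errors of 10, 25, 45 degrees
    print(accuracy_at(preds, gts))
```

Clipping the cosine before `arccos` matters in practice: for nearly identical rotations, floating-point round-off can push the value slightly above 1, which would otherwise produce NaN.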