Novel Object Viewpoint Estimation through Reconstruction Alignment
- URL: http://arxiv.org/abs/2006.03586v1
- Date: Fri, 5 Jun 2020 17:58:14 GMT
- Title: Novel Object Viewpoint Estimation through Reconstruction Alignment
- Authors: Mohamed El Banani, Jason J. Corso, David F. Fouhey
- Abstract summary: We propose a reconstruct-and-align approach to estimate the viewpoint of a novel object.
In particular, we propose learning two networks: the first maps images to a 3D geometry-aware feature bottleneck and is trained via an image-to-image translation loss; the second learns whether two instances of features are aligned.
At test time, our model finds the relative transformation that best aligns the bottleneck features of our test image to those of a reference image.
- Score: 45.16865218423492
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of this paper is to estimate the viewpoint for a novel object.
Standard viewpoint estimation approaches generally fail on this task due to
their reliance on a 3D model for alignment or large amounts of class-specific
training data and their corresponding canonical pose. We overcome these
limitations by learning a reconstruct-and-align approach. Our key insight is
that although we do not have an explicit 3D model or a predefined canonical
pose, we can still learn to estimate the object's shape in the viewer's frame
and then use an image to provide our reference model or canonical pose. In
particular, we propose learning two networks: the first maps images to a 3D
geometry-aware feature bottleneck and is trained via an image-to-image
translation loss; the second learns whether two instances of features are
aligned. At test time, our model finds the relative transformation that best
aligns the bottleneck features of our test image to a reference image. We
evaluate our method on novel object viewpoint estimation by generalizing across
different datasets, analyzing the impact of our different modules, and
providing a qualitative analysis of the learned features to identify what
representations are being learned for alignment.
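The test-time procedure described above (encode both images into 3D feature volumes, then search for the relative rotation that best aligns them) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: `encode` and `score_alignment` are stand-ins for the two learned networks (here a seeded random feature volume and a cosine similarity), and the candidate-rotation sampling is a simple coarse search.

```python
# Hypothetical sketch of the test-time "reconstruct and align" search.
# encode() and score_alignment() stand in for the paper's two networks.
import numpy as np
from scipy.ndimage import affine_transform
from scipy.spatial.transform import Rotation


def encode(image):
    """Stand-in for the first network: map an image to a 3D geometry-aware
    feature volume (D x H x W x C), here seeded by the image for repeatability."""
    rng = np.random.default_rng(abs(hash(image)) % (2**32))
    return rng.standard_normal((16, 16, 16, 4))


def rotate_volume(volume, rotation):
    """Apply a rotation to a D x H x W x C feature volume about its center."""
    center = (np.array(volume.shape[:3]) - 1) / 2.0
    matrix = rotation.as_matrix().T  # inverse map expected by affine_transform
    offset = center - matrix @ center
    out = np.empty_like(volume)
    for c in range(volume.shape[3]):
        out[..., c] = affine_transform(volume[..., c], matrix, offset=offset, order=1)
    return out


def score_alignment(feat_a, feat_b):
    """Stand-in for the second network: cosine similarity of flattened features."""
    a, b = feat_a.ravel(), feat_b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


def estimate_viewpoint(test_image, reference_image, n_candidates=64):
    """Return the candidate rotation that best aligns reference features
    to the test-image features."""
    feat_test = encode(test_image)
    feat_ref = encode(reference_image)
    candidates = Rotation.random(n_candidates, random_state=0)  # coarse sampling
    scores = [score_alignment(rotate_volume(feat_ref, r), feat_test)
              for r in candidates]
    return candidates[int(np.argmax(scores))]
```

Usage: `estimate_viewpoint("test.png", "ref.png")` returns the best-scoring relative rotation; a real system would score candidates with the learned alignment network and refine around the best coarse hypothesis.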
Related papers
- FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views [93.6881532277553]
We present FLARE, a feed-forward model designed to infer high-quality camera poses and 3D geometry from uncalibrated sparse-view images.
Our solution features a cascaded learning paradigm with camera pose serving as the critical bridge, recognizing its essential role in mapping 3D structures onto 2D image planes.
arXiv Detail & Related papers (2025-02-17T18:54:05Z)
- Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models [79.96917782423219]
Orient Anything is the first expert and foundational model designed to estimate object orientation in a single image.
By developing a pipeline to annotate the front face of 3D objects, we collect 2M images with precise orientation annotations.
Our model achieves state-of-the-art orientation estimation accuracy in both rendered and real images.
arXiv Detail & Related papers (2024-12-24T18:58:43Z)
- Generalizable Single-view Object Pose Estimation by Two-side Generating and Matching [19.730504197461144]
We present a novel generalizable object pose estimation method to determine the object pose using only one RGB image.
Our method offers generalization to unseen objects without extensive training, operates with a single reference image of the object, and eliminates the need for 3D object models or multiple views of the object.
arXiv Detail & Related papers (2024-11-24T14:31:50Z)
- Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos [15.532504015622159]
Category-level 3D pose estimation is a fundamentally important problem in computer vision and robotics.
We tackle the problem of learning to estimate the category-level 3D pose only from casually taken object-centric videos.
arXiv Detail & Related papers (2024-07-05T09:43:05Z)
- GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence [5.500735640045456]
Category-level pose estimation is a challenging task with many potential applications in computer vision and robotics.
We propose to utilize both geometric and semantic features obtained from a pre-trained foundation model.
This requires significantly less data to train than prior methods since the semantic features are robust to object texture and appearance.
arXiv Detail & Related papers (2023-11-23T02:35:38Z)
- MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
First, we present a 6D pose refiner based on a render-and-compare strategy that can be applied to novel objects.
Second, we introduce a novel approach to coarse pose estimation that leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
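The render-and-compare refinement idea in this entry can be sketched as an iterative loop: render the object at the current pose hypothesis, compare against the observation, and apply a predicted correction. This is a hedged toy sketch, not MegaPose's released API: `render` is a bare orthographic projection and `predict_update` is a centroid-difference stand-in for the learned refiner.

```python
# Hypothetical render-and-compare refinement loop in the spirit of MegaPose.
# render() and predict_update() are illustrative stand-ins, not the real system.
import numpy as np


def render(mesh, pose):
    """Stand-in renderer: transform mesh points by the 3x4 pose (R|t) and
    project orthographically onto the image plane."""
    R, t = pose[:, :3], pose[:, 3]
    return (mesh @ R.T + t)[:, :2]


def predict_update(rendered, observed):
    """Stand-in for the learned refiner: nudge the in-plane translation
    toward the centroid difference between observed and rendered points."""
    delta = observed.mean(axis=0) - rendered.mean(axis=0)
    return np.array([delta[0], delta[1], 0.0])


def refine_pose(mesh, observed, pose, n_iters=5):
    """Iteratively render at the current pose and apply predicted corrections."""
    pose = pose.copy()
    for _ in range(n_iters):
        rendered = render(mesh, pose)
        pose[:, 3] += predict_update(rendered, observed)
    return pose
```

In the real method the update is predicted by a network from rendered and observed image crops, and a coarse classifier first selects a starting pose close enough for the refiner to correct.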
arXiv Detail & Related papers (2022-12-13T19:30:03Z)
- A Divide et Impera Approach for 3D Shape Reconstruction from Multiple Views [49.03830902235915]
Estimating the 3D shape of an object from a single or multiple images has gained popularity thanks to the recent breakthroughs powered by deep learning.
This paper proposes to rely on viewpoint variant reconstructions by merging the visible information from the given views.
To validate the proposed method, we perform a comprehensive evaluation on the ShapeNet reference benchmark in terms of relative pose estimation and 3D shape reconstruction.
arXiv Detail & Related papers (2020-11-17T09:59:32Z)
- Shape and Viewpoint without Keypoints [63.26977130704171]
We present a learning framework that learns to recover the 3D shape, pose and texture from a single image.
The framework is trained on an image collection without any ground-truth 3D shape, multi-view, camera-viewpoint, or keypoint supervision.
We obtain state-of-the-art camera prediction results and show that we can learn to predict diverse shapes and textures across objects.
arXiv Detail & Related papers (2020-07-21T17:58:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.