Related papers: Novel Object Viewpoint Estimation through Reconstruction Alignment

URL: http://arxiv.org/abs/2006.03586v1
Date: Fri, 5 Jun 2020 17:58:14 GMT
Title: Novel Object Viewpoint Estimation through Reconstruction Alignment
Authors: Mohamed El Banani, Jason J. Corso, David F. Fouhey
Abstract summary: We learn a reconstruct-and-align approach to estimate the viewpoint of a novel object. In particular, we propose learning two networks: the first maps images to a 3D geometry-aware feature bottleneck and is trained via an image-to-image translation loss; the second learns whether two instances of features are aligned. At test time, our model finds the relative transformation that best aligns the bottleneck features of our test image to a reference image.
Score: 45.16865218423492
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The goal of this paper is to estimate the viewpoint for a novel object. Standard viewpoint estimation approaches generally fail on this task due to their reliance on a 3D model for alignment or large amounts of class-specific training data and their corresponding canonical pose. We overcome those limitations by learning a reconstruct and align approach. Our key insight is that although we do not have an explicit 3D model or a predefined canonical pose, we can still learn to estimate the object's shape in the viewer's frame and then use an image to provide our reference model or canonical pose. In particular, we propose learning two networks: the first maps images to a 3D geometry-aware feature bottleneck and is trained via an image-to-image translation loss; the second learns whether two instances of features are aligned. At test time, our model finds the relative transformation that best aligns the bottleneck features of our test image to a reference image. We evaluate our method on novel object viewpoint estimation by generalizing across different datasets, analyzing the impact of our different modules, and providing a qualitative analysis of the learned features to identify what representations are being learnt for alignment.
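The test-time step described in the abstract (search over relative transformations and keep the one whose transformed test features align best with the reference features) can be sketched as below. This is a minimal illustration under assumptions that are not taken from the paper: the bottleneck is taken to be a (C, D, H, W) voxel feature grid with H as the vertical axis, candidate viewpoints are restricted to 90-degree azimuth steps so that each rotation is an exact voxel permutation via `np.rot90`, and cosine similarity stands in for the learned alignment network. The function names `rotate_features`, `alignment_score`, and `estimate_relative_rotation` are hypothetical.

```python
import numpy as np


def rotate_features(feat: np.ndarray, k: int) -> np.ndarray:
    """Rotate a (C, D, H, W) feature grid by k * 90 degrees about the H axis.

    Assumption: H is the vertical axis. np.rot90 permutes voxels in the
    (D, W) plane, so no interpolation is needed.
    """
    return np.rot90(feat, k=k, axes=(1, 3))


def alignment_score(a: np.ndarray, b: np.ndarray) -> float:
    """Hypothetical stand-in for the learned alignment network.

    Cosine similarity of the flattened grids: close to 1 when aligned.
    """
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


def estimate_relative_rotation(test_feat, ref_feat, candidates=(0, 1, 2, 3)):
    """Score every candidate rotation of the test features; keep the best one."""
    scores = {
        k: alignment_score(rotate_features(test_feat, k), ref_feat)
        for k in candidates
    }
    best = max(scores, key=scores.get)
    return best, scores


# Usage: a random reference grid and a "test" grid rotated by 3 steps (270 degrees).
rng = np.random.default_rng(0)
ref = rng.standard_normal((8, 6, 6, 6))
test = rotate_features(ref, 3)
best, scores = estimate_relative_rotation(test, ref)
print(best)  # one more 90-degree step realigns the grids, so this prints 1
```

Restricting the search to 90-degree steps keeps the sketch exact and short; a finer viewpoint search would instead resample the feature grid (for example with trilinear interpolation) at each candidate rotation, and the score would come from the trained alignment network rather than from cosine similarity.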

Related papers

FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views [93.6881532277553]
We present FLARE, a feed-forward model designed to infer high-quality camera poses and 3D geometry from uncalibrated sparse-view images. Our solution features a cascaded learning paradigm with camera pose serving as the critical bridge, recognizing its essential role in mapping 3D structures onto 2D image planes.
arXiv (2025-02-17T18:54:05Z)
Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models [79.96917782423219]
Orient Anything is the first expert and foundational model designed to estimate object orientation in a single image. By developing a pipeline to annotate the front face of 3D objects, we collect 2M images with precise orientation annotations. Our model achieves state-of-the-art orientation estimation accuracy in both rendered and real images.
arXiv (2024-12-24T18:58:43Z)
Generalizable Single-view Object Pose Estimation by Two-side Generating and Matching [19.730504197461144]
We present a novel generalizable object pose estimation method to determine the object pose using only one RGB image. Our method offers generalization to unseen objects without extensive training, operates with a single reference image of the object, and eliminates the need for 3D object models or multiple views of the object.
arXiv (2024-11-24T14:31:50Z)
Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos [15.532504015622159]
Category-level 3D pose estimation is a fundamentally important problem in computer vision and robotics. We tackle the problem of learning to estimate the category-level 3D pose only from casually taken object-centric videos.
arXiv (2024-07-05T09:43:05Z)
FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking. Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv (2023-12-13T18:28:09Z)
GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence [5.500735640045456]
Category-level pose estimation is a challenging task with many potential applications in computer vision and robotics. We propose to utilize both geometric and semantic features obtained from a pre-trained foundation model. This requires significantly less data to train than prior methods since the semantic features are robust to object texture and appearance.
arXiv (2023-11-23T02:35:38Z)
MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training. First, we present a 6D pose refiner based on a render-and-compare strategy which can be applied to novel objects. Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
arXiv (2022-12-13T19:30:03Z)
Semantic keypoint-based pose estimation from single RGB frames [64.80395521735463]
We present an approach to estimating the continuous 6-DoF pose of an object from a single RGB image. The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model. We show that our approach can accurately recover the 6-DoF object pose for both instance- and class-based scenarios.
arXiv (2022-04-12T15:03:51Z)
Leveraging SE(3) Equivariance for Self-Supervised Category-Level Object Pose Estimation [30.04752448942084]
Category-level object pose estimation aims to find 6D object poses of previously unseen object instances from known categories without access to object CAD models. We propose for the first time a self-supervised learning framework to estimate category-level 6D object pose from single 3D point clouds.
arXiv (2021-10-30T06:46:44Z)
A Divide et Impera Approach for 3D Shape Reconstruction from Multiple Views [49.03830902235915]
Estimating the 3D shape of an object from a single or multiple images has gained popularity thanks to the recent breakthroughs powered by deep learning. This paper proposes to rely on viewpoint-variant reconstructions by merging the visible information from the given views. To validate the proposed method, we perform a comprehensive evaluation on the ShapeNet reference benchmark in terms of relative pose estimation and 3D shape reconstruction.
arXiv (2020-11-17T09:59:32Z)
Shape and Viewpoint without Keypoints [63.26977130704171]
We present a learning framework that learns to recover the 3D shape, pose and texture from a single image. We trained on an image collection without any ground-truth 3D shape, multi-view, camera viewpoint, or keypoint supervision. We obtain state-of-the-art camera prediction results and show that we can learn to predict diverse shapes and textures across objects.
arXiv (2020-07-21T17:58:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.