Generalizable Pose Estimation Using Implicit Scene Representations
- URL: http://arxiv.org/abs/2305.17252v1
- Date: Fri, 26 May 2023 20:42:52 GMT
- Title: Generalizable Pose Estimation Using Implicit Scene Representations
- Authors: Vaibhav Saxena, Kamal Rahimi Malekshan, Linh Tran, Yotto Koga
- Abstract summary: 6-DoF pose estimation is an essential component of robotic manipulation pipelines.
We address the generalization capability of pose estimation using models that contain enough information about the object to render it in different poses.
Our final evaluation shows a significant improvement in inference performance and speed compared to existing approaches.
- Score: 4.124185654280966
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: 6-DoF pose estimation is an essential component of robotic manipulation
pipelines. However, it usually suffers from a lack of generalization to new
instances and object types. Most widely used methods learn to infer the object
pose in a discriminative setup where the model filters useful information to
infer the exact pose of the object. While such methods offer accurate poses,
the model does not store enough information to generalize to new objects. In
this work, we address the generalization capability of pose estimation using
models that contain enough information about the object to render it in
different poses. We follow the line of work that inverts neural renderers to
infer the pose. We propose i-$\sigma$SRN to maximize the information flowing
from the input pose to the rendered scene and invert them to infer the pose
given an input image. Specifically, we extend Scene Representation Networks
(SRNs) by incorporating a separate network for density estimation and introduce
a new way of obtaining a weighted scene representation. We investigate several
ways of obtaining initial pose estimates and losses for the neural renderer. Our final
evaluation shows a significant improvement in inference performance and speed
compared to existing approaches.
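The core idea of the abstract, inferring pose by inverting a renderer, can be illustrated with a toy analysis-by-synthesis sketch. This is not the paper's i-$\sigma$SRN: the "renderer" below is a hand-written function of a 1-DoF pose (an angle) rather than a learned scene representation network, and the gradient is derived by hand. It only shows the inversion loop: adjust the pose by gradient descent until the rendered image matches the observed one.

```python
import math

def render(theta):
    # Toy differentiable "renderer": maps a 1-DoF pose (an angle) to a
    # two-pixel "image". Stands in for a neural renderer such as an SRN.
    return [math.cos(theta), math.sin(theta)]

def loss_and_grad(theta, target):
    # Squared-error rendering loss and its analytic gradient w.r.t. the pose.
    img = render(theta)
    residual = [img[0] - target[0], img[1] - target[1]]
    loss = residual[0] ** 2 + residual[1] ** 2
    # d(render)/d(theta) = (-sin(theta), cos(theta)), chain rule gives:
    grad = 2 * (residual[0] * -math.sin(theta) + residual[1] * math.cos(theta))
    return loss, grad

def invert_renderer(target, theta_init=0.0, lr=0.5, steps=200):
    # Analysis-by-synthesis: gradient descent on the pose so that the
    # rendered image matches the observed image.
    theta = theta_init
    for _ in range(steps):
        _, grad = loss_and_grad(theta, target)
        theta -= lr * grad
    return theta

true_pose = 1.0
observed = render(true_pose)        # "input image" of the object at the true pose
estimate = invert_renderer(observed, theta_init=0.3)
```

In the paper's setting the renderer is a trained network, the pose is 6-DoF, and gradients come from automatic differentiation; the abstract's point about initial pose estimates corresponds to choosing `theta_init` well, since this kind of inversion can get stuck in poor local minima.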
Related papers
- Learning a Category-level Object Pose Estimator without Pose Annotations [37.03715008347576]
We propose to learn a category-level 3D object pose estimator without pose annotations.
Instead of using manually annotated images, we leverage diffusion models to generate a set of images under controlled pose differences.
We show that our method has the capability of category-level object pose estimation from a single shot setting.
arXiv Detail & Related papers (2024-04-08T15:59:29Z)
- DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses [59.51874686414509]
Current approaches approximate the continuous pose representation with a large number of discrete pose hypotheses.
We present a Deep Voxel Matching Network (DVMNet) that eliminates the need for pose hypotheses and computes the relative object pose in a single pass.
Our method delivers more accurate relative pose estimates for novel objects at a lower computational cost compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-03-20T15:41:32Z)
- PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching [51.142988196855484]
We propose PoseMatcher, an accurate model free one-shot object pose estimator.
We create a new training pipeline for object to image matching based on a three-view system.
To enable PoseMatcher to attend to distinct input modalities, an image and a pointcloud, we introduce IO-Layer.
arXiv Detail & Related papers (2023-04-03T21:14:59Z)
- NOPE: Novel Object Pose Estimation from a Single Image [67.11073133072527]
We propose an approach that takes a single image of a new object as input and predicts the relative pose of this object in new images without prior knowledge of the object's 3D model.
We achieve this by training a model to directly predict discriminative embeddings for viewpoints surrounding the object.
This prediction is done using a simple U-Net architecture with attention and conditioned on the desired pose, which yields extremely fast inference.
arXiv Detail & Related papers (2023-03-23T18:55:43Z)
- Few-View Object Reconstruction with Unknown Categories and Camera Poses [80.0820650171476]
This work explores reconstructing general real-world objects from a few images without known camera poses or object categories.
The crux of our work is solving two fundamental 3D vision problems -- shape reconstruction and pose estimation.
Our method FORGE predicts 3D features from each view and leverages them in conjunction with the input images to establish cross-view correspondence.
arXiv Detail & Related papers (2022-12-08T18:59:02Z)
- ZePHyR: Zero-shot Pose Hypothesis Rating [36.52070583343388]
We introduce a novel method for zero-shot object pose estimation in clutter.
Our approach uses a hypothesis generation and scoring framework, with a focus on learning a scoring function that generalizes to objects not used for training.
We demonstrate how our system can be used by quickly scanning and building a model of a novel object, which can immediately be used by our method for pose estimation.
arXiv Detail & Related papers (2021-04-28T01:48:39Z)
- DSC-PoseNet: Learning 6DoF Object Pose Estimation via Dual-scale Consistency [43.09728251735362]
We present a two-step pose estimation framework to attain 6DoF object poses from 2D object bounding-boxes.
In the first step, the framework learns to segment objects from real and synthetic data.
In the second step, we design a dual-scale pose estimation network, namely DSC-PoseNet.
Our method outperforms state-of-the-art models trained on synthetic data by a large margin.
arXiv Detail & Related papers (2021-04-08T10:19:35Z)
- Back to the Feature: Learning Robust Camera Localization from Pixels to Pose [114.89389528198738]
We introduce PixLoc, a scene-agnostic neural network that estimates an accurate 6-DoF pose from an image and a 3D model.
The system can localize in large environments given coarse pose priors but also improve the accuracy of sparse feature matching.
arXiv Detail & Related papers (2021-03-16T17:40:12Z)
- I Like to Move It: 6D Pose Estimation as an Action Decision Process [53.63776807432945]
Object pose estimation is an integral part of robot vision and AR.
Previous 6D pose retrieval pipelines treat the problem either as a regression task or discretize the pose space to classify.
We change this paradigm and reformulate the problem as an action decision process where an initial pose is updated in incremental discrete steps.
A neural network iteratively estimates likely moves from a single RGB image and thereby arrives at an acceptable final pose.
arXiv Detail & Related papers (2020-09-26T20:05:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.