SDFEst: Categorical Pose and Shape Estimation of Objects from RGB-D using Signed Distance Fields
- URL: http://arxiv.org/abs/2207.04880v1
- Date: Mon, 11 Jul 2022 13:53:50 GMT
- Title: SDFEst: Categorical Pose and Shape Estimation of Objects from RGB-D using Signed Distance Fields
- Authors: Leonard Bruns and Patric Jensfelt
- Abstract summary: We present a modular pipeline for pose and shape estimation of objects from RGB-D images.
We integrate a generative shape model with a novel network to enable 6D pose and shape estimation from single or multiple views.
We demonstrate the benefits of our approach over state-of-the-art methods in several experiments on both synthetic and real data.
- Score: 5.71097144710995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rich geometric understanding of the world is an important component of many
robotic applications such as planning and manipulation. In this paper, we
present a modular pipeline for pose and shape estimation of objects from RGB-D
images given their category. The core of our method is a generative shape
model, which we integrate with a novel initialization network and a
differentiable renderer to enable 6D pose and shape estimation from single or
multiple views. We investigate the use of discretized signed distance fields as
an efficient shape representation for fast analysis-by-synthesis optimization.
Our modular framework enables multi-view optimization and extensibility. We
demonstrate the benefits of our approach over state-of-the-art methods in
several experiments on both synthetic and real data. We open-source our
approach at https://github.com/roym899/sdfest.
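As a concrete illustration of the analysis-by-synthesis idea, the sketch below optimizes an object translation against a discretized SDF by trilinearly interpolating the grid at observed surface points and driving the interpolated distances toward zero. This is a minimal sketch, not the SDFEst implementation: the grid, the observed points, and the point-to-surface objective are placeholder assumptions, and rotation, scale, and shape-latent updates are omitted.

```python
# Minimal sketch of SDF-based analysis-by-synthesis (not the SDFEst code):
# optimize a translation so that observed surface points land on the zero
# level set of a discretized SDF. Grid and points are placeholder data.
import torch
import torch.nn.functional as F

def query_sdf(sdf_grid, points):
    """Trilinearly interpolate a (D, H, W) SDF grid at (N, 3) points given
    in normalized [-1, 1] coordinates (x, y, z ordering per grid_sample)."""
    grid = sdf_grid[None, None]            # (1, 1, D, H, W)
    coords = points[None, None, None]      # (1, 1, 1, N, 3)
    return F.grid_sample(grid, coords, align_corners=True).reshape(-1)

sdf_grid = torch.rand(64, 64, 64) - 0.5    # stand-in for a decoded shape
obs = 0.3 * torch.randn(500, 3)            # stand-in for back-projected depth
translation = torch.zeros(3, requires_grad=True)
opt = torch.optim.Adam([translation], lr=1e-2)

for _ in range(100):
    opt.zero_grad()
    # Surface points under the correct pose/shape should have SDF ~ 0.
    loss = query_sdf(sdf_grid, obs + translation).abs().mean()
    loss.backward()
    opt.step()
```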
Related papers
- FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z)
- Towards Scalable Multi-View Reconstruction of Geometry and Materials [27.660389147094715]
We propose a novel method for joint recovery of camera pose, object geometry and spatially-varying Bidirectional Reflectance Distribution Function (svBRDF) of 3D scenes.
The input are high-resolution RGBD images captured by a mobile, hand-held capture system with point lights for active illumination.
arXiv Detail & Related papers (2023-06-06T15:07:39Z)
- Shape, Pose, and Appearance from a Single Image via Bootstrapped Radiance Field Inversion [54.151979979158085]
We introduce a principled end-to-end reconstruction framework for natural images, where accurate ground-truth poses are not available.
We leverage an unconditional 3D-aware generator, to which we apply a hybrid inversion scheme where a model produces a first guess of the solution.
Our framework can de-render an image in as few as 10 steps, enabling its use in practical scenarios.
arXiv Detail & Related papers (2022-11-21T17:42:42Z)
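The hybrid inversion scheme in the entry above (a feed-forward first guess refined by a few gradient steps) can be sketched generically; the encoder, generator, and reconstruction loss below are stand-ins, not the paper's networks.

```python
# Generic encoder-initialized inversion loop (stand-in networks, not the
# paper's); illustrates "first guess, then a few refinement steps".
import torch

encoder = torch.nn.Linear(64, 8)         # placeholder feed-forward guesser
generator = torch.nn.Linear(8, 64)       # placeholder 3D-aware generator

image = torch.randn(64)                  # stand-in for the observed image
z = encoder(image).detach().requires_grad_()   # first guess of the latent
opt = torch.optim.Adam([z], lr=0.05)

for _ in range(10):                      # "as few as 10 steps" per the entry
    opt.zero_grad()
    loss = (generator(z) - image).pow(2).mean()  # reconstruction objective
    loss.backward()
    opt.step()
```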
- Generative Category-Level Shape and Pose Estimation with Semantic Primitives [27.692997522812615]
We propose a novel framework for category-level object shape and pose estimation from a single RGB-D image.
To handle the intra-category variation, we adopt a semantic primitive representation that encodes diverse shapes into a unified latent space.
We show that the proposed method achieves state-of-the-art pose estimation performance and generalizes better on real-world data.
arXiv Detail & Related papers (2022-10-03T17:51:54Z)
- Semantic keypoint-based pose estimation from single RGB frames [64.80395521735463]
We present an approach to estimating the continuous 6-DoF pose of an object from a single RGB image.
The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model.
We show that our approach can accurately recover the 6-DoF object pose for both instance- and class-based scenarios.
arXiv Detail & Related papers (2022-04-12T15:03:51Z)
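The keypoints-plus-model recipe in the entry above can be illustrated with a standard PnP solve; the 3D model keypoints, 2D detections, and intrinsics below are invented placeholders, and the paper's deformable shape model is not reproduced.

```python
# Hypothetical sketch: 6-DoF pose from semantic keypoints via PnP (OpenCV).
# Model points, detections, and intrinsics are invented; the 2D points are
# consistent with the object sitting 1 m in front of the camera.
import cv2
import numpy as np

model_pts = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0],
                      [0.0, 0.0, 0.1], [0.1, 0.1, 0.0], [0.0, 0.1, 0.1]])
image_pts = np.array([[320.0, 240.0], [380.0, 240.0], [320.0, 300.0],
                      [320.0, 240.0], [380.0, 300.0], [320.0, 294.55]])
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])

ok, rvec, tvec = cv2.solvePnP(model_pts, image_pts, K, None)
R, _ = cv2.Rodrigues(rvec)   # rotation matrix; tvec should be ~(0, 0, 1)
```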
- DONet: Learning Category-Level 6D Object Pose and Size Estimation from Depth Observation [53.55300278592281]
We propose a method of Category-level 6D Object Pose and Size Estimation (COPSE) from a single depth image.
Our framework makes inferences based on the rich geometric information of the object in the depth channel alone.
Our framework competes with state-of-the-art approaches that require labeled real-world images.
arXiv Detail & Related papers (2021-06-27T10:41:50Z)
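Since the entry above reasons from the depth channel alone, the usual first step is lifting the depth image to a point cloud with the camera intrinsics; the generic sketch below uses placeholder intrinsics and a random stand-in depth image, not DONet's code.

```python
# Generic depth-to-point-cloud lifting (not DONet's code); the intrinsics
# fx, fy, cx, cy and the random depth image are placeholder values.
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth image in meters to an (N, 3) point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    pts = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]              # drop invalid zero-depth pixels

depth = np.random.rand(480, 640)           # stand-in for a real depth image
cloud = backproject(depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
```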
- From Points to Multi-Object 3D Reconstruction [71.17445805257196]
We propose a method to detect and reconstruct multiple 3D objects from a single RGB image.
A keypoint detector localizes objects as center points and directly predicts all object properties, including 9-DoF bounding boxes and 3D shapes.
The presented approach performs lightweight reconstruction in a single stage; it is real-time capable, fully differentiable, and end-to-end trainable.
arXiv Detail & Related papers (2020-12-21T18:52:21Z)
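The center-point localization step in the entry above can be sketched as peak extraction from a predicted heatmap with max-pooling non-maximum suppression; the shapes and the stand-in heatmap below are illustrative, not the paper's network outputs.

```python
# Illustrative CenterNet-style peak extraction (not the paper's code):
# max-pooling non-maximum suppression over a predicted center heatmap.
import torch
import torch.nn.functional as F

def decode_centers(heatmap, k=10):
    """Return the top-k peak pixel coordinates of an (H, W) heatmap."""
    hm = heatmap[None, None]                               # (1, 1, H, W)
    keep = (F.max_pool2d(hm, 3, stride=1, padding=1) == hm).float()
    scores, idx = (hm * keep).flatten().topk(k)            # NMS + top-k
    ys, xs = idx // heatmap.shape[1], idx % heatmap.shape[1]
    return torch.stack([xs, ys], dim=1), scores

centers, scores = decode_centers(torch.rand(128, 128))     # stand-in heatmap
```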
- Nothing But Geometric Constraints: A Model-Free Method for Articulated Object Pose Estimation [89.82169646672872]
We propose an unsupervised vision-based system to estimate the joint configurations of the robot arm from a sequence of RGB or RGB-D images without knowing the model a priori.
We combine a classical geometric formulation with deep learning and extend the use of epipolar multi-rigid-body constraints to solve this task.
arXiv Detail & Related papers (2020-11-30T20:46:48Z)
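The epipolar constraint the entry above builds on can be checked numerically: corresponding normalized image points x1, x2 of a rigid body in two views with relative pose (R, t) satisfy x2^T E x1 = 0 with E = [t]_x R. The pose and point below are made up for this toy check.

```python
# Toy check of the epipolar constraint x2^T E x1 = 0 for a rigid body
# seen from two views; the relative pose and 3D point are made up.
import numpy as np

def skew(t):
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

R = np.eye(3)                            # relative rotation, view 1 -> view 2
t = np.array([0.2, 0.0, 0.0])            # relative translation
E = skew(t) @ R                          # essential matrix

X = np.array([0.3, -0.1, 2.0])           # a 3D point on the rigid body
x1 = X / X[2]                            # normalized image point, view 1
X2 = R @ X + t
x2 = X2 / X2[2]                          # corresponding point, view 2

residual = x2 @ E @ x1                   # ~0 for a true correspondence
```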
- NodeSLAM: Neural Object Descriptors for Multi-View Shape Reconstruction [4.989480853499916]
We present efficient and optimisable multi-class learned object descriptors together with a novel probabilistic and differentiable rendering engine.
Our framework allows for accurate and robust 3D object reconstruction, which enables multiple applications including robot grasping and placing, augmented reality, and the first object-level SLAM system.
arXiv Detail & Related papers (2020-04-09T11:09:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.