3D Neural Embedding Likelihood: Probabilistic Inverse Graphics for
Robust 6D Pose Estimation
- URL: http://arxiv.org/abs/2302.03744v3
- Date: Wed, 6 Sep 2023 21:38:20 GMT
- Title: 3D Neural Embedding Likelihood: Probabilistic Inverse Graphics for
Robust 6D Pose Estimation
- Authors: Guangyao Zhou, Nishad Gothoskar, Lirui Wang, Joshua B. Tenenbaum, Dan
Gutfreund, Miguel Lázaro-Gredilla, Dileep George, Vikash K. Mansinghka
- Abstract summary: Inverse graphics aims to infer the 3D scene structure from 2D images.
We introduce probabilistic modeling to quantify uncertainty and achieve robustness in 6D pose estimation tasks.
3DNEL effectively combines learned neural embeddings from RGB with depth information to improve robustness in sim-to-real 6D object pose estimation from RGB-D images.
- Score: 50.15926681475939
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to perceive and understand 3D scenes is crucial for many
applications in computer vision and robotics. Inverse graphics is an appealing
approach to 3D scene understanding that aims to infer the 3D scene structure
from 2D images. In this paper, we introduce probabilistic modeling to the
inverse graphics framework to quantify uncertainty and achieve robustness in 6D
pose estimation tasks. Specifically, we propose 3D Neural Embedding Likelihood
(3DNEL) as a unified probabilistic model over RGB-D images, and develop
efficient inference procedures on 3D scene descriptions. 3DNEL effectively
combines learned neural embeddings from RGB with depth information to improve
robustness in sim-to-real 6D object pose estimation from RGB-D images.
Performance on the YCB-Video dataset is on par with the state of the art, yet
much more robust in challenging regimes. In contrast to discriminative
approaches, 3DNEL's probabilistic generative formulation jointly models
multiple objects in a scene, quantifies uncertainty in a principled way, and
handles object pose tracking under heavy occlusion. Finally, 3DNEL provides a
principled framework for incorporating prior knowledge about the scene and
objects, which allows natural extension to additional tasks like camera pose
tracking from video.
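As a rough illustration of the kind of per-pixel likelihood the abstract describes, the sketch below combines an RGB embedding-similarity term with a Gaussian depth term and mixes in a uniform outlier component for robustness. All names and constants (pixel_log_likelihood, sigma, p_out) are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def pixel_log_likelihood(obs_embed, obs_depth, rend_embed, rend_depth,
                         sigma=0.01, p_out=0.05):
    """Score one pixel under a hypothesized scene (illustrative sketch):
    inlier = embedding agreement (RGB cue) * Gaussian depth agreement,
    mixed with a uniform outlier density for robustness."""
    # RGB cue: cosine similarity between learned per-pixel embeddings.
    sim = obs_embed @ rend_embed / (
        np.linalg.norm(obs_embed) * np.linalg.norm(rend_embed) + 1e-8)
    log_rgb = np.log((sim + 1.0) / 2.0 + 1e-8)  # map [-1, 1] to (0, 1]
    # Depth cue: Gaussian agreement between observed and rendered depth.
    log_depth = (-0.5 * ((obs_depth - rend_depth) / sigma) ** 2
                 - np.log(sigma * np.sqrt(2.0 * np.pi)))
    # Robust mixture: (1 - p_out) * inlier + p_out * uniform outlier.
    return np.logaddexp(np.log1p(-p_out) + log_rgb + log_depth,
                        np.log(p_out) + np.log(1e-3))

def scene_log_likelihood(obs_embeds, obs_depths, rend_embeds, rend_depths):
    # Pixels are treated as conditionally independent given the scene.
    return sum(pixel_log_likelihood(e, d, re, rd)
               for e, d, re, rd in zip(obs_embeds, obs_depths,
                                       rend_embeds, rend_depths))
```

In an inverse-graphics loop, rend_embeds and rend_depths would be rendered from a candidate 3D scene description, and inference would keep the pose hypotheses with the highest scene log-likelihood.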
Related papers
- SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR.
SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds.
We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z)
- Uncertainty-aware 3D Object-Level Mapping with Deep Shape Priors [15.34487368683311]
We propose a framework that can reconstruct high-quality object-level maps for unknown objects.
Our approach takes multiple RGB-D images as input and outputs dense 3D shapes and 9-DoF poses for detected objects.
We derive a probabilistic formulation that propagates shape and pose uncertainty through two novel loss functions.
arXiv Detail & Related papers (2023-09-17T00:48:19Z)
- 3D-IntPhys: Towards More Generalized 3D-grounded Visual Intuitive Physics under Challenging Scenes [68.66237114509264]
We present a framework capable of learning 3D-grounded visual intuitive physics models from videos of complex scenes with fluids.
We show our model can make long-horizon future predictions by learning from raw images and significantly outperforms models that do not employ an explicit 3D representation space.
arXiv Detail & Related papers (2023-04-22T19:28:49Z)
- Anything-3D: Towards Single-view Anything Reconstruction in the Wild [61.090129285205805]
We introduce Anything-3D, a methodical framework that ingeniously combines a series of visual-language models and the Segment-Anything object segmentation model.
Our approach employs a BLIP model to generate textual descriptions, utilizes the Segment-Anything model for the effective extraction of objects of interest, and leverages a text-to-image diffusion model to lift the object into a neural radiance field.
arXiv Detail & Related papers (2023-04-19T16:39:51Z)
- 6D Object Pose Estimation from Approximate 3D Models for Orbital Robotics [19.64111218032901]
We present a novel technique to estimate the 6D pose of objects from single images.
We employ a dense 2D-to-3D correspondence predictor that regresses 3D model coordinates for every pixel.
Our method achieves state-of-the-art performance on the SPEED+ dataset and has won the SPEC2021 post-mortem competition.
arXiv Detail & Related papers (2023-03-23T13:18:05Z)
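The dense 2D-to-3D correspondence idea above pairs naturally with a PnP+RANSAC solve. Below is a generic sketch of that recovery step, not the paper's exact pipeline; pose_from_dense_coords and its inputs (per-pixel model coordinates coords_3d, an object mask, intrinsics K) are assumptions for illustration.

```python
import numpy as np
import cv2

def pose_from_dense_coords(coords_3d, mask, K):
    """Recover a 6D pose from per-pixel 3D model coordinates (HxWx3)
    via PnP + RANSAC; `mask` marks pixels belonging to the object."""
    v, u = np.nonzero(mask)                        # object pixel rows/cols
    pts_2d = np.stack([u, v], axis=1).astype(np.float64)
    pts_3d = coords_3d[v, u].astype(np.float64)    # predicted model coords
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d, pts_2d, K, distCoeffs=None, reprojectionError=2.0)
    R, _ = cv2.Rodrigues(rvec)                     # rotation vector -> matrix
    return R, tvec, inliers
```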
- Uncertainty Guided Policy for Active Robotic 3D Reconstruction using Neural Radiance Fields [82.21033337949757]
This paper introduces a ray-based volumetric uncertainty estimator, which computes the entropy of the weight distribution of the color samples along each ray of the object's implicit neural representation.
We show that it is possible to infer the uncertainty of the underlying 3D geometry given a novel view with the proposed estimator.
We present a next-best-view selection policy guided by the ray-based volumetric uncertainty in neural radiance fields-based representations.
arXiv Detail & Related papers (2022-09-17T21:28:57Z)
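To make the ray-based uncertainty above concrete, here is a minimal sketch of the entropy of normalized volume-rendering weights along a single ray, assuming per-sample densities and segment lengths as inputs; variable names are mine, not the paper's.

```python
import numpy as np

def ray_weight_entropy(sigmas, deltas):
    """Entropy of the normalized compositing weights along one ray of a
    NeRF-style model; higher entropy suggests more uncertain geometry."""
    alphas = 1.0 - np.exp(-sigmas * deltas)        # per-sample opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))  # T_i
    weights = trans * alphas                       # compositing weights
    p = weights / (weights.sum() + 1e-8)           # normalize to a pmf
    return -(p * np.log(p + 1e-8)).sum()
```

A next-best-view policy would evaluate such entropies over candidate viewpoints and pick the view whose rays are most uncertain.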
- Towards Two-view 6D Object Pose Estimation: A Comparative Study on Fusion Strategy [16.65699606802237]
Current RGB-based 6D object pose estimation methods have achieved noticeable performance on benchmark datasets and in real-world applications.
This paper proposes a framework for 6D object pose estimation that learns implicit 3D information from two RGB images.
arXiv Detail & Related papers (2022-07-01T08:22:34Z)
- Pose Estimation of Specific Rigid Objects [0.7931904787652707]
We address the problem of estimating the 6D pose of rigid objects from a single RGB or RGB-D input image.
This problem is of great importance to many application fields such as robotic manipulation, augmented reality, and autonomous driving.
arXiv Detail & Related papers (2021-12-30T14:36:47Z)
- 3DP3: 3D Scene Perception via Probabilistic Programming [28.491817202574932]
3DP3 is a framework for inverse graphics that uses inference in a structured generative model of objects, scenes, and images.
Our results demonstrate that 3DP3 is more accurate at 6DoF object pose estimation from real images than deep learning baselines.
arXiv Detail & Related papers (2021-10-30T19:10:34Z)
- Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors [69.02332607843569]
PriSMONet is a novel approach for learning Multi-Object 3D scene decomposition and representations from single images.
A recurrent encoder regresses a latent representation of 3D shape, pose and texture of each object from an input RGB image.
We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
arXiv Detail & Related papers (2020-10-08T14:49:23Z)