NodeSLAM: Neural Object Descriptors for Multi-View Shape Reconstruction
- URL: http://arxiv.org/abs/2004.04485v2
- Date: Sat, 10 Oct 2020 16:41:59 GMT
- Title: NodeSLAM: Neural Object Descriptors for Multi-View Shape Reconstruction
- Authors: Edgar Sucar, Kentaro Wada, and Andrew Davison
- Abstract summary: We present efficient and optimisable multi-class learned object descriptors together with a novel probabilistic and differential rendering engine.
Our framework allows for accurate and robust 3D object reconstruction which enables multiple applications including robot grasping and placing, augmented reality, and the first object-level SLAM system.
- Score: 4.989480853499916
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The choice of scene representation is crucial in both the shape inference
algorithms it requires and the smart applications it enables. We present
efficient and optimisable multi-class learned object descriptors together with
a novel probabilistic and differential rendering engine, for principled full
object shape inference from one or more RGB-D images. Our framework allows for
accurate and robust 3D object reconstruction which enables multiple
applications including robot grasping and placing, augmented reality, and the
first object-level SLAM system capable of optimising object poses and shapes
jointly with camera trajectory.
Related papers
- SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild [76.21063993398451]
Inverse rendering of an object based on unconstrained image collections is a long-standing challenge in computer vision and graphics.
We show that an implicit shape representation based on a multi-resolution hash encoding enables faster and more robust shape reconstruction.
Our method is class-agnostic and works on in-the-wild image collections of objects to produce relightable 3D assets.
arXiv Detail & Related papers (2024-01-18T18:01:19Z)
- Towards Scalable Multi-View Reconstruction of Geometry and Materials [27.660389147094715]
We propose a novel method for joint recovery of camera pose, object geometry and spatially-varying Bidirectional Reflectance Distribution Function (svBRDF) of 3D scenes.
The inputs are high-resolution RGB-D images captured by a mobile, hand-held capture system with point lights for active illumination.
arXiv Detail & Related papers (2023-06-06T15:07:39Z)
- Anything-3D: Towards Single-view Anything Reconstruction in the Wild [61.090129285205805]
We introduce Anything-3D, a methodical framework that ingeniously combines a series of visual-language models and the Segment-Anything object segmentation model.
Our approach employs a BLIP model to generate textual descriptions, utilizes the Segment-Anything model for the effective extraction of objects of interest, and leverages a text-to-image diffusion model to lift the object into a neural radiance field.
arXiv Detail & Related papers (2023-04-19T16:39:51Z)
- Multi-View Neural Surface Reconstruction with Structured Light [7.709526244898887]
Three-dimensional (3D) object reconstruction based on differentiable rendering (DR) is an active research topic in computer vision.
We introduce active sensing with structured light (SL) into multi-view 3D object reconstruction based on DR to learn the unknown geometry and appearance of arbitrary scenes and camera poses.
Our method achieves high reconstruction accuracy in textureless regions and reduces the effort required for camera pose calibration.
arXiv Detail & Related papers (2022-11-22T03:10:46Z)
- A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
arXiv Detail & Related papers (2022-08-22T03:38:01Z)
- SDFEst: Categorical Pose and Shape Estimation of Objects from RGB-D using Signed Distance Fields [5.71097144710995]
We present a modular pipeline for pose and shape estimation of objects from RGB-D images.
We integrate a generative shape model with a novel network to enable 6D pose and shape estimation from a single or multiple views.
We demonstrate the benefits of our approach over state-of-the-art methods in several experiments on both synthetic and real data.
arXiv Detail & Related papers (2022-07-11T13:53:50Z)
- Supervised Training of Dense Object Nets using Optimal Descriptors for Industrial Robotic Applications [57.87136703404356]
Dense Object Nets (DONs) by Florence, Manuelli and Tedrake introduced dense object descriptors as a novel visual object representation for the robotics community.
In this paper we show that given a 3D model of an object, we can generate its descriptor space image, which allows for supervised training of DONs.
We compare the training methods on generating 6D grasps for industrial objects and show that our novel supervised training approach improves the pick-and-place performance in industry-relevant tasks.
arXiv Detail & Related papers (2021-02-16T11:40:12Z)
- From Points to Multi-Object 3D Reconstruction [71.17445805257196]
We propose a method to detect and reconstruct multiple 3D objects from a single RGB image.
A keypoint detector localizes objects as center points and directly predicts all object properties, including 9-DoF bounding boxes and 3D shapes.
The presented approach performs lightweight reconstruction in a single stage; it is real-time capable, fully differentiable, and end-to-end trainable.
arXiv Detail & Related papers (2020-12-21T18:52:21Z)
- Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve [54.054575408582565]
We propose to leverage existing large-scale datasets of 3D models to understand the underlying 3D structure of objects seen in an image.
We present Mask2CAD, which jointly detects objects in real-world images and, for each detected object, optimizes for the most similar CAD model and its pose.
This produces a clean, lightweight representation of the objects in an image.
arXiv Detail & Related papers (2020-07-26T00:08:37Z)
- MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion [19.034317851914725]
We present a system which can estimate the accurate poses of multiple known objects in contact and occlusion from real-time, embodied multi-view vision.
Our approach makes 3D object pose proposals from single RGB-D views and accumulates pose estimates and non-parametric occupancy information from multiple views as the camera moves.
We verify the accuracy and robustness of our approach experimentally on 2 object datasets: YCB-Video, and our own challenging Cluttered YCB-Video.
arXiv Detail & Related papers (2020-04-09T02:29:30Z)