FroDO: From Detections to 3D Objects
- URL: http://arxiv.org/abs/2005.05125v1
- Date: Mon, 11 May 2020 14:08:29 GMT
- Title: FroDO: From Detections to 3D Objects
- Authors: Kejie Li, Martin Rünz, Meng Tang, Lingni Ma, Chen Kong, Tanner Schmidt, Ian Reid, Lourdes Agapito, Julian Straub, Steven Lovegrove, Richard Newcombe
- Abstract summary: FroDO is a method for accurate 3D reconstruction of object instances from RGB video.
It infers object location, pose and shape in a coarse-to-fine manner.
We evaluate on real-world datasets, including Pix3D, Redwood-OS, and ScanNet.
- Score: 29.10716046157072
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object-oriented maps are important for scene understanding since they jointly
capture geometry and semantics, and allow individual instantiation and meaningful
reasoning about objects. We introduce FroDO, a method for accurate 3D
reconstruction of object instances from RGB video that infers object location,
pose and shape in a coarse-to-fine manner. Key to FroDO is to embed object
shapes in a novel learnt space that allows seamless switching between sparse
point cloud and dense DeepSDF decoding. Given an input sequence of localized
RGB frames, FroDO first aggregates 2D detections to instantiate a
category-aware 3D bounding box per object. A shape code is regressed using an
encoder network before optimizing shape and pose further under the learnt shape
priors using sparse and dense shape representations. The optimization uses
multi-view geometric, photometric and silhouette losses. We evaluate on
real-world datasets, including Pix3D, Redwood-OS, and ScanNet, for single-view,
multi-view, and multi-object reconstruction.
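In code, the refinement stage can be pictured as gradient descent over a latent shape code and a 6-DoF pose under a frozen shape prior. The sketch below is a minimal, hypothetical version driven by the silhouette term only, assuming a pretrained sparse point decoder (`point_decoder`), per-frame intrinsics `K`, and float-valued object masks; it is not the authors' implementation, which also uses geometric and photometric losses.

```python
# Minimal sketch of silhouette-driven shape/pose refinement. All names
# (point_decoder, frames, pose parameterization) are hypothetical stand-ins.
import torch
import torch.nn.functional as F

def skew(v):
    """3x3 skew-symmetric matrix of a 3-vector (differentiable)."""
    z = v.new_zeros(())
    return torch.stack([torch.stack([z, -v[2], v[1]]),
                        torch.stack([v[2], z, -v[0]]),
                        torch.stack([-v[1], v[0], z])])

def make_pose(p):
    """Axis-angle + translation (6,) -> rotation matrix and translation."""
    w, t = p[:3], p[3:]
    theta = w.norm() + 1e-8
    k_hat = skew(w / theta)                      # Rodrigues' formula
    R = torch.eye(3) + theta.sin() * k_hat + (1 - theta.cos()) * (k_hat @ k_hat)
    return R, t

def silhouette_loss(points_obj, pose, K, mask):
    """Penalize decoded surface points that project off the object mask."""
    R, t = pose
    pts_cam = points_obj @ R.T + t               # object -> camera frame
    uv = pts_cam[:, :2] / pts_cam[:, 2:3]        # perspective division
    uv = uv @ K[:2, :2].T + K[:2, 2]             # intrinsics -> pixel coords
    h, w = mask.shape
    grid = torch.stack([uv[:, 0] / (w - 1) * 2 - 1,   # normalize to [-1, 1]
                        uv[:, 1] / (h - 1) * 2 - 1], dim=-1)
    vals = F.grid_sample(mask[None, None], grid[None, None],
                         align_corners=True)[0, 0, 0]  # bilinear mask lookup
    return (1.0 - vals).mean()                   # points should land inside

def refine(code, pose_params, point_decoder, frames, iters=200):
    """Jointly optimize the shape code and 6-DoF pose over all frames."""
    code = code.clone().requires_grad_(True)
    pose_params = pose_params.clone().requires_grad_(True)
    opt = torch.optim.Adam([code, pose_params], lr=1e-2)
    for _ in range(iters):
        opt.zero_grad()
        pts = point_decoder(code)                # sparse decoding of the code
        loss = sum(silhouette_loss(pts, make_pose(pose_params), K, mask)
                   for K, mask in frames) / len(frames)
        loss.backward()
        opt.step()
    return code.detach(), pose_params.detach()
```

In the full method, the same code could then be decoded densely in the DeepSDF-style branch for the finer optimization stage the abstract describes.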
Related papers
- BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects [89.2314092102403]
We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence.
Our method works for arbitrary rigid objects, even when visual texture is largely absent.
arXiv Detail & Related papers (2023-03-24T17:13:49Z)
- Bridged Transformer for Vision and Point Cloud 3D Object Detection [92.86856146086316]
Bridged Transformer (BrT) is an end-to-end architecture for 3D object detection.
BrT learns to identify 3D and 2D object bounding boxes from both points and image patches.
We experimentally show that BrT surpasses state-of-the-art methods on SUN RGB-D and ScanNetV2 datasets.
arXiv Detail & Related papers (2022-10-04T05:44:22Z)
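The bridging idea above lends itself to a short sketch: one transformer encoder attends jointly over point tokens and image-patch tokens so each modality can condition the other. The projections and dimensions below are assumptions for illustration, not the actual BrT architecture.

```python
# Generic sketch of bridging point and image-patch tokens in one encoder.
import torch
import torch.nn as nn

class BridgedEncoder(nn.Module):
    def __init__(self, dim=256, heads=8, layers=4, patch_feat_dim=768):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, layers)
        self.point_proj = nn.Linear(3, dim)               # embed raw xyz points
        self.patch_proj = nn.Linear(patch_feat_dim, dim)  # embed patch features

    def forward(self, points, patches):
        """points: (B, N, 3); patches: (B, M, patch_feat_dim)."""
        tokens = torch.cat([self.point_proj(points),
                            self.patch_proj(patches)], dim=1)
        return self.encoder(tokens)    # (B, N + M, dim) cross-attended tokens
```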
- From Points to Multi-Object 3D Reconstruction [71.17445805257196]
We propose a method to detect and reconstruct multiple 3D objects from a single RGB image.
A keypoint detector localizes objects as center points and directly predicts all object properties, including 9-DoF bounding boxes and 3D shapes.
The presented approach performs lightweight reconstruction in a single stage; it is real-time capable, fully differentiable and end-to-end trainable.
arXiv Detail & Related papers (2020-12-21T18:52:21Z)
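The center-point formulation above can be illustrated with a small decoding routine in the CenterNet style: find per-class peaks in a predicted heatmap, then read dense per-pixel heads at those cells. All tensor names below are hypothetical.

```python
# Toy center-point decoding: per-class peaks in a heatmap become detections.
import torch
import torch.nn.functional as F

def decode_centers(heatmap, k=20):
    """heatmap: (C, H, W) per-class center scores in [0, 1]."""
    # A peak is a pixel that survives 3x3 max-pooling unchanged.
    pooled = F.max_pool2d(heatmap[None], 3, stride=1, padding=1)[0]
    peaks = heatmap * (pooled == heatmap)
    scores, idx = peaks.flatten().topk(k)             # top-k peaks, all classes
    C, H, W = heatmap.shape
    cls = torch.div(idx, H * W, rounding_mode='floor')
    rem = idx % (H * W)
    ys = torch.div(rem, W, rounding_mode='floor')
    xs = rem % W
    return scores, cls, ys, xs

# Dense heads (e.g. a 9-DoF box head and a shape-code head) would then be
# indexed at (ys, xs) to recover each detected object's properties.
```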
- Learning Geometry-Disentangled Representation for Complementary Understanding of 3D Object Point Cloud [50.56461318879761]
We propose Geometry-Disentangled Attention Network (GDANet) for 3D point cloud processing.
GDANet disentangles point clouds into the contour and flat parts of 3D objects, denoted by sharp and gentle variation components respectively.
Experiments on 3D object classification and segmentation benchmarks demonstrate that GDANet achieves state-of-the-art results with fewer parameters.
arXiv Detail & Related papers (2020-12-20T13:35:00Z)
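As a rough illustration of a sharp/gentle split, the toy function below scores each point by its offset from the centroid of its nearest neighbours: points on contours deviate more than points on flat regions. This is only a proxy, not GDANet's actual disentangling module.

```python
# Toy proxy for splitting a point cloud into "sharp" and "gentle" subsets.
import torch

def sharp_gentle_split(points, k=16, ratio=0.25):
    """points: (N, 3) -> (sharp_points, gentle_points)."""
    d = torch.cdist(points, points)                     # pairwise distances
    knn = d.topk(k + 1, largest=False).indices[:, 1:]   # drop self-match
    centroids = points[knn].mean(dim=1)                 # local centroids
    score = (points - centroids).norm(dim=1)            # deviation from flat
    n_sharp = max(1, int(ratio * len(points)))
    order = score.argsort(descending=True)
    return points[order[:n_sharp]], points[order[n_sharp:]]
```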
- MOLTR: Multiple Object Localisation, Tracking, and Reconstruction from Monocular RGB Videos [30.541606989348377]
MOLTR is a solution to object-centric mapping using only monocular image sequences and camera poses.
It localises, tracks, and reconstructs multiple objects in an online fashion as an RGB camera captures a video of its surroundings.
We evaluate localisation, tracking, and reconstruction on benchmarking datasets for indoor and outdoor scenes.
arXiv Detail & Related papers (2020-12-09T23:15:08Z)
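One ingredient such an online pipeline needs is data association between incoming 3D detections and existing object tracks. The sketch below is a generic greedy centroid-distance matcher, deliberately simplified relative to whatever association MOLTR actually uses.

```python
# Generic greedy association of 3D detections to object tracks by centroid
# distance; a skipped detection (nearest track taken) would spawn a new track.
import torch

def associate(track_centroids, det_centroids, max_dist=0.5):
    """track_centroids: (T, 3); det_centroids: (D, 3), both in world frame."""
    matches = {}
    if len(track_centroids) == 0 or len(det_centroids) == 0:
        return matches
    d = torch.cdist(det_centroids, track_centroids)   # (D, T) distances
    for det in d.min(dim=1).values.argsort():         # closest matches first
        tr = int(d[det].argmin())
        if float(d[det, tr]) < max_dist and tr not in matches.values():
            matches[int(det)] = tr                    # detection -> track id
    return matches
```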
- An Overview Of 3D Object Detection [21.159668390764832]
We propose a framework that uses both RGB and point cloud data to perform multiclass object recognition.
We use the recently released nuScenes dataset---a large-scale dataset containing many data formats---to train and evaluate our proposed architecture.
arXiv Detail & Related papers (2020-10-29T14:04:50Z)
- Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors [69.02332607843569]
PriSMONet is a novel approach for learning Multi-Object 3D scene decomposition and representations from single images.
A recurrent encoder regresses a latent representation of 3D shape, pose and texture of each object from an input RGB image.
We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
arXiv Detail & Related papers (2020-10-08T14:49:23Z)
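The recurrent readout above can be sketched as a GRU cell that consumes a global image feature and emits one object latent per step; splitting each latent into shape, pose and texture is left abstract. Dimensions and module names are assumptions, not PriSMONet's architecture.

```python
# Hedged sketch of a recurrent per-object latent encoder.
import torch
import torch.nn as nn

class ObjectEncoder(nn.Module):
    def __init__(self, feat_dim=512, latent_dim=128, max_objects=4):
        super().__init__()
        self.gru = nn.GRUCell(feat_dim, latent_dim)
        self.latent_dim = latent_dim
        self.max_objects = max_objects

    def forward(self, image_feat):
        """image_feat: (B, feat_dim) -> (B, max_objects, latent_dim)."""
        h = image_feat.new_zeros(image_feat.size(0), self.latent_dim)
        latents = []
        for _ in range(self.max_objects):   # one object latent per step
            h = self.gru(image_feat, h)
            latents.append(h)               # split into shape/pose/texture
        return torch.stack(latents, dim=1)
```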
- CoReNet: Coherent 3D scene reconstruction from a single RGB image [43.74240268086773]
We build on advances in deep learning to reconstruct the shape of a single object given only one RGB image as input.
We propose three extensions: (1) ray-traced skip connections that propagate local 2D information to the output 3D volume in a physically correct manner; (2) a hybrid 3D volume representation that enables building translation equivariant models; and (3) a reconstruction loss tailored to capture overall object geometry.
We reconstruct all objects jointly in one pass, producing a coherent reconstruction, where all objects live in a single consistent 3D coordinate frame relative to the camera and they do not intersect in 3D space.
arXiv Detail & Related papers (2020-04-27T17:53:07Z)
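Extension (1) above can be approximated in a few lines: project each voxel centre into the image and bilinearly sample 2D features there, so local 2D evidence lands at the physically corresponding 3D cells. Shapes and names below are assumptions, not CoReNet's implementation.

```python
# Simplified sketch of lifting 2D features into a 3D volume along camera rays.
import torch
import torch.nn.functional as F

def lift_2d_to_3d(feat2d, voxel_centers, K):
    """feat2d: (C, H, W); voxel_centers: (V, 3) in camera frame; K: (3, 3)."""
    uv = voxel_centers[:, :2] / voxel_centers[:, 2:3]   # perspective division
    uv = uv @ K[:2, :2].T + K[:2, 2]                    # pixel coordinates
    C, H, W = feat2d.shape
    grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,     # normalize to [-1, 1]
                        uv[:, 1] / (H - 1) * 2 - 1], dim=-1)
    out = F.grid_sample(feat2d[None], grid[None, None], align_corners=True)
    return out[0, :, 0].T                               # (V, C) lifted features
```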
- Local Implicit Grid Representations for 3D Scenes [24.331110387905962]
We introduce Local Implicit Grid Representations, a new 3D shape representation designed for scalability and generality.
We train an autoencoder to learn an embedding of local crops of 3D shapes at a fixed part size.
Then, we use the decoder as a component in a shape optimization that solves for a set of latent codes on a regular grid of overlapping crops.
arXiv Detail & Related papers (2020-03-19T18:58:13Z)
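That optimization loop is easy to sketch once the decoder is frozen: solve for a grid of latent codes so that observed surface points decode to zero signed distance. The version below ignores the overlap blending between crops; `local_decoder`, the grid layout and the code size are assumptions.

```python
# Sketch of fitting a grid of latent codes with a frozen local-crop decoder.
import torch

def fit_latent_grid(surface_pts, local_decoder, grid_dims, cell_size,
                    code_dim=64, iters=500):
    """surface_pts: (N, 3) points assumed to lie inside the latent grid."""
    codes = torch.zeros(*grid_dims, code_dim, requires_grad=True)
    opt = torch.optim.Adam([codes], lr=1e-3)
    cell = (surface_pts / cell_size).long()            # crop containing each point
    local = surface_pts / cell_size - cell.float()     # coords inside that crop
    for _ in range(iters):
        opt.zero_grad()
        z = codes[cell[:, 0], cell[:, 1], cell[:, 2]]  # gather per-point codes
        sdf = local_decoder(z, local)                  # frozen pretrained decoder
        loss = sdf.abs().mean()                        # surface points -> sdf ~ 0
        loss.backward()
        opt.step()
    return codes.detach()
```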
- Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion [53.885984328273686]
Implicit Feature Networks (IF-Nets) deliver continuous outputs, can handle multiple topologies, and complete shapes for missing or sparse input data.
IF-Nets clearly outperform prior work in 3D object reconstruction on ShapeNet, and obtain significantly more accurate 3D human reconstructions.
arXiv Detail & Related papers (2020-03-03T11:14:29Z)
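The readout pattern behind such implicit networks is compact: sample several 3D feature grids at each query point and classify occupancy with a point-wise decoder. The sketch below follows that pattern under assumed shapes; it is not the released IF-Net code.

```python
# Sketch of a multi-scale implicit-function readout at continuous 3D queries.
import torch
import torch.nn.functional as F

def query_occupancy(feature_grids, decoder, pts):
    """feature_grids: list of (1, C_i, D, H, W); pts: (N, 3) in [-1, 1]."""
    g = pts[None, None, None]                   # (1, 1, 1, N, 3) sampling grid
    feats = [F.grid_sample(fg, g, align_corners=True)[0, :, 0, 0].T
             for fg in feature_grids]           # each (N, C_i), multi-scale
    return decoder(torch.cat(feats, dim=-1))    # (N, 1) occupancy logits
```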
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information above and is not responsible for any consequences of its use.