DSP-SLAM: Object Oriented SLAM with Deep Shape Priors
- URL: http://arxiv.org/abs/2108.09481v1
- Date: Sat, 21 Aug 2021 10:00:12 GMT
- Title: DSP-SLAM: Object Oriented SLAM with Deep Shape Priors
- Authors: Jingwen Wang, Martin Rünz, Lourdes Agapito
- Abstract summary: We propose an object-oriented SLAM system that builds a rich and accurate joint map of dense 3D models for foreground objects.
DSP-SLAM takes as input the 3D point cloud reconstructed by a feature-based SLAM system.
Our evaluation shows improvements in object pose and shape reconstruction with respect to recent deep prior-based reconstruction methods.
- Score: 16.867669408751507
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose DSP-SLAM, an object-oriented SLAM system that builds a rich and
accurate joint map of dense 3D models for foreground objects, and sparse
landmark points to represent the background. DSP-SLAM takes as input the 3D
point cloud reconstructed by a feature-based SLAM system and equips it with the
ability to enhance its sparse map with dense reconstructions of detected
objects. Objects are detected via semantic instance segmentation, and their
shape and pose are estimated using category-specific deep shape embeddings as
priors, via a novel second-order optimization. Our object-aware bundle
adjustment builds a pose-graph to jointly optimize camera poses, object
locations and feature points. DSP-SLAM can operate at 10 frames per second on 3
different input modalities: monocular, stereo, or stereo+LiDAR. We demonstrate
DSP-SLAM operating at almost frame rate on monocular-RGB sequences from the
Freiburg and Redwood-OS datasets, and on stereo+LiDAR sequences on the KITTI
odometry dataset, showing that it achieves high-quality full object
reconstructions, even from partial observations, while maintaining a consistent
global map. Our evaluation shows improvements in object pose and shape
reconstruction with respect to recent deep prior-based reconstruction methods
and reductions in camera tracking drift on the KITTI dataset.
Related papers
- MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements [59.70107451308687]
We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM.
Our method, MM3DGS, addresses the limitations of prior neural radiance field-based representations by enabling faster rendering, scale awareness, and improved trajectory tracking.
We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit.
arXiv Detail & Related papers (2024-04-01T04:57:41Z)
- TwistSLAM++: Fusing multiple modalities for accurate dynamic semantic SLAM [0.0]
TwistSLAM++ is a semantic, dynamic, SLAM system that fuses stereo images and LiDAR information.
We show on classical benchmarks that this fusion approach based on multimodal information improves the accuracy of object tracking.
arXiv Detail & Related papers (2022-09-16T12:28:21Z)
- Visual-Inertial Multi-Instance Dynamic SLAM with Object-level Relocalisation [14.302118093865849]
We present a tightly-coupled visual-inertial object-level multi-instance dynamic SLAM system.
It robustly optimises the camera pose, velocity, and IMU biases, and builds a dense object-level 3D reconstruction of the environment.
arXiv Detail & Related papers (2022-08-08T17:13:24Z)
- RBGNet: Ray-based Grouping for 3D Object Detection [104.98776095895641]
We propose the RBGNet framework, a voting-based 3D detector for accurate 3D object detection from point clouds.
We propose a ray-based feature grouping module, which aggregates the point-wise features on object surfaces using a group of determined rays.
Our model achieves state-of-the-art 3D detection performance on ScanNet V2 and SUN RGB-D with remarkable performance gains.
arXiv Detail & Related papers (2022-04-05T14:42:57Z)
- Joint stereo 3D object detection and implicit surface reconstruction [39.30458073540617]
We present a new learning-based framework S-3D-RCNN that can recover accurate object orientation in SO(3) and simultaneously predict implicit rigid shapes from stereo RGB images.
For orientation estimation, in contrast to previous studies that map local appearance to observation angles, we propose a progressive approach that extracts meaningful Intermediate Geometrical Representations (IGRs).
This approach features a deep model that transforms perceived intensities from one or two views to object part coordinates to achieve direct egocentric object orientation estimation in the camera coordinate system.
To further achieve finer description inside 3D bounding boxes, we investigate the implicit shape estimation problem from stereo images.
arXiv Detail & Related papers (2021-11-25T05:52:30Z)
- TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo [55.30992853477754]
We present TANDEM, a real-time monocular tracking and dense mapping framework.
For pose estimation, TANDEM performs photometric bundle adjustment based on a sliding window of keyframes.
TANDEM shows state-of-the-art real-time 3D reconstruction performance.
arXiv Detail & Related papers (2021-11-14T19:01:02Z)
- ODAM: Object Detection, Association, and Mapping using Posed RGB Video [36.16010611723447]
We present ODAM, a system for 3D Object Detection, Association, and Mapping using posed RGB videos.
The proposed system relies on a deep learning front-end to detect 3D objects from a given RGB frame and associate them to a global object-based map using a graph neural network (GNN).
arXiv Detail & Related papers (2021-08-23T13:28:10Z)
- Aug3D-RPN: Improving Monocular 3D Object Detection by Synthetic Images with Virtual Depth [64.29043589521308]
We propose a rendering module to augment the training data by synthesizing images with virtual-depths.
The rendering module takes as input the RGB image and its corresponding sparse depth image, and outputs a variety of photo-realistic synthetic images.
Besides, we introduce an auxiliary module to improve the detection model by jointly optimizing it through a depth estimation task.
arXiv Detail & Related papers (2021-07-28T11:00:47Z)
- Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances.
We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction.
Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
arXiv Detail & Related papers (2021-03-30T17:57:01Z)
- Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency [114.02182755620784]
We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.
Our framework is shown to outperform the state-of-the-art depth and motion estimation methods.
arXiv Detail & Related papers (2021-02-04T14:26:42Z)
- RfD-Net: Point Scene Understanding by Semantic Instance Reconstruction [19.535169371240073]
We introduce RfD-Net that jointly detects and reconstructs dense object surfaces directly from point clouds.
We decouple the instance reconstruction into global object localization and local shape prediction.
Our approach consistently outperforms the state of the art and improves over 11 of mesh IoU in object reconstruction.
arXiv Detail & Related papers (2020-11-30T12:58:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.