Multi-view stereo with semantic priors
- URL: http://arxiv.org/abs/2007.02295v1
- Date: Sun, 5 Jul 2020 11:30:29 GMT
- Title: Multi-view stereo with semantic priors
- Authors: Elisavet Konstantina Stathopoulou, Fabio Remondino
- Abstract summary: We aim to support the standard dense 3D reconstruction of scenes as implemented in the open source library OpenMVS by using semantic priors.
We impose extra semantic constraints in order to remove possible errors and selectively obtain segmented point clouds per label.
- Score: 3.756550107432323
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Patch-based stereo is nowadays a commonly used image-based technique for
dense 3D reconstruction in large-scale multi-view applications. The typical
steps of such a pipeline can be summarized as stereo pair selection, depth map
computation, depth map refinement and, finally, fusion in order to generate a
complete and accurate representation of the scene in 3D. In this study, we aim
to support the standard dense 3D reconstruction of scenes as implemented in the
open source library OpenMVS by using semantic priors. To this end, during the
depth map fusion step, along with the depth consistency check between depth
maps of neighbouring views referring to the same part of the 3D scene, we
impose extra semantic constraints in order to remove possible errors and
selectively obtain segmented point clouds per label, boosting automation in
this direction. To ensure semantic coherence between neighbouring views,
additional semantic criteria can be considered, aiming to eliminate
mismatches between pixels belonging to different classes.
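The fusion rule described in this abstract is compact enough to sketch. The following Python/NumPy snippet is a minimal, illustrative toy of the idea, namely a per-pixel depth-consistency vote across neighbouring views combined with a label-equality test, with surviving points grouped per semantic class. It is not the OpenMVS implementation: the views structure, the thresholds depth_tol and min_consistent, and both function names are assumptions invented for the example.

```python
# Illustrative sketch only: depth-map fusion with an extra semantic
# consistency check, in the spirit of the paper. Names, thresholds and the
# `views` layout are assumptions, not the OpenMVS API.
import numpy as np

def backproject(depth, K, cam_to_world):
    """Lift a z-depth map (H, W) into world-space points (H, W, 3)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    rays = np.linalg.inv(K) @ np.stack([u, v, np.ones_like(u)], 0).reshape(3, -1)
    pts = rays * depth.reshape(1, -1)              # camera-frame points
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3:]
    return (R @ pts + t).T.reshape(H, W, 3)

def fuse_with_semantics(views, depth_tol=0.01, min_consistent=2):
    """views: dicts with 'depth', 'labels', 'K', 'cam_to_world', 'world_to_cam'.
    Returns {class_id: (N, 3) points}, i.e. segmented point clouds per label."""
    fused = {}
    for i, ref in enumerate(views):
        pts = backproject(ref['depth'], ref['K'], ref['cam_to_world'])
        H, W = ref['depth'].shape
        ref_lab = ref['labels'].reshape(-1)
        votes = np.zeros(H * W, dtype=int)
        for j, src in enumerate(views):
            if j == i:
                continue
            # Project the reference points into the neighbouring view.
            R, t = src['world_to_cam'][:3, :3], src['world_to_cam'][:3, 3]
            p = pts.reshape(-1, 3) @ R.T + t
            z = np.clip(p[:, 2], 1e-6, None)
            uv = p @ src['K'].T
            u = np.round(uv[:, 0] / z).astype(int)
            v = np.round(uv[:, 1] / z).astype(int)
            Hs, Ws = src['depth'].shape
            ok = (p[:, 2] > 1e-6) & (u >= 0) & (u < Ws) & (v >= 0) & (v < Hs)
            d_src = np.zeros_like(z)
            lab_src = np.full(z.shape, -1)
            d_src[ok] = src['depth'][v[ok], u[ok]]
            lab_src[ok] = src['labels'][v[ok], u[ok]]
            geo = ok & (np.abs(d_src - z) < depth_tol * z)  # depth consistency
            sem = lab_src == ref_lab                        # same class in both views
            votes += geo & sem
        keep = (votes >= min_consistent).reshape(H, W)
        for lbl in np.unique(ref['labels'][keep]):
            sel = keep & (ref['labels'] == lbl)
            fused.setdefault(int(lbl), []).append(pts[sel])
    return {lbl: np.concatenate(parts) for lbl, parts in fused.items()}
```

A production fusion step would additionally deduplicate points observed from several reference views and interpolate depths rather than nearest-neighbour sample them; this sketch keeps only the two consistency checks the abstract describes.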
Related papers
- Rethinking Disparity: A Depth Range Free Multi-View Stereo Based on Disparity [17.98608948955211]
Existing learning-based multi-view stereo (MVS) methods rely on the depth range to build the 3D cost volume (a sketch of this depth-range-dependent construction appears after this list).
We propose a disparity-based MVS method built on the epipolar disparity flow (E-flow), called DispMVS.
We show that DispMVS is insensitive to the depth range and achieves state-of-the-art results with lower GPU memory.
arXiv Detail & Related papers (2022-11-30T11:05:02Z)
- Revisiting PatchMatch Multi-View Stereo for Urban 3D Reconstruction [1.1011268090482573]
A complete pipeline for image-based 3D reconstruction of urban scenarios is proposed, based on PatchMatch Multi-View Stereo (MVS).
The proposed approach is carefully evaluated against both classical MVS algorithms and monocular depth networks on the KITTI dataset.
arXiv Detail & Related papers (2022-07-18T08:45:54Z)
- MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection [61.89277940084792]
We introduce the first DETR framework for Monocular DEtection with a depth-guided TRansformer, named MonoDETR.
We formulate 3D object candidates as learnable queries and propose a depth-guided decoder to conduct object-scene depth interactions.
On KITTI benchmark with monocular images as input, MonoDETR achieves state-of-the-art performance and requires no extra dense depth annotations.
arXiv Detail & Related papers (2022-03-24T19:28:54Z)
- TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo [55.30992853477754]
We present TANDEM, a real-time monocular tracking and dense mapping framework.
For pose estimation, TANDEM performs photometric bundle adjustment based on a sliding window of keyframes.
TANDEM shows state-of-the-art real-time 3D reconstruction performance.
arXiv Detail & Related papers (2021-11-14T19:01:02Z)
- Semantic Dense Reconstruction with Consistent Scene Segments [33.0310121044956]
A method for dense semantic 3D scene reconstruction from an RGB-D sequence is proposed to solve high-level scene understanding tasks.
First, each RGB-D pair is consistently segmented into 2D semantic maps based on a camera tracking backbone.
A dense 3D mesh model of an unknown environment is incrementally generated from the input RGB-D sequence.
arXiv Detail & Related papers (2021-09-30T03:01:17Z)
- VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction [71.83308989022635]
In this paper, we advocate that replicating the traditional two-stage framework with deep neural networks improves both the interpretability and the accuracy of the results.
Our network operates in two steps: 1) local computation of depth maps with a deep MVS technique, and 2) fusion of the depth maps and image features into a single TSDF volume (a toy sketch of this kind of TSDF fusion appears after this list).
In order to improve the matching performance between images acquired from very different viewpoints, we introduce a rotation-invariant 3D convolution kernel called PosedConv.
arXiv Detail & Related papers (2021-08-19T11:33:58Z)
- A Real-Time Online Learning Framework for Joint 3D Reconstruction and Semantic Segmentation of Indoor Scenes [87.74952229507096]
This paper presents a real-time online vision framework to jointly recover an indoor scene's 3D structure and semantic label.
Given noisy depth maps, a camera trajectory, and 2D semantic labels at train time, the proposed neural network learns to fuse the depth over frames with suitable semantic labels in the scene space.
arXiv Detail & Related papers (2021-08-11T14:29:01Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
- Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction [12.728154351588053]
We present an efficient multi-view stereo (MVS) network for 3D reconstruction from multiview images.
We introduce a coarse-to-fine depth inference strategy to achieve high-resolution depth.
arXiv Detail & Related papers (2020-11-25T13:34:11Z)
- OmniSLAM: Omnidirectional Localization and Dense Mapping for Wide-baseline Multi-camera Systems [88.41004332322788]
We present an omnidirectional localization and dense mapping system for a wide-baseline multiview stereo setup with ultra-wide field-of-view (FOV) fisheye cameras.
For more practical and accurate reconstruction, we first introduce improved and lightweight deep neural networks for omnidirectional depth estimation.
We integrate our omnidirectional depth estimates into the visual odometry (VO) and add a loop closing module for global consistency.
arXiv Detail & Related papers (2020-03-18T05:52:10Z)
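As context for the DispMVS entry above, which contrasts itself with depth-range-based cost volumes, here is a hedged NumPy sketch of the classic plane-sweep construction those methods rely on. The helper name, the absolute-difference photometric cost, and all parameters are illustrative assumptions; note how the hypothesis set is tied to a user-supplied [d_min, d_max], which is exactly the dependency DispMVS removes.

```python
# Hypothetical helper illustrating a depth-range-based plane-sweep cost
# volume (the construction DispMVS avoids). Assumes grayscale float images
# of equal size and a known relative pose (R, t) from reference to source.
import numpy as np

def plane_sweep_cost_volume(ref_img, src_img, K, R, t, d_min, d_max, n_planes=64):
    """Return an (n_planes, H, W) photometric cost volume; lower is better."""
    H, W = ref_img.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    rays = np.linalg.inv(K) @ np.stack([u, v, np.ones_like(u)], 0).reshape(3, -1)
    depths = np.linspace(d_min, d_max, n_planes)  # the range DispMVS does away with
    volume = np.empty((n_planes, H, W), dtype=np.float32)
    for k, d in enumerate(depths):
        # Warp every reference pixel into the source view at hypothesis depth d.
        p = K @ (R @ (rays * d) + t[:, None])
        z = np.clip(p[2], 1e-6, None)
        us = np.round(p[0] / z).astype(int)
        vs = np.round(p[1] / z).astype(int)
        ok = (p[2] > 1e-6) & (us >= 0) & (us < W) & (vs >= 0) & (vs < H)
        warped = np.full(H * W, np.inf, dtype=np.float32)  # infinite cost off-image
        warped[ok] = src_img[vs[ok], us[ok]]
        volume[k] = np.abs(ref_img.reshape(-1) - warped).reshape(H, W)
    return volume  # a per-pixel argmin over axis 0 yields a crude depth map
```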
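Similarly, for the VolumeFusion entry, the second stage (fusing per-view depth maps into a single TSDF volume) can be illustrated with a classic, non-learned TSDF integration over a uniform voxel grid. This is a textbook-style sketch under simplifying assumptions (uniform grid, unit per-view weights, simple truncation), not the paper's learned fusion; all names and defaults are invented for the example.

```python
# Textbook-style TSDF integration; a simplified stand-in for the fusion
# stage named in the VolumeFusion entry, not the paper's method.
import numpy as np

def integrate_depth(tsdf, weights, origin, voxel_size, depth, K, world_to_cam, trunc=0.05):
    """Fuse one depth map into (tsdf, weights) in place via a running average.
    Assumes C-contiguous float arrays so reshape(-1) yields writable views."""
    nx, ny, nz = tsdf.shape
    ii, jj, kk = np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz), indexing='ij')
    pts = origin + voxel_size * np.stack([ii, jj, kk], -1).reshape(-1, 3)  # voxel centres
    R, t = world_to_cam[:3, :3], world_to_cam[:3, 3]
    p = pts @ R.T + t                        # voxel centres in the camera frame
    z = p[:, 2]
    uv = p @ K.T
    w = np.clip(uv[:, 2], 1e-6, None)
    u = np.round(uv[:, 0] / w).astype(int)
    v = np.round(uv[:, 1] / w).astype(int)
    H, W = depth.shape
    ok = (z > 1e-6) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    d = np.zeros_like(z)
    d[ok] = depth[v[ok], u[ok]]
    sdf = d - z                              # signed distance along the viewing ray
    ok &= (d > 0) & (sdf > -trunc)           # skip voxels far behind the surface
    new = np.clip(sdf / trunc, -1.0, 1.0)
    tsdf_f, w_f = tsdf.reshape(-1), weights.reshape(-1)
    tsdf_f[ok] = (w_f[ok] * tsdf_f[ok] + new[ok]) / (w_f[ok] + 1.0)
    w_f[ok] += 1.0
```

Extracting a mesh from the fused grid (e.g. with marching cubes) would complete the pipeline the entry describes.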
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.