Semantic Dense Reconstruction with Consistent Scene Segments
- URL: http://arxiv.org/abs/2109.14821v1
- Date: Thu, 30 Sep 2021 03:01:17 GMT
- Title: Semantic Dense Reconstruction with Consistent Scene Segments
- Authors: Yingcai Wan, Yanyan Li, Yingxuan You, Cheng Guo, Lijin Fang and
Federico Tombari
- Abstract summary: A method for dense semantic 3D scene reconstruction from an RGB-D sequence is proposed to solve high-level scene understanding tasks.
First, each RGB-D pair is consistently segmented into 2D semantic maps based on a camera tracking backbone.
A dense 3D mesh model of an unknown environment is incrementally generated from the input RGB-D sequence.
- Score: 33.0310121044956
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, a method for dense semantic 3D scene reconstruction from an
RGB-D sequence is proposed to solve high-level scene understanding tasks.
First, each RGB-D pair is consistently segmented into 2D semantic maps based on
a camera tracking backbone that propagates object labels with high
probabilities from full scans to the corresponding partial views. Then a
dense 3D mesh model of an unknown environment is incrementally generated from
the input RGB-D sequence. Benefiting from 2D consistent semantic segments and
the 3D model, a novel semantic projection block (SP-Block) is proposed to
extract deep feature volumes from 2D segments of different views. Moreover, the
semantic volumes are fused with deep volumes from a point cloud encoder to
produce the final semantic segmentation. Extensive experimental evaluations on
public datasets show that our system achieves accurate dense 3D reconstruction
and state-of-the-art semantic prediction performance simultaneously.
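To make the projection idea concrete, the snippet below is a minimal NumPy sketch of sampling per-view 2D semantic probability maps at the image locations of 3D points and averaging across views. The function name, shapes, and the mean-over-views fusion are illustrative assumptions, not the paper's SP-Block implementation.
```python
# Hypothetical sketch: gather per-point semantic features from 2D views.
import numpy as np

def project_semantics(points, sem_maps, K, poses):
    """points: (N, 3) world coordinates; sem_maps: list of (C, H, W) semantic
    probability maps; K: (3, 3) intrinsics; poses: list of (4, 4)
    world-to-camera transforms. Returns (N, C) mean semantic features."""
    N = points.shape[0]
    C, H, W = sem_maps[0].shape
    accum, counts = np.zeros((N, C)), np.zeros(N)
    homog = np.concatenate([points, np.ones((N, 1))], axis=1)
    for sem, pose in zip(sem_maps, poses):
        cam = (pose @ homog.T).T[:, :3]                 # world -> camera
        uvw = (K @ cam.T).T                             # pinhole projection
        u = np.round(uvw[:, 0] / np.maximum(uvw[:, 2], 1e-6)).astype(int)
        v = np.round(uvw[:, 1] / np.maximum(uvw[:, 2], 1e-6)).astype(int)
        ok = (cam[:, 2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
        accum[ok] += sem[:, v[ok], u[ok]].T             # (M, C) samples
        counts[ok] += 1
    return accum / np.maximum(counts[:, None], 1)       # average over views
```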
Related papers
- ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction [62.599588577671796]
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames.
Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality.
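As a sketch of what incrementally reconstructing a 3D semantic map from a stream can mean in practice, the snippet below fuses per-frame class probabilities into voxels with a running log-probability sum; the structure and names are assumptions, not ALSTER's expert model.
```python
# Hypothetical sketch: online per-voxel fusion of semantic observations.
import numpy as np

class SemanticVoxelMap:
    def __init__(self, num_classes):
        self.num_classes = num_classes
        self.log_scores = {}  # voxel id -> accumulated log class scores

    def integrate(self, voxel_ids, frame_probs):
        """voxel_ids: (M,) hashable voxel keys for the current frame;
        frame_probs: (M, C) per-voxel class probabilities from this frame."""
        log_p = np.log(np.clip(frame_probs, 1e-6, 1.0))
        for vid, lp in zip(voxel_ids, log_p):
            acc = self.log_scores.get(vid, np.zeros(self.num_classes))
            self.log_scores[vid] = acc + lp  # multiplicative evidence fusion

    def labels(self):
        """Most likely class per observed voxel so far."""
        return {v: int(np.argmax(s)) for v, s in self.log_scores.items()}
```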
arXiv Detail & Related papers (2023-11-29T20:30:18Z)
- SSR-2D: Semantic 3D Scene Reconstruction from 2D Images [54.46126685716471]
In this work, we explore a central 3D scene modeling task, namely, semantic scene reconstruction without using any 3D annotations.
The key idea of our approach is to design a trainable model that employs both incomplete 3D reconstructions and their corresponding source RGB-D images.
Our method achieves state-of-the-art semantic scene completion performance on two large-scale benchmark datasets, Matterport3D and ScanNet.
arXiv Detail & Related papers (2023-02-07T17:47:52Z)
- Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our method performs favorably against current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
- CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds [55.44204039410225]
We present a novel two-stage fully sparse convolutional 3D object detection framework, named CAGroup3D.
Our proposed method first generates high-quality 3D proposals by leveraging the class-aware local group strategy on the object surface voxels.
To recover the features of missed voxels due to incorrect voxel-wise segmentation, we build a fully sparse convolutional RoI pooling module.
arXiv Detail & Related papers (2022-10-09T13:38:48Z)
- SketchSampler: Sketch-based 3D Reconstruction via View-dependent Depth Sampling [75.957103837167]
Reconstructing a 3D shape based on a single sketch image is challenging due to the large domain gap between a sparse, irregular sketch and a regular, dense 3D shape.
Existing works employ the global feature extracted from the sketch to directly predict the 3D coordinates, but they usually lose fine details and are therefore not faithful to the input sketch.
arXiv Detail & Related papers (2022-08-14T16:37:51Z)
- Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based Perception [122.53774221136193]
State-of-the-art methods for driving-scene LiDAR-based perception often project the point clouds to 2D space and then process them via 2D convolution.
A natural remedy is to utilize 3D voxelization and 3D convolution networks.
We propose a new framework for outdoor LiDAR segmentation, in which cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
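For intuition, a small sketch of the cylindrical partition itself: points are binned in (radius, azimuth, height) rather than Cartesian (x, y, z), so cells naturally grow with distance where LiDAR returns become sparse. Bin counts and ranges here are illustrative assumptions, not the paper's configuration.
```python
# Hypothetical sketch: cylindrical voxelization of a LiDAR point cloud.
import numpy as np

def cylindrical_voxelize(points, r_bins=480, a_bins=360, z_bins=32,
                         r_max=50.0, z_min=-4.0, z_max=2.0):
    """points: (N, 3) Cartesian LiDAR coords -> (N, 3) integer voxel ids."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x ** 2 + y ** 2)                     # radial distance
    a = np.arctan2(y, x)                             # azimuth in [-pi, pi]
    r_id = np.clip((r / r_max * r_bins).astype(int), 0, r_bins - 1)
    a_id = np.clip(((a + np.pi) / (2 * np.pi) * a_bins).astype(int),
                   0, a_bins - 1)
    z_id = np.clip(((z - z_min) / (z_max - z_min) * z_bins).astype(int),
                   0, z_bins - 1)
    return np.stack([r_id, a_id, z_id], axis=1)
```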
arXiv Detail & Related papers (2021-09-12T06:25:11Z)
- IMENet: Joint 3D Semantic Scene Completion and 2D Semantic Segmentation through Iterative Mutual Enhancement [12.091735711364239]
We propose an Iterative Mutual Enhancement Network (IMENet) to solve 3D semantic scene completion and 2D semantic segmentation.
IMENet interactively refines the two tasks at the late prediction stage.
Our approach outperforms the state of the art on both 3D semantic scene completion and 2D semantic segmentation.
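Schematically, refining the two tasks at the late prediction stage can be read as an alternating loop like the one below; the callables and the fixed round count are assumptions, not IMENet's actual modules.
```python
# Hypothetical sketch: late-stage mutual refinement of 2D and 3D predictions.
def mutual_enhance(pred_2d, pred_3d, refine_2d, refine_3d,
                   project_3d_to_2d, lift_2d_to_3d, rounds=2):
    """Each task is refined conditioned on a reprojection of the other."""
    for _ in range(rounds):
        pred_2d = refine_2d(pred_2d, project_3d_to_2d(pred_3d))  # 3D helps 2D
        pred_3d = refine_3d(pred_3d, lift_2d_to_3d(pred_2d))     # 2D helps 3D
    return pred_2d, pred_3d
```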
arXiv Detail & Related papers (2021-06-29T13:34:20Z)
- A Novel 3D-UNet Deep Learning Framework Based on High-Dimensional Bilateral Grid for Edge Consistent Single Image Depth Estimation [0.45880283710344055]
A bilateral-grid-based 3D convolutional neural network, dubbed 3DBG-UNet, parameterizes the high-dimensional feature space by encoding compact 3D bilateral grids with UNets.
Another novel model, 3DBGES-UNet, is introduced that integrates 3DBG-UNet to infer an accurate depth map from a single color view.
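For background, a bilateral grid lifts a 2D image into a 3D (x, y, intensity) volume; the splatting sketch below shows the basic construction with assumed grid sizes, not the 3DBG-UNet encoding itself.
```python
# Hypothetical sketch: splat an image into a 3D bilateral grid.
import numpy as np

def build_bilateral_grid(image, gx=32, gy=32, gr=8):
    """image: (H, W) floats in [0, 1]. Returns (gx, gy, gr) grids of summed
    intensities and cell counts; their ratio gives per-cell means."""
    H, W = image.shape
    ys, xs = np.mgrid[0:H, 0:W]
    xi = xs.ravel() * gx // W                          # spatial x bin
    yi = ys.ravel() * gy // H                          # spatial y bin
    ri = np.clip((image.ravel() * gr).astype(int), 0, gr - 1)  # range bin
    grid = np.zeros((gx, gy, gr))
    count = np.zeros((gx, gy, gr))
    np.add.at(grid, (xi, yi, ri), image.ravel())       # scatter-add intensity
    np.add.at(count, (xi, yi, ri), 1)
    return grid, count
```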
arXiv Detail & Related papers (2021-05-21T04:53:14Z)
- Attention-based Multi-modal Fusion Network for Semantic Scene Completion [35.93265545962268]
This paper presents an end-to-end 3D convolutional network named attention-based multi-modal fusion network (AMFNet) for the semantic scene completion (SSC) task.
Compared with previous methods which use only the semantic features extracted from RGB-D images, the proposed AMFNet learns to perform effective 3D scene completion and semantic segmentation simultaneously.
This is achieved by employing a multi-modal fusion architecture boosted by 2D semantic segmentation and a 3D semantic completion network empowered by residual attention blocks.
arXiv Detail & Related papers (2020-03-31T02:00:03Z)
- Atlas: End-to-End 3D Scene Reconstruction from Posed Images [13.154808583020229]
We present an end-to-end 3D reconstruction method for a scene by directly regressing a truncated signed distance function (TSDF) from a set of posed RGB images.
A 2D CNN extracts features from each image independently; these are then back-projected and accumulated into a voxel volume.
A 3D CNN refines the accumulated features and predicts the TSDF values.
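The back-projection step can be sketched as below: every voxel center is projected into the image, and in-view voxels accumulate the 2D features found there, together with a count for later averaging. Argument names, shapes, and the rounding-based sampling are assumptions, not the Atlas code.
```python
# Hypothetical sketch: accumulate 2D CNN features into a voxel volume.
import numpy as np

def backproject_features(feat, K, cam_to_world, volume, weight, origin, res):
    """feat: (C, H, W); K: (3, 3) intrinsics; cam_to_world: (4, 4) pose;
    volume: (C, X, Y, Z) and weight: (X, Y, Z) C-contiguous accumulators;
    origin: (3,) world position of voxel (0, 0, 0); res: voxel size."""
    C, H, W = feat.shape
    X, Y, Z = weight.shape
    ii, jj, kk = np.meshgrid(np.arange(X), np.arange(Y), np.arange(Z),
                             indexing="ij")
    centers = origin + res * np.stack([ii, jj, kk], axis=-1)   # (X, Y, Z, 3)
    pts = centers.reshape(-1, 3)
    w2c = np.linalg.inv(cam_to_world)
    cam = (w2c[:3, :3] @ pts.T + w2c[:3, 3:4]).T               # world -> cam
    uvw = (K @ cam.T).T
    u = np.round(uvw[:, 0] / np.maximum(uvw[:, 2], 1e-6)).astype(int)
    v = np.round(uvw[:, 1] / np.maximum(uvw[:, 2], 1e-6)).astype(int)
    ok = (cam[:, 2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    volume.reshape(C, -1)[:, ok] += feat[:, v[ok], u[ok]]      # accumulate
    weight.reshape(-1)[ok] += 1                                # for averaging
```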
arXiv Detail & Related papers (2020-03-23T17:59:15Z)