Depth Based Semantic Scene Completion with Position Importance Aware Loss
- URL: http://arxiv.org/abs/2001.10709v2
- Date: Thu, 30 Jan 2020 20:16:34 GMT
- Title: Depth Based Semantic Scene Completion with Position Importance Aware Loss
- Authors: Yu Liu, Jie Li, Xia Yuan, Chunxia Zhao, Roland Siegwart, Ian Reid, Cesar Cadena
- Abstract summary: PALNet is a novel hybrid network for semantic scene completion.
It extracts both 2D and 3D features at multiple stages using fine-grained depth information.
It is beneficial for recovering key details like the boundaries of objects and the corners of the scene.
- Score: 52.06051681324545
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semantic Scene Completion (SSC) refers to the task of inferring the 3D
semantic segmentation of a scene while simultaneously completing the 3D shapes.
We propose PALNet, a novel hybrid network for SSC based on a single depth map.
PALNet utilizes a two-stream network to extract both 2D and 3D features at
multiple stages, using fine-grained depth information to efficiently capture
both the context and the geometric cues of the scene. Current methods for SSC
treat all parts of the scene equally, causing unnecessary attention to the
interior of objects. To address this problem, we propose the Position Aware
Loss (PA-Loss), which accounts for the importance of different positions while
training the network.
Specifically, PA-Loss considers Local Geometric Anisotropy to determine the
importance of different positions within the scene. It is beneficial for
recovering key details like the boundaries of objects and the corners of the
scene. Comprehensive experiments on two benchmark datasets demonstrate the
effectiveness of the proposed method and its superior performance. Models and a
video demo can be found at: https://github.com/UniLauX/PALNet.
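The weighting idea behind PA-Loss can be illustrated with a short sketch. The PyTorch code below approximates Local Geometric Anisotropy (LGA) as a 6-neighborhood label-disagreement count and uses it to weight a per-voxel cross-entropy; the constants `base` and `alpha`, and the exact form of the weighting, are assumptions for illustration rather than the paper's published formulation.

```python
import torch
import torch.nn.functional as F

def lga_weights(labels, base=1.0, alpha=0.5):
    """Per-voxel weights from an LGA-style heterogeneity measure.

    LGA is approximated as the number of 6-connected neighbors whose
    ground-truth label differs from the center voxel, so boundaries and
    corners receive larger weights than object interiors. `base` and
    `alpha` are hypothetical constants. labels: (B, D, H, W) ints.
    """
    lga = torch.zeros_like(labels, dtype=torch.float32)
    for dim in (1, 2, 3):            # the three spatial axes
        for shift in (1, -1):        # both directions along each axis
            nbr = torch.roll(labels, shifts=shift, dims=dim)
            lga += (nbr != labels).float()
    # Note: torch.roll wraps around at volume borders; a faithful
    # implementation would pad instead, and would also mask out
    # unknown/unobserved voxels before weighting.
    return base + alpha * lga        # in [base, base + 6 * alpha]

def pa_loss(logits, labels):
    """Position-importance-aware cross-entropy (sketch, not PA-Loss itself).

    logits: (B, C, D, H, W) class scores; labels: (B, D, H, W).
    """
    ce = F.cross_entropy(logits, labels, reduction="none")  # (B, D, H, W)
    w = lga_weights(labels)
    return (w * ce).sum() / w.sum()
```

Because the weight grows with local label heterogeneity, voxels on object boundaries and at scene corners contribute more to the loss than the uniform interiors of objects, which matches the motivation stated in the abstract.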
Related papers
- Self-supervised Learning of Neural Implicit Feature Fields for Camera Pose Refinement [32.335953514942474]
This paper proposes to jointly learn the scene representation along with a 3D dense feature field and a 2D feature extractor.
We learn the underlying geometry of the scene with an implicit field through volumetric rendering and design our feature field to leverage intermediate geometric information encoded in the implicit field.
Visual localization is then achieved by aligning the image-based features and the rendered volumetric features.
arXiv Detail & Related papers (2024-06-12T17:51:53Z)
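The alignment step described in the entry above can be pictured as a small optimization over a 6-DoF pose. The sketch below is an assumption-laden illustration, not the paper's method: it minimizes the squared distance between image features and features queried from a volumetric field at pose-transformed sample points, where `field` is a hypothetical differentiable callable standing in for the rendered feature volume.

```python
import torch

def so3_exp(w):
    """Rodrigues' formula: axis-angle vector (3,) -> rotation matrix (3, 3)."""
    theta = w.norm() + 1e-8
    k = w / theta
    zero = torch.zeros((), dtype=w.dtype)
    K = torch.stack([
        torch.stack([zero, -k[2], k[1]]),
        torch.stack([k[2], zero, -k[0]]),
        torch.stack([-k[1], k[0], zero]),
    ])
    return (torch.eye(3, dtype=w.dtype) + torch.sin(theta) * K
            + (1 - torch.cos(theta)) * (K @ K))

def refine_pose(w, t, pts_cam, img_feats, field, iters=100, lr=1e-2):
    """Refine a camera pose by aligning 2D and volumetric features.

    w, t: (3,) axis-angle rotation and translation of the initial pose.
    pts_cam: (N, 3) sample points in the camera frame (e.g. along rays).
    img_feats: (N, D) image features at the pixels those points hit.
    field: hypothetical differentiable map from world points (N, 3)
    to volumetric features (N, D), e.g. a small MLP.
    """
    w = w.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    opt = torch.optim.Adam([w, t], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        pts_world = pts_cam @ so3_exp(w).T + t      # camera -> world
        loss = (field(pts_world) - img_feats).pow(2).mean()
        loss.backward()
        opt.step()
    return w.detach(), t.detach()
```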
- SLCF-Net: Sequential LiDAR-Camera Fusion for Semantic Scene Completion using a 3D Recurrent U-Net [18.342569823885864]
SLCF-Net is a novel approach for the Semantic Scene Completion task that sequentially fuses LiDAR and camera data.
It jointly estimates missing geometry and semantics in a scene from sequences of RGB images and sparse LiDAR measurements.
It excels in all SSC metrics and shows great temporal consistency.
arXiv Detail & Related papers (2024-03-13T18:12:53Z)
- ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction [62.599588577671796]
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames.
Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality.
arXiv Detail & Related papers (2023-11-29T20:30:18Z)
- 3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding [58.924180772480504]
3D visual grounding aims to localize the target object in a 3D point cloud by a free-form language description.
We propose a relation-aware one-stage framework, named 3D Relative Position-aware Network (3DRP-Net).
arXiv Detail & Related papers (2023-07-25T09:33:25Z)
- Neural 3D Scene Reconstruction with the Manhattan-world Assumption [58.90559966227361]
This paper addresses the challenge of reconstructing 3D indoor scenes from multi-view images.
Planar constraints can be conveniently integrated into recent implicit neural representation-based reconstruction methods.
The proposed method outperforms previous methods by a large margin on 3D reconstruction quality.
arXiv Detail & Related papers (2022-05-05T17:59:55Z)
- VS-Net: Voting with Segmentation for Visual Localization [72.8165619061249]
We propose a novel visual localization framework that establishes 2D-to-3D correspondences between the query image and the 3D map with a series of learnable scene-specific landmarks.
Our proposed VS-Net is extensively tested on multiple public benchmarks and can outperform state-of-the-art visual localization methods.
arXiv Detail & Related papers (2021-05-23T08:44:11Z)
- Semantic Scene Completion using Local Deep Implicit Functions on LiDAR Data [4.355440821669468]
We propose a scene segmentation network based on local Deep Implicit Functions as a novel learning-based method for scene completion.
We show that this continuous representation is suitable to encode geometric and semantic properties of extensive outdoor scenes without the need for spatial discretization.
Our experiments verify that our method generates a powerful representation that can be decoded into a dense 3D description of a given scene.
arXiv Detail & Related papers (2020-11-18T07:39:13Z)
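As a rough illustration of the continuous representation described in the entry above, the sketch below conditions a small MLP on a local latent code and evaluates it at arbitrary coordinates, so a dense voxel grid is obtained by querying rather than stored explicitly; the architecture, dimensions, and class count are hypothetical and do not reproduce the paper's model.

```python
import torch
import torch.nn as nn

class LocalImplicitDecoder(nn.Module):
    """Toy local deep implicit function for semantic scene completion."""

    def __init__(self, code_dim=64, num_classes=20, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(code_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, code, xyz):
        # code: (N, code_dim) latent code of the local patch each query
        # falls into; xyz: (N, 3) coordinates relative to the patch center.
        return self.mlp(torch.cat([code, xyz], dim=-1))

# Decode a dense 16^3 grid from a single local code by querying the
# function at every cell center.
decoder = LocalImplicitDecoder()
code = torch.randn(1, 64)
axes = [torch.linspace(-0.5, 0.5, 16)] * 3
grid = torch.stack(torch.meshgrid(*axes, indexing="ij"), dim=-1).reshape(-1, 3)
logits = decoder(code.expand(grid.shape[0], -1), grid)  # (4096, num_classes)
```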
- 3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior [50.73148041205675]
The goal of the Semantic Scene Completion (SSC) task is to simultaneously predict a completed 3D voxel representation of volumetric occupancy and semantic labels of objects in the scene from a single-view observation.
We propose to devise a new geometry-based strategy to embed depth information with low-resolution voxel representation.
Our proposed geometric embedding works better than the depth features learned by conventional SSC frameworks.
arXiv Detail & Related papers (2020-03-31T09:33:46Z)
- DELTAS: Depth Estimation by Learning Triangulation And densification of Sparse points [14.254472131009653]
Multi-view stereo (MVS) is the golden mean between the accuracy of active depth sensing and the practicality of monocular depth estimation.
Cost volume based approaches employing 3D convolutional neural networks (CNNs) have considerably improved the accuracy of MVS systems.
We propose an efficient depth estimation approach by first (a) detecting and evaluating descriptors for interest points, then (b) learning to match and triangulate a small set of interest points, and finally (c) densifying this sparse set of 3D points using CNNs.
arXiv Detail & Related papers (2020-03-19T17:56:41Z)
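The detect-triangulate-densify recipe in the DELTAS summary above invites a worked example of its middle step. The sketch below implements classical two-view DLT triangulation with known projection matrices; DELTAS learns matching and triangulation end-to-end, so this closed-form version only illustrates the underlying geometry.

```python
import torch

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) two-view triangulation.

    P1, P2: (3, 4) camera projection matrices.
    x1, x2: (N, 2) matched interest-point pixel coordinates.
    Returns (N, 3) triangulated 3D points.
    """
    pts = []
    for (u1, v1), (u2, v2) in zip(x1, x2):
        # Each observation u = (P X)_0 / (P X)_2 contributes two linear
        # constraints on the homogeneous 3D point X.
        A = torch.stack([
            u1 * P1[2] - P1[0],
            v1 * P1[2] - P1[1],
            u2 * P2[2] - P2[0],
            v2 * P2[2] - P2[1],
        ])
        X = torch.linalg.svd(A).Vh[-1]   # least-squares null vector of A
        pts.append(X[:3] / X[3])         # dehomogenize
    return torch.stack(pts)
```

Densification, step (c) of the pipeline, would then interpolate this sparse point set into a full depth map with a CNN.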
This list is automatically generated from the titles and abstracts of the papers on this site.