Spatial Information Guided Convolution for Real-Time RGBD Semantic
Segmentation
- URL: http://arxiv.org/abs/2004.04534v2
- Date: Fri, 8 Jan 2021 04:24:35 GMT
- Title: Spatial Information Guided Convolution for Real-Time RGBD Semantic
Segmentation
- Authors: Lin-Zhuo Chen, Zheng Lin, Ziqin Wang, Yong-Liang Yang, and Ming-Ming
Cheng
- Abstract summary: We propose Spatial information guided Convolution (S-Conv), which allows efficient integration of RGB features and 3D spatial information.
S-Conv infers the sampling offsets of the convolution kernel under the guidance of 3D spatial information.
We further embed S-Conv into a semantic segmentation network, called Spatial information Guided convolutional Network (SGNet).
- Score: 79.78416804260668
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D spatial information is known to be beneficial to the semantic segmentation
task. Most existing methods take 3D spatial data as an additional input,
leading to a two-stream segmentation network that processes RGB and 3D spatial
information separately. This solution greatly increases the inference time and
severely limits its scope for real-time applications. To solve this problem, we
propose Spatial information guided Convolution (S-Conv), which allows efficient
integration of RGB features and 3D spatial information. S-Conv infers the
sampling offsets of the convolution kernel under the guidance of 3D spatial
information, helping the convolutional layer adjust its receptive field and
adapt to geometric transformations. S-Conv also incorporates geometric
information into the feature learning process by generating spatially adaptive
convolutional weights. The capability of perceiving geometry is greatly
enhanced with little increase in the number of parameters or computational
cost. We further embed S-Conv into a semantic segmentation network, called
Spatial information Guided convolutional Network (SGNet), resulting in
real-time inference and state-of-the-art performance on NYUDv2 and SUNRGBD
datasets.
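The abstract's two mechanisms, sampling offsets inferred from 3D spatial information and spatially adaptive weights, map naturally onto modulated deformable convolution. Below is a minimal PyTorch sketch of that reading, assuming an HHA- or depth-style encoding of the spatial input; `SConvSketch`, its layer sizes, and the use of a modulation mask as a stand-in for the paper's adaptive weights are illustrative assumptions, not the authors' implementation.

```python
# A minimal S-Conv-style sketch (illustrative, not the authors' code).
# Kernel sampling offsets and a per-position modulation mask are both
# predicted from 3D spatial information (e.g. a depth/HHA map) and
# applied to RGB features via torchvision's deformable convolution.
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d


class SConvSketch(nn.Module):
    def __init__(self, in_ch, out_ch, spatial_ch=3, k=3):
        super().__init__()
        self.k = k
        self.weight = nn.Parameter(torch.empty(out_ch, in_ch, k, k))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)
        # Two offsets (dx, dy) per kernel tap, predicted from spatial info.
        self.offset = nn.Conv2d(spatial_ch, 2 * k * k, 3, padding=1)
        # One modulation scalar per kernel tap: a stand-in for the
        # paper's spatially adaptive convolutional weights.
        self.modulation = nn.Conv2d(spatial_ch, k * k, 3, padding=1)
        # Zero-init offsets so the layer starts as a plain convolution.
        nn.init.zeros_(self.offset.weight)
        nn.init.zeros_(self.offset.bias)

    def forward(self, rgb_feat, spatial_info):
        offset = self.offset(spatial_info)
        mask = torch.sigmoid(self.modulation(spatial_info))
        return deform_conv2d(rgb_feat, offset, self.weight,
                             padding=self.k // 2, mask=mask)


rgb = torch.randn(1, 64, 60, 80)   # RGB feature map
hha = torch.randn(1, 3, 60, 80)    # encoded 3D spatial information
print(SConvSketch(64, 128)(rgb, hha).shape)  # torch.Size([1, 128, 60, 80])
```

Zero-initializing the offset branch is the usual deformable-convolution trick: the layer behaves like an ordinary convolution at the start of training and learns geometry-driven deformation gradually.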
Related papers
- ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic
Reconstruction [62.599588577671796]
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames.
Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality.
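As a rough illustration of what incremental reconstruction from a frame stream involves, the sketch below accumulates per-frame class probabilities into a sparse voxel grid. `SemanticVoxelMap` and its simple averaging rule are hypothetical stand-ins, not ALSTER's local spatio-temporal expert.

```python
# Hypothetical sketch of online semantic fusion: per-point class
# probabilities from each RGB-D frame are summed into a sparse voxel
# map, so the label estimate improves as more frames arrive.
import numpy as np


class SemanticVoxelMap:
    def __init__(self, num_classes, voxel_size=0.05):
        self.num_classes = num_classes
        self.voxel_size = voxel_size
        self.scores = {}  # voxel index (tuple) -> accumulated class scores

    def integrate(self, points_xyz, class_probs):
        """points_xyz: (N, 3) world-space points from one frame;
        class_probs: (N, C) per-point softmax scores."""
        keys = np.floor(points_xyz / self.voxel_size).astype(np.int64)
        for key, p in zip(map(tuple, keys), class_probs):
            acc = self.scores.setdefault(key, np.zeros(self.num_classes))
            acc += p  # running sum, i.e. an unnormalized mean

    def labels(self):
        return {k: int(v.argmax()) for k, v in self.scores.items()}


m = SemanticVoxelMap(num_classes=13)
pts = np.random.rand(1000, 3)                        # fake frame geometry
probs = np.random.dirichlet(np.ones(13), size=1000)  # fake predictions
m.integrate(pts, probs)
print(len(m.labels()), "voxels labeled so far")
```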
arXiv Detail & Related papers (2023-11-29T20:30:18Z)
- NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space [77.6067460464962]
Monocular 3D Semantic Scene Completion (SSC) has garnered significant attention in recent years due to its potential to predict complex semantics and geometry shapes from a single image, requiring no 3D inputs.
We identify several critical issues in current state-of-the-art methods, including the Feature Ambiguity of projected 2D features in the ray to the 3D space, the Pose Ambiguity of the 3D convolution, and the Imbalance in the 3D convolution across different depth levels.
We devise a novel Normalized Device Coordinates scene completion network (NDC-Scene) that directly extends the 2D feature map to the normalized device coordinates space.
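A minimal sketch of that lifting step, under the simplest possible assumption: each 2D feature is replicated along a depth axis to form a volume indexed by normalized device coordinates, then refined with a 3D convolution. `NDCLift` and its sizes are illustrative, not the NDC-Scene architecture.

```python
# Illustrative 2D-to-NDC lifting: broadcast the 2D feature map over D
# depth bins (the volume stays aligned with the image plane, which is
# the point of NDC space) and refine it with a 3D convolution.
import torch
import torch.nn as nn


class NDCLift(nn.Module):
    def __init__(self, ch, depth_bins=16):
        super().__init__()
        self.depth_bins = depth_bins
        self.refine = nn.Conv3d(ch, ch, kernel_size=3, padding=1)

    def forward(self, feat2d):                  # (N, C, H, W)
        vol = feat2d.unsqueeze(2)               # (N, C, 1, H, W)
        vol = vol.expand(-1, -1, self.depth_bins, -1, -1).contiguous()
        return self.refine(vol)                 # (N, C, D, H, W)


vol = NDCLift(32)(torch.randn(1, 32, 24, 32))
print(vol.shape)  # torch.Size([1, 32, 16, 24, 32])
```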
arXiv Detail & Related papers (2023-09-26T02:09:52Z)
- Spatial-information Guided Adaptive Context-aware Network for Efficient RGB-D Semantic Segmentation [9.198120596225968]
We propose an efficient, lightweight encoder-decoder network that reduces computational cost and parameter count while maintaining the robustness of the algorithm.
Experimental results on NYUv2, SUN RGB-D, and Cityscapes datasets show that our method achieves a better trade-off among segmentation accuracy, inference time, and parameters than the state-of-the-art methods.
arXiv Detail & Related papers (2023-08-11T09:02:03Z)
- Neural Progressive Meshes [54.52990060976026]
We propose a method to transmit 3D meshes with a shared learned generative space.
We learn this space using a subdivision-based encoder-decoder architecture trained in advance on a large collection of surfaces.
We evaluate our method on a diverse set of complex 3D shapes and demonstrate that it outperforms baselines in terms of compression ratio and reconstruction quality.
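To make the subdivision-based idea concrete, here is a loose sketch in which midpoint subdivision adds geometry and a small decoder predicts vertex displacements from a shared latent code; `midpoint_subdivide`, `DisplacementDecoder`, and all sizes are hypothetical, not the paper's architecture.

```python
# Loose sketch of progressive mesh detail: transmit a coarse mesh plus
# a latent code, then recover detail as subdivision + learned offsets.
import torch
import torch.nn as nn


def midpoint_subdivide(verts, edges):
    """verts: (V, 3); edges: (E, 2) unique vertex-index pairs.
    Appends one midpoint per edge to the vertex set."""
    mids = verts[edges].mean(dim=1)            # (E, 3)
    return torch.cat([verts, mids], dim=0)


class DisplacementDecoder(nn.Module):
    """Predicts a residual offset per vertex from a shared latent code."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + latent_dim, 128), nn.ReLU(), nn.Linear(128, 3))

    def forward(self, verts, z):
        z = z.expand(verts.shape[0], -1)
        return verts + self.mlp(torch.cat([verts, z], dim=-1))


verts = torch.rand(4, 3)
edges = torch.tensor([[0, 1], [1, 2], [2, 0], [0, 3]])
fine = midpoint_subdivide(verts, edges)        # coarse-to-fine geometry
fine = DisplacementDecoder()(fine, torch.zeros(1, 64))
print(fine.shape)  # torch.Size([8, 3])
```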
arXiv Detail & Related papers (2023-08-10T17:58:02Z)
- SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation [53.83313235792596]
We present a new methodology for real-time semantic mapping from RGB-D sequences.
It combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping.
Our system achieves state-of-the-art semantic mapping quality among 2D-3D network-based systems.
arXiv Detail & Related papers (2023-06-28T22:36:44Z)
- The Projection-Enhancement Network (PEN) [3.0464385291578973]
We propose a novel convolutional module which processes sub-sampled 3D data and produces a 2D RGB semantic compression.
We show that with PEN, the learned semantic representation in CellPose encodes depth and greatly improves segmentation performance.
We present PEN as a data-driven solution to form compressed representations of 3D data that improve 2D segmentations from instance segmentation networks.
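A hedged sketch of that compression idea: a learned 1x1 convolution mixes the z-slices of a volume into a 3-channel, RGB-like image that a 2D instance segmentation network can consume. `ProjectionModule` is an assumption for illustration, not the published module.

```python
# Learned 3D-to-2D projection: instead of a fixed max-intensity
# projection, per-slice mixing weights are learned end to end so the
# resulting "RGB" image can encode depth cues for a 2D segmenter.
import torch
import torch.nn as nn


class ProjectionModule(nn.Module):
    def __init__(self, depth_slices):
        super().__init__()
        self.mix = nn.Conv2d(depth_slices, 3, kernel_size=1)

    def forward(self, volume):                   # (N, D, H, W) z-stack
        return torch.sigmoid(self.mix(volume))   # (N, 3, H, W)


stack = torch.rand(2, 32, 128, 128)  # 32 z-slices per sample
print(ProjectionModule(32)(stack).shape)  # torch.Size([2, 3, 128, 128])
```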
arXiv Detail & Related papers (2023-01-26T00:07:22Z)
- Spatial Pruned Sparse Convolution for Efficient 3D Object Detection [41.62839541489369]
3D scenes are dominated by a large number of background points, which are redundant for a detection task that mainly needs to focus on foreground objects.
In this paper, we analyze major components of existing 3D CNNs and find that 3D CNNs ignore the redundancy of data and further amplify it in the down-sampling process, which brings a huge amount of extra and unnecessary computational overhead.
We propose a new convolution operator named spatial pruned sparse convolution (SPS-Conv), which includes two variants, spatial pruned submanifold sparse convolution (SPSS-Conv) and spatial pruned regular sparse convolution (SPRS-Conv).
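The pruning idea itself can be sketched without any sparse-convolution library: rank voxels by feature magnitude and keep only the strongest before the expensive convolution, so background contributes no compute. The `prune_voxels` helper below is a hypothetical, simplified stand-in for the paper's operator.

```python
# Magnitude-based spatial pruning: drop low-magnitude (likely
# background) voxels before convolving. Simplified illustration only.
import torch


def prune_voxels(coords, feats, keep_ratio=0.5):
    """coords: (N, 3) integer voxel coordinates; feats: (N, C) features.
    Keeps the top keep_ratio fraction of voxels by L1 feature magnitude."""
    magnitude = feats.abs().sum(dim=1)               # (N,)
    k = max(1, int(keep_ratio * feats.shape[0]))
    idx = magnitude.topk(k).indices
    return coords[idx], feats[idx]


coords = torch.randint(0, 100, (10000, 3))
feats = torch.randn(10000, 16)
coords, feats = prune_voxels(coords, feats, keep_ratio=0.3)
print(coords.shape, feats.shape)  # foreground-heavy subset survives
```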
arXiv Detail & Related papers (2022-09-28T16:19:06Z)
- Data Augmented 3D Semantic Scene Completion with 2D Segmentation Priors [1.0973642726108543]
We present SPAwN, a novel lightweight multimodal 3D deep CNN.
A crucial difficulty in this field is the lack of fully labeled real-world 3D datasets.
We introduce the use of a 3D data augmentation strategy that can be applied to multimodal SSC networks.
arXiv Detail & Related papers (2021-11-26T04:08:34Z)
- Spatial-Spectral Residual Network for Hyperspectral Image Super-Resolution [82.1739023587565]
We propose a novel spectral-spatial residual network for hyperspectral image super-resolution (SSRNet).
Our method can effectively explore spatial-spectral information by using 3D convolution instead of 2D convolution, which enables the network to better extract potential information.
In each unit, we employ spatial and spectral separable 3D convolutions to extract spatial and spectral information, which not only avoids unaffordable memory usage and high computational cost, but also makes the network easier to train.
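The separable design reads as factorizing a full k x k x k kernel into a spatial 1x3x3 convolution followed by a spectral 3x1x1 convolution, which is what this sketch shows; channel counts and the ReLU placement are assumptions, not the SSRNet specification.

```python
# Spatial-spectral separable 3D convolution: two cheap anisotropic
# kernels in place of one full 3x3x3 kernel over a hyperspectral cube.
import torch
import torch.nn as nn


class SeparableSpectralSpatialConv(nn.Module):
    def __init__(self, ch):
        super().__init__()
        # Input layout: (N, C, bands, H, W).
        self.spatial = nn.Conv3d(ch, ch, (1, 3, 3), padding=(0, 1, 1))
        self.spectral = nn.Conv3d(ch, ch, (3, 1, 1), padding=(1, 0, 0))

    def forward(self, x):
        return self.spectral(torch.relu(self.spatial(x)))


cube = torch.randn(1, 8, 31, 32, 32)  # 31 spectral bands
print(SeparableSpectralSpatialConv(8)(cube).shape)  # same shape out
```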
arXiv Detail & Related papers (2020-01-14T03:34:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.