FPS-Net: A Convolutional Fusion Network for Large-Scale LiDAR Point
Cloud Segmentation
- URL: http://arxiv.org/abs/2103.00738v1
- Date: Mon, 1 Mar 2021 04:08:28 GMT
- Title: FPS-Net: A Convolutional Fusion Network for Large-Scale LiDAR Point
Cloud Segmentation
- Authors: Aoran Xiao, Xiaofei Yang, Shijian Lu, Dayan Guan and Jiaxing Huang
- Abstract summary: Scene understanding based on LiDAR point clouds is an essential task for autonomous cars to drive safely.
Most existing methods simply stack different point attributes/modalities as image channels to increase information capacity.
We design FPS-Net, a convolutional fusion network that exploits the uniqueness and discrepancy among the projected image channels for optimal point cloud segmentation.
- Score: 30.736361776703568
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Scene understanding based on LiDAR point clouds is an essential task
for autonomous cars to drive safely. It often employs spherical projection to
map 3D point clouds into multi-channel 2D images for semantic segmentation.
Most existing methods simply stack different point attributes/modalities (e.g.
coordinates, intensity, depth, etc.) as image channels to increase information
capacity, but ignore the distinct characteristics of point attributes in
different image channels. We design FPS-Net, a convolutional fusion network
that exploits the uniqueness and discrepancy among the projected image channels
for optimal point cloud segmentation. FPS-Net adopts an encoder-decoder
structure. Instead of simply stacking multiple channel images as a single
input, we group them into different modalities, first learn modality-specific
features separately, and then map the learned features into a common
high-dimensional feature space for pixel-level fusion and learning.
Specifically, we design a residual dense block with multiple receptive fields
as the building block of the encoder, which preserves detailed information in
each modality and learns hierarchical modality-specific and fused features
effectively. In the FPS-Net decoder, we likewise use a recurrent convolution
block to hierarchically decode the fused features into the output space for
pixel-level classification. Extensive experiments on two widely adopted point
cloud datasets show that FPS-Net achieves superior semantic segmentation
compared with state-of-the-art projection-based methods. In addition, the
proposed modality fusion idea is compatible with typical projection-based
methods and can be incorporated into them with consistent performance
improvements.
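For readers unfamiliar with the projection step the abstract refers to, the snippet below is a minimal NumPy sketch of the standard spherical (range-image) projection used by projection-based LiDAR segmentation methods. The image resolution and vertical field of view are illustrative assumptions (roughly matching a 64-beam sensor), not values taken from the paper.

```python
import numpy as np

def spherical_projection(points, intensity, H=64, W=2048,
                         fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) LiDAR point cloud to a multi-channel 2D image.

    Returns an (H, W, 5) image whose channels are x, y, z, depth and
    intensity -- the kind of stacked-attribute input that FPS-Net instead
    splits into separate modalities. Resolution and field of view here are
    illustrative assumptions.
    """
    fov_up = np.deg2rad(fov_up_deg)
    fov_down = np.deg2rad(fov_down_deg)
    fov = fov_up - fov_down

    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1) + 1e-8

    yaw = np.arctan2(y, x)            # horizontal angle in [-pi, pi]
    pitch = np.arcsin(z / depth)      # vertical angle

    # normalize angles to [0, 1] and scale to pixel coordinates
    u = 0.5 * (1.0 - yaw / np.pi) * W
    v = (1.0 - (pitch - fov_down) / fov) * H
    u = np.clip(np.floor(u), 0, W - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int32)

    # write far points first so that closer points overwrite them
    order = np.argsort(depth)[::-1]
    image = np.zeros((H, W, 5), dtype=np.float32)
    image[v[order], u[order], :3] = points[order]
    image[v[order], u[order], 3] = depth[order]
    image[v[order], u[order], 4] = intensity[order]
    return image
```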
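The core idea of grouping channels into modalities, learning modality-specific features, and fusing them pixel-wise in a common feature space can be sketched as follows. This PyTorch snippet is a hedged illustration only; the modality grouping (coordinates, depth, intensity), layer counts, channel widths, and the concatenation-plus-1x1-convolution fusion are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ModalityStem(nn.Module):
    """Small per-modality feature extractor (assumed design)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)

class ModalityFusion(nn.Module):
    """Learn modality-specific features, then fuse them pixel-wise."""
    def __init__(self, modality_channels=(3, 1, 1), feat_ch=32, fused_ch=64):
        # e.g. (x, y, z) coordinates, depth, intensity as three modalities
        super().__init__()
        self.stems = nn.ModuleList(
            [ModalityStem(c, feat_ch) for c in modality_channels]
        )
        self.fuse = nn.Sequential(
            nn.Conv2d(feat_ch * len(modality_channels), fused_ch, 1),
            nn.BatchNorm2d(fused_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, modalities):
        # modalities: list of (B, C_i, H, W) tensors, one per modality group
        feats = [stem(m) for stem, m in zip(self.stems, modalities)]
        return self.fuse(torch.cat(feats, dim=1))
```

A stacked-channel baseline would instead concatenate the raw channels and feed them to a single encoder; the abstract's claim is that learning each modality separately before fusion better preserves the distinct characteristics of the different attributes.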
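The encoder building block is described as a residual dense block with multiple receptive fields. One plausible realisation, assuming dense connectivity plus parallel dilation rates and a residual shortcut, is sketched below; the exact branch layout in FPS-Net may differ.

```python
import torch
import torch.nn as nn

class MultiReceptiveDenseBlock(nn.Module):
    """Residual dense block with parallel receptive fields (assumed layout).

    Each stage concatenates all previous feature maps (dense connectivity)
    and mixes two dilation rates to enlarge the receptive field; a 1x1
    projection plus a residual shortcut closes the block.
    """
    def __init__(self, channels, growth=32, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        in_ch = channels
        for _ in range(num_layers):
            self.layers.append(nn.ModuleDict({
                "d1": nn.Conv2d(in_ch, growth // 2, 3, padding=1, dilation=1),
                "d2": nn.Conv2d(in_ch, growth // 2, 3, padding=2, dilation=2),
                "act": nn.Sequential(nn.BatchNorm2d(growth),
                                     nn.ReLU(inplace=True)),
            }))
            in_ch += growth
        self.project = nn.Conv2d(in_ch, channels, 1)

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            inp = torch.cat(features, dim=1)
            out = torch.cat([layer["d1"](inp), layer["d2"](inp)], dim=1)
            features.append(layer["act"](out))
        return x + self.project(torch.cat(features, dim=1))
```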
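The decoder reportedly uses a recurrent convolution block. A common form of such a block (as in recurrent residual U-Net variants) applies a shared convolution several times while re-injecting the input; the sketch below shows that generic pattern and is not taken from the paper.

```python
import torch.nn as nn

class RecurrentConvBlock(nn.Module):
    """Apply a shared conv several times, re-injecting the input each step."""
    def __init__(self, channels, steps=2):
        super().__init__()
        self.steps = steps
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        h = self.conv(x)
        for _ in range(self.steps - 1):
            h = self.conv(x + h)   # recurrent refinement of the features
        return h
```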
Related papers
- Monocular Visual Place Recognition in LiDAR Maps via Cross-Modal State Space Model and Multi-View Matching [2.400446821380503]
We introduce an efficient framework to learn descriptors for both RGB images and point clouds.
It takes visual state space model (VMamba) as the backbone and employs a pixel-view-scene joint training strategy.
A visible 3D points overlap strategy is then designed to quantify the similarity between point cloud views and RGB images for multi-view supervision.
arXiv Detail & Related papers (2024-10-08T18:31:41Z)
- DiffCut: Catalyzing Zero-Shot Semantic Segmentation with Diffusion Features and Recursive Normalized Cut [62.63481844384229]
Foundation models have emerged as powerful tools across various domains including language, vision, and multimodal tasks.
In this paper, we use a diffusion UNet encoder as a foundation vision encoder and introduce DiffCut, an unsupervised zero-shot segmentation method.
Our work highlights the remarkably accurate semantic knowledge embedded within diffusion UNet encoders that could then serve as foundation vision encoders for downstream tasks.
arXiv Detail & Related papers (2024-06-05T01:32:31Z)
- Few-Shot 3D Point Cloud Semantic Segmentation via Stratified Class-Specific Attention Based Transformer Network [22.9434434107516]
We develop a new multi-layer transformer network for few-shot point cloud semantic segmentation.
Our method achieves new state-of-the-art performance with 15% less inference time than existing few-shot 3D point cloud segmentation models.
arXiv Detail & Related papers (2023-03-28T00:27:54Z)
- PointMCD: Boosting Deep Point Cloud Encoders via Multi-view Cross-modal Distillation for 3D Shape Recognition [55.38462937452363]
We propose a unified multi-view cross-modal distillation architecture, including a pretrained deep image encoder as the teacher and a deep point encoder as the student.
By pair-wise aligning multi-view visual and geometric descriptors, we can obtain more powerful deep point encoders without exhaustive and complicated network modifications.
arXiv Detail & Related papers (2022-07-07T07:23:20Z)
- Action Keypoint Network for Efficient Video Recognition [63.48422805355741]
This paper proposes to integrate temporal and spatial selection into an Action Keypoint Network (AK-Net).
AK-Net selects some informative points scattered in arbitrary-shaped regions as a set of action keypoints and then transforms the video recognition into point cloud classification.
Experimental results show that AK-Net can consistently improve the efficiency and performance of baseline methods on several video recognition benchmarks.
arXiv Detail & Related papers (2022-01-17T09:35:34Z)
- Multi-Scale Feature Fusion: Learning Better Semantic Segmentation for Road Pothole Detection [9.356003255288417]
This paper presents a novel pothole detection approach based on single-modal semantic segmentation.
It first extracts visual features from input images using a convolutional neural network.
A channel attention module then reweighs the channel features to enhance the consistency of different feature maps.
arXiv Detail & Related papers (2021-12-24T15:07:47Z)
- Voint Cloud: Multi-View Point Cloud Representation for 3D Understanding [80.04281842702294]
We introduce the concept of the multi-view point cloud (Voint cloud) representing each 3D point as a set of features extracted from several view-points.
This novel 3D Voint cloud representation combines the compactness of 3D point cloud representation with the natural view-awareness of multi-view representation.
We deploy a Voint neural network (VointNet) with a theoretically established functional form to learn representations in the Voint space.
arXiv Detail & Related papers (2021-11-30T13:08:19Z)
- Sharp U-Net: Depthwise Convolutional Network for Biomedical Image Segmentation [1.1501261942096426]
U-Net has proven to be effective in biomedical image segmentation.
We propose a simple, yet effective end-to-end depthwise encoder-decoder fully convolutional network architecture, called Sharp U-Net.
Our experiments show that the proposed Sharp U-Net model consistently outperforms or matches the recent state-of-the-art baselines in both binary and multi-class segmentation tasks.
arXiv Detail & Related papers (2021-07-26T20:27:25Z)
- Similarity-Aware Fusion Network for 3D Semantic Segmentation [87.51314162700315]
We propose a similarity-aware fusion network (SAFNet) to adaptively fuse 2D images and 3D point clouds for 3D semantic segmentation.
We employ a late fusion strategy where we first learn the geometric and contextual similarities between the input and back-projected (from 2D pixels) point clouds.
We show that SAFNet significantly outperforms existing state-of-the-art fusion-based approaches across various levels of data integrity.
arXiv Detail & Related papers (2021-07-04T09:28:18Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation in applications such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
- Evidential fully convolutional network for semantic segmentation [6.230751621285322]
We propose a hybrid architecture composed of a fully convolutional network (FCN) and a Dempster-Shafer layer for image semantic segmentation.
Experiments show that the proposed combination improves the accuracy and calibration of semantic segmentation by assigning confusing pixels to multi-class sets.
arXiv Detail & Related papers (2021-03-25T01:21:22Z)