FPS-Net: A Convolutional Fusion Network for Large-Scale LiDAR Point
Cloud Segmentation
- URL: http://arxiv.org/abs/2103.00738v1
- Date: Mon, 1 Mar 2021 04:08:28 GMT
- Title: FPS-Net: A Convolutional Fusion Network for Large-Scale LiDAR Point
Cloud Segmentation
- Authors: Aoran Xiao, Xiaofei Yang, Shijian Lu, Dayan Guan and Jiaxing Huang
- Abstract summary: Scene understanding based on LiDAR point cloud is an essential task for autonomous cars to drive safely.
Most existing methods simply stack different point attributes/modalities as image channels to increase information capacity.
We design FPS-Net, a convolutional fusion network that exploits the uniqueness and discrepancy among the projected image channels for optimal point cloud segmentation.
- Score: 30.736361776703568
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Scene understanding based on LiDAR point cloud is an essential task for
autonomous cars to drive safely, which often employs spherical projection to
map 3D point cloud into multi-channel 2D images for semantic segmentation. Most
existing methods simply stack different point attributes/modalities (e.g.
coordinates, intensity, depth, etc.) as image channels to increase information
capacity, but ignore distinct characteristics of point attributes in different
image channels. We design FPS-Net, a convolutional fusion network that exploits
the uniqueness and discrepancy among the projected image channels for optimal
point cloud segmentation. FPS-Net adopts an encoder-decoder structure. Instead
of simply stacking multiple channel images as a single input, we group them
into different modalities to first learn modality-specific features separately
and then map the learned features into a common high-dimensional feature space
for pixel-level fusion and learning. Specifically, we design a residual dense
block with multiple receptive fields as a building block in the encoder which
preserves detailed information in each modality and learns hierarchical
modality-specific and fused features effectively. In the FPS-Net decoder, we
use a recurrent convolution block likewise to hierarchically decode fused
features into output space for pixel-level classification. Extensive
experiments conducted on two widely adopted point cloud datasets show that
FPS-Net achieves superior semantic segmentation as compared with
state-of-the-art projection-based methods. In addition, the proposed modality
fusion idea is compatible with typical projection-based methods and can be
incorporated into them with consistent performance improvements.
Related papers
- Zero-Shot Image Segmentation via Recursive Normalized Cut on Diffusion Features [62.63481844384229]
Foundation models have emerged as powerful tools across various domains including language, vision, and multimodal tasks.
In this paper, we use a diffusion UNet encoder as a foundation vision encoder and introduce DiffCut, an unsupervised zero-shot segmentation method.
Our work highlights the remarkably accurate semantic knowledge embedded within diffusion UNet encoders that could then serve as foundation vision encoders for downstream tasks.
arXiv Detail & Related papers (2024-06-05T01:32:31Z) - Few-Shot 3D Point Cloud Semantic Segmentation via Stratified
Class-Specific Attention Based Transformer Network [22.9434434107516]
We develop a new multi-layer transformer network for few-shot point cloud semantic segmentation.
Our method achieves the new state-of-the-art performance, with 15% less inference time, over existing few-shot 3D point cloud segmentation models.
arXiv Detail & Related papers (2023-03-28T00:27:54Z) - Multi-Projection Fusion and Refinement Network for Salient Object
Detection in 360{\deg} Omnidirectional Image [141.10227079090419]
We propose a Multi-Projection Fusion and Refinement Network (MPFR-Net) to detect the salient objects in 360deg omnidirectional image.
MPFR-Net uses the equirectangular projection image and four corresponding cube-unfolding images as inputs.
Experimental results on two omnidirectional datasets demonstrate that the proposed approach outperforms the state-of-the-art methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-12-23T14:50:40Z) - Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object
Detection [16.198358858773258]
Multi-modal 3D object detection has been an active research topic in autonomous driving.
It is non-trivial to explore the cross-modal feature fusion between sparse 3D points and dense 2D pixels.
Recent approaches either fuse the image features with the point cloud features that are projected onto the 2D image plane or combine the sparse point cloud with dense image pixels.
arXiv Detail & Related papers (2022-10-18T06:15:56Z) - PointMCD: Boosting Deep Point Cloud Encoders via Multi-view Cross-modal
Distillation for 3D Shape Recognition [55.38462937452363]
We propose a unified multi-view cross-modal distillation architecture, including a pretrained deep image encoder as the teacher and a deep point encoder as the student.
By pair-wise aligning multi-view visual and geometric descriptors, we can obtain more powerful deep point encoders without exhausting and complicated network modification.
arXiv Detail & Related papers (2022-07-07T07:23:20Z) - Action Keypoint Network for Efficient Video Recognition [63.48422805355741]
This paper proposes to integrate temporal and spatial selection into an Action Keypoint Network (AK-Net)
AK-Net selects some informative points scattered in arbitrary-shaped regions as a set of action keypoints and then transforms the video recognition into point cloud classification.
Experimental results show that AK-Net can consistently improve the efficiency and performance of baseline methods on several video recognition benchmarks.
arXiv Detail & Related papers (2022-01-17T09:35:34Z) - Multi-Scale Feature Fusion: Learning Better Semantic Segmentation for
Road Pothole Detection [9.356003255288417]
This paper presents a novel pothole detection approach based on single-modal semantic segmentation.
It first extracts visual features from input images using a convolutional neural network.
A channel attention module then reweighs the channel features to enhance the consistency of different feature maps.
arXiv Detail & Related papers (2021-12-24T15:07:47Z) - Voint Cloud: Multi-View Point Cloud Representation for 3D Understanding [80.04281842702294]
We introduce the concept of the multi-view point cloud (Voint cloud) representing each 3D point as a set of features extracted from several view-points.
This novel 3D Voint cloud representation combines the compactness of 3D point cloud representation with the natural view-awareness of multi-view representation.
We deploy a Voint neural network (VointNet) with a theoretically established functional form to learn representations in the Voint space.
arXiv Detail & Related papers (2021-11-30T13:08:19Z) - Sharp U-Net: Depthwise Convolutional Network for Biomedical Image
Segmentation [1.1501261942096426]
U-Net has proven to be effective in biomedical image segmentation.
We propose a simple, yet effective end-to-end depthwise encoder-decoder fully convolutional network architecture, called Sharp U-Net.
Our experiments show that the proposed Sharp U-Net model consistently outperforms or matches the recent state-of-the-art baselines in both binary and multi-class segmentation tasks.
arXiv Detail & Related papers (2021-07-26T20:27:25Z) - Similarity-Aware Fusion Network for 3D Semantic Segmentation [87.51314162700315]
We propose a similarity-aware fusion network (SAFNet) to adaptively fuse 2D images and 3D point clouds for 3D semantic segmentation.
We employ a late fusion strategy where we first learn the geometric and contextual similarities between the input and back-projected (from 2D pixels) point clouds.
We show that SAFNet significantly outperforms existing state-of-the-art fusion-based approaches across various data integrity.
arXiv Detail & Related papers (2021-07-04T09:28:18Z) - Evidential fully convolutional network for semantic segmentation [6.230751621285322]
We propose a hybrid architecture composed of a fully convolutional network (FCN) and a Dempster-Shafer layer for image semantic segmentation.
Experiments show that the proposed combination improves the accuracy and calibration of semantic segmentation by assigning confusing pixels to multi-class sets.
arXiv Detail & Related papers (2021-03-25T01:21:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.