RPVNet: A Deep and Efficient Range-Point-Voxel Fusion Network for LiDAR Point Cloud Segmentation
- URL: http://arxiv.org/abs/2103.12978v1
- Date: Wed, 24 Mar 2021 04:24:12 GMT
- Authors: Jianyun Xu, Ruixiang Zhang, Jian Dou, Yushi Zhu, Jie Sun, Shiliang Pu
- Abstract summary: We propose a novel range-point-voxel fusion network, namely RPVNet.
In this network, we devise a deep fusion framework with multiple, mutual information interactions among the range, point, and voxel views.
By leveraging this efficient interaction and a relatively low voxel resolution, our method is also shown to be more efficient.
- Score: 28.494690309193068
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Point clouds can be represented in many forms (views): typically, point-based sets, voxel-based cells, or range-based images (i.e., a panoramic view). The point-based view is geometrically accurate, but it is unordered, which makes it difficult to find local neighbors efficiently. The voxel-based view is regular but sparse, and computation grows cubically as voxel resolution increases. The range-based view is regular and generally dense; however, the spherical projection distorts physical dimensions. Both the voxel- and range-based views suffer from quantization loss, especially voxels in large-scale scenes. To exploit each view's advantages and mitigate its weaknesses in the fine-grained segmentation task, we propose a novel range-point-voxel fusion network, RPVNet. In this network, we devise a deep fusion framework with multiple, mutual information interactions among the three views and propose a gated fusion module (termed GFM), which adaptively merges the three features based on concurrent inputs. Moreover, the proposed RPV interaction mechanism is highly efficient, and we summarize it into a more general formulation. By leveraging this efficient interaction and a relatively low voxel resolution, our method is also shown to be more efficient. Finally, we evaluate the proposed model on two large-scale datasets, SemanticKITTI and nuScenes, and it achieves state-of-the-art performance on both. Notably, our method currently ranks 1st on the SemanticKITTI leaderboard without any extra tricks.
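To make the gating concrete, below is a minimal PyTorch sketch of adaptive fusion over three per-point feature streams. The module name GatedFusion, the layer sizes, and the assumption that range and voxel features are already projected back onto the N input points are illustrative choices, not RPVNet's released implementation.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Adaptively merge aligned range/point/voxel features per point.

    A minimal sketch of the gating idea from the abstract; layer sizes
    and architecture are assumptions, not RPVNet's actual GFM.
    """
    def __init__(self, channels: int):
        super().__init__()
        # Predict one scalar gate per view from the concatenated features.
        self.gate = nn.Sequential(
            nn.Linear(3 * channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, 3),
        )

    def forward(self, f_range, f_point, f_voxel):
        # f_*: (N, C) features already brought back to the N input points.
        stacked = torch.stack([f_range, f_point, f_voxel], dim=1)       # (N, 3, C)
        weights = torch.softmax(self.gate(stacked.flatten(1)), dim=-1)  # (N, 3)
        return (weights.unsqueeze(-1) * stacked).sum(dim=1)            # (N, C)

fused = GatedFusion(64)(torch.randn(100, 64), torch.randn(100, 64), torch.randn(100, 64))
```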
Related papers
- Self-Supervised Scene Flow Estimation with Point-Voxel Fusion and Surface Representation [30.355128117680444]
Scene flow estimation aims to generate the 3D motion field of points between two consecutive frames of point clouds.
Existing point-based methods ignore the irregularity of point clouds and have difficulty capturing long-range dependencies.
We propose a point-voxel fusion method, where we utilize a voxel branch based on sparse grid attention and the shifted window strategy to capture long-range dependencies.
arXiv Detail & Related papers (2024-10-17T09:05:15Z) - PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
- PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called the Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to effectively integrate both multi-scale and region-specific information.
arXiv Detail & Related papers (2024-08-26T19:43:01Z) - Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
- Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged, targeting high-precision object segmentation from high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete global localization and local refinement.
Inspired by this, we model DIS as a multi-view object perception problem and provide a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z) - Voxel or Pillar: Exploring Efficient Point Cloud Representation for 3D
- Voxel or Pillar: Exploring Efficient Point Cloud Representation for 3D Object Detection [49.324070632356296]
We develop a sparse voxel-pillar encoder that encodes point clouds into voxel and pillar features through 3D and 2D sparse convolutions, respectively.
Our efficient, fully sparse method can be seamlessly integrated into both dense and sparse detectors.
arXiv Detail & Related papers (2023-04-06T05:00:58Z) - Rethinking Range View Representation for LiDAR Segmentation [66.73116059734788]
"Many-to-one" mapping, semantic incoherence, and shape deformation are possible impediments against effective learning from range view projections.
We present RangeFormer, a full-cycle framework comprising novel designs across network architecture, data augmentation, and post-processing.
We show that, for the first time, a range view method is able to surpass the point, voxel, and multi-view fusion counterparts in the competing LiDAR semantic and panoptic segmentation benchmarks.
arXiv Detail & Related papers (2023-03-09T16:13:27Z) - GFNet: Geometric Flow Network for 3D Point Cloud Semantic Segmentation [91.15865862160088]
- GFNet: Geometric Flow Network for 3D Point Cloud Semantic Segmentation [91.15865862160088]
We introduce a geometric flow network (GFNet) to explore the geometric correspondence between different views in an align-before-fuse manner.
Specifically, we devise a novel geometric flow module (GFM) to bidirectionally align and propagate the complementary information across different views.
arXiv Detail & Related papers (2022-07-06T11:48:08Z) - Panoptic-PHNet: Towards Real-Time and High-Precision LiDAR Panoptic
- Panoptic-PHNet: Towards Real-Time and High-Precision LiDAR Panoptic Segmentation via Clustering Pseudo Heatmap [9.770808277353128]
We propose a fast and high-performance LiDAR-based framework, referred to as Panoptic-PHNet.
We introduce a clustering pseudo heatmap as a new paradigm, which, followed by a center grouping module, yields instance centers for efficient clustering.
For backbone design, we fuse the fine-grained voxel features and the 2D Bird's Eye View (BEV) features with different receptive fields to utilize both detailed and global information.
arXiv Detail & Related papers (2022-05-14T08:16:13Z) - Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from
- Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds [16.69887974230884]
Transformers have demonstrated promising performance in many 2D vision tasks.
It is cumbersome to compute self-attention on large-scale point cloud data because a point cloud is a long sequence that is unevenly distributed in 3D space.
Existing methods usually compute self-attention locally by grouping the points into clusters of the same size, or perform convolutional self-attention on a discretized representation.
We propose a novel voxel-based architecture, namely Voxel Set Transformer (VoxSeT), to detect 3D objects from point clouds by means of set-to-set translation.
arXiv Detail & Related papers (2022-03-19T12:31:46Z) - UniFuse: Unidirectional Fusion for 360$^{\circ}$ Panorama Depth
- UniFuse: Unidirectional Fusion for 360$^{\circ}$ Panorama Depth Estimation [11.680475784102308]
This paper introduces a new framework to fuse features from the two projections, unidirectionally feeding the cubemap features to the equirectangular features only at the decoding stage.
Experiments verify the effectiveness of our proposed fusion strategy and module, and our model achieves state-of-the-art performance on four popular datasets.
arXiv Detail & Related papers (2021-02-06T10:01:09Z) - LiDAR-based Panoptic Segmentation via Dynamic Shifting Network [56.71765153629892]
- LiDAR-based Panoptic Segmentation via Dynamic Shifting Network [56.71765153629892]
LiDAR-based panoptic segmentation aims to parse both objects and scenes in a unified manner.
We propose the Dynamic Shifting Network (DS-Net), which serves as an effective panoptic segmentation framework in the point cloud realm.
Our proposed DS-Net achieves superior accuracies over current state-of-the-art methods.
arXiv Detail & Related papers (2020-11-24T08:44:46Z)