PC-BEV: An Efficient Polar-Cartesian BEV Fusion Framework for LiDAR Semantic Segmentation
- URL: http://arxiv.org/abs/2412.14821v1
- Date: Thu, 19 Dec 2024 13:12:15 GMT
- Title: PC-BEV: An Efficient Polar-Cartesian BEV Fusion Framework for LiDAR Semantic Segmentation
- Authors: Shoumeng Qiu, Xinrun Li, XiangYang Xue, Jian Pu
- Abstract summary: This paper challenges the prevailing notion that multiview fusion is essential for achieving high performance.
We demonstrate that significant gains can be realized by directly fusing Polar and Cartesian partitioning strategies.
Our approach facilitates dense feature fusion, preserving richer contextual information compared to sparse point-based alternatives.
- Abstract: Although multiview fusion has demonstrated potential in LiDAR segmentation, its dependence on computationally intensive point-based interactions, arising from the lack of fixed correspondences between views such as range view and Bird's-Eye View (BEV), hinders its practical deployment. This paper challenges the prevailing notion that multiview fusion is essential for achieving high performance. We demonstrate that significant gains can be realized by directly fusing Polar and Cartesian partitioning strategies within the BEV space. Our proposed BEV-only segmentation model leverages the inherent fixed grid correspondences between these partitioning schemes, enabling a fusion process that is orders of magnitude faster (170$\times$ speedup) than conventional point-based methods. Furthermore, our approach facilitates dense feature fusion, preserving richer contextual information compared to sparse point-based alternatives. To enhance scene understanding while maintaining inference efficiency, we also introduce a hybrid Transformer-CNN architecture. Extensive evaluation on the SemanticKITTI and nuScenes datasets provides compelling evidence that our method outperforms previous multiview fusion approaches in terms of both performance and inference speed, highlighting the potential of BEV-based fusion for LiDAR segmentation. Code is available at \url{https://github.com/skyshoumeng/PC-BEV}.
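The core mechanism here, fixed grid correspondences between Polar and Cartesian BEV partitions, is concrete enough to sketch. The PyTorch snippet below is a minimal illustration, not the authors' released implementation: the grid resolutions, channel widths, and the names polar_to_cartesian_indices and PolarCartesianFusion are assumptions made for this example, and only the polar-to-Cartesian direction is shown. The point it demonstrates is that the cell-to-cell mapping depends only on the two grids, so it can be precomputed once; fusion then reduces to a single gather plus a 1x1 convolution, independent of the number of points.

```python
# Minimal sketch of fixed-correspondence Polar/Cartesian BEV fusion.
# All sizes, names, and the fusion head are illustrative assumptions,
# not the PC-BEV authors' code.
import math
import torch


def polar_to_cartesian_indices(H=480, W=480, num_r=480, num_theta=360,
                               max_range=50.0):
    """For each Cartesian BEV cell center, return the flat index of the
    Polar BEV cell that contains it. Depends only on the grids, so it is
    computed once and reused for every scan."""
    ys, xs = torch.meshgrid(
        torch.linspace(-max_range, max_range, H),
        torch.linspace(-max_range, max_range, W),
        indexing="ij",
    )
    r = torch.sqrt(xs ** 2 + ys ** 2)            # radius of each cell center
    theta = torch.atan2(ys, xs)                  # angle in [-pi, pi]
    r_idx = (r / max_range * num_r).long().clamp(0, num_r - 1)
    t_idx = ((theta + math.pi) / (2 * math.pi) * num_theta).long()
    t_idx = t_idx.clamp(0, num_theta - 1)
    return (r_idx * num_theta + t_idx).view(-1)  # flat polar-grid indices


class PolarCartesianFusion(torch.nn.Module):
    """Dense BEV-space fusion: gather Polar features onto the Cartesian
    grid through the precomputed index map, then mix with a 1x1 conv."""

    def __init__(self, c_polar=64, c_cart=64, **grid_kwargs):
        super().__init__()
        self.register_buffer("index", polar_to_cartesian_indices(**grid_kwargs))
        self.mix = torch.nn.Conv2d(c_polar + c_cart, c_cart, kernel_size=1)

    def forward(self, cart_feat, polar_feat):
        # cart_feat: (B, c_cart, H, W); polar_feat: (B, c_polar, num_r, num_theta)
        B, Cp = polar_feat.shape[:2]
        flat = polar_feat.flatten(2)             # (B, c_polar, num_r * num_theta)
        gathered = flat[:, :, self.index]        # one fixed gather, no per-point ops
        gathered = gathered.view(B, Cp, *cart_feat.shape[-2:])
        return self.mix(torch.cat([cart_feat, gathered], dim=1))


# Example with toy shapes:
#   fuse = PolarCartesianFusion(c_polar=64, c_cart=64)
#   out = fuse(torch.randn(2, 64, 480, 480), torch.randn(2, 64, 480, 360))
```

Because the index map is a buffer built at construction time, the per-scan cost under these assumptions is one gather and one 1x1 convolution over dense BEV maps, which plausibly underlies both the reported 170$\times$ speedup over point-based fusion and the denser feature fusion the abstract highlights.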
Related papers
- LSSInst: Improving Geometric Modeling in LSS-Based BEV Perception with Instance Representation [10.434754671492723]
We propose LSSInst, a two-stage object detector incorporating BEV and instance representations in tandem.
The proposed detector exploits fine-grained pixel-level features that can be flexibly integrated into existing LSS-based BEV networks.
Our proposed framework generalizes well and boosts the performance of modern LSS-based BEV perception methods without bells and whistles.
arXiv Detail & Related papers (2024-11-09T13:03:54Z)
- BEVContrast: Self-Supervision in BEV Space for Automotive Lidar Point Clouds [73.40883276013373]
We present a surprisingly simple and efficient method for self-supervision of a 3D backbone on automotive Lidar point clouds.
We design a contrastive loss between features of Lidar scans captured in the same scene.
The resulting cell-level representations offer a good trade-off between the point-level representations exploited in PointContrast and the segment-level representations exploited in TARL.
arXiv Detail & Related papers (2023-10-26T10:02:33Z)
- X-Align++: cross-modal cross-view alignment for Bird's-eye-view segmentation [44.58686493878629]
X-Align is a novel end-to-end cross-modal and cross-view learning framework for BEV segmentation.
X-Align significantly outperforms the state-of-the-art by 3 absolute mIoU points on the nuScenes and KITTI-360 datasets.
arXiv Detail & Related papers (2023-06-06T15:52:55Z)
- A Cross-Scale Hierarchical Transformer with Correspondence-Augmented Attention for inferring Bird's-Eye-View Semantic Segmentation [13.013635162859108]
Inferring BEV semantic segmentation from multi-camera-view images is a popular scheme in the community because it requires only cheap devices and supports real-time processing.
We propose a novel cross-scale hierarchical Transformer with correspondence-augmented attention for inferring semantic segmentation.
Our method has state-of-the-art performance in inferring BEV semantic segmentation conditioned on multi-camera-view images.
arXiv Detail & Related papers (2023-04-07T13:52:47Z)
- X-Align: Cross-Modal Cross-View Alignment for Bird's-Eye-View Segmentation [44.95630790801856]
X-Align is a novel end-to-end cross-modal and cross-view learning framework for BEV segmentation.
X-Align significantly outperforms the state-of-the-art by 3 absolute mIoU points on nuScenes.
arXiv Detail & Related papers (2022-10-13T06:42:46Z)
- Late Fusion Multi-view Clustering via Global and Local Alignment Maximization [61.89218392703043]
Multi-view clustering (MVC) optimally integrates complementary information from different views to improve clustering performance.
Most existing approaches directly fuse multiple pre-specified similarities to learn an optimal similarity matrix for clustering.
We propose late fusion MVC via alignment to address these issues.
arXiv Detail & Related papers (2022-08-02T01:49:31Z)
- UniFusion: Unified Multi-view Fusion Transformer for Spatial-Temporal Representation in Bird's-Eye-View [20.169308746548587]
We propose a new method that merges spatial and temporal fusion into a unified mathematical formulation.
With the proposed unified spatial-temporal fusion, our method could support long-range fusion.
Our method achieves state-of-the-art performance in the map segmentation task.
arXiv Detail & Related papers (2022-07-18T11:59:10Z)
- GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation [105.19949897812494]
Birds-eye-view (BEV) semantic segmentation is critical for autonomous driving.
We present a novel two-stage Geometry Prior-based Transformation framework named GitNet.
arXiv Detail & Related papers (2022-04-16T06:46:45Z)