Related papers: RAPiD-Seg: Range-Aware Pointwise Distance Distribution Networks for 3D LiDAR Segmentation

RAPiD-Seg: Range-Aware Pointwise Distance Distribution Networks for 3D LiDAR Segmentation

URL: http://arxiv.org/abs/2407.10159v3
Date: Fri, 13 Sep 2024 19:24:17 GMT
Title: RAPiD-Seg: Range-Aware Pointwise Distance Distribution Networks for 3D LiDAR Segmentation
Authors: Li Li, Hubert P. H. Shum, Toby P. Breckon,
Abstract summary: We introduce Range-Aware Pointwise Distance Distribution (RAPiD) features and the associated RAPiD-Seg architecture. RAPiD features exhibit rigid transformation invariance and effectively adapt to variations in point density. We propose a double-nested autoencoder structure with a novel class-aware embedding objective to encode high-dimensional features into manageable voxel-wise embeddings.
Score: 22.877384781595556
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: 3D point clouds play a pivotal role in outdoor scene perception, especially in the context of autonomous driving. Recent advancements in 3D LiDAR segmentation often focus intensely on the spatial positioning and distribution of points for accurate segmentation. However, these methods, while robust in variable conditions, encounter challenges due to sole reliance on coordinates and point intensity, leading to poor isometric invariance and suboptimal segmentation. To tackle this challenge, our work introduces Range-Aware Pointwise Distance Distribution (RAPiD) features and the associated RAPiD-Seg architecture. Our RAPiD features exhibit rigid transformation invariance and effectively adapt to variations in point density, with a design focus on capturing the localized geometry of neighboring structures. They utilize inherent LiDAR isotropic radiation and semantic categorization for enhanced local representation and computational efficiency, while incorporating a 4D distance metric that integrates geometric and surface material reflectivity for improved semantic segmentation. To effectively embed high-dimensional RAPiD features, we propose a double-nested autoencoder structure with a novel class-aware embedding objective to encode high-dimensional features into manageable voxel-wise embeddings. Additionally, we propose RAPiD-Seg which incorporates a channel-wise attention fusion and two effective RAPiD-Seg variants, further optimizing the embedding for enhanced performance and generalization. Our method outperforms contemporary LiDAR segmentation work in terms of mIoU on SemanticKITTI (76.1) and nuScenes (83.6) datasets.

Related papers

SoPE: Spherical Coordinate-Based Positional Embedding for Enhancing Spatial Perception of 3D LVLMs [21.891285551179365]
We introduce Spherical Coordinate-based Positional Embedding (SoPE)<n>Our method maps point-cloud token indices into a 3D spherical coordinate space, enabling unified modeling of spatial locations and directional angles.<n>This formulation preserves the inherent geometric structure of point-cloud data, enhances spatial awareness, and yields more consistent and expressive geometric representations for multimodal learning.
arXiv Detail & Related papers (2026-02-26T07:42:15Z)
RSGround-R1: Rethinking Remote Sensing Visual Grounding through Spatial Reasoning [61.84363374647606]
Remote Sensing Visual Grounding (RSVG) aims to localize target objects in large-scale aerial imagery based on natural language descriptions.<n>These descriptions often rely heavily on positional cues, posing unique challenges for Multimodal Large Language Models (MLLMs) in spatial reasoning.<n>We propose a reasoning-guided, position-aware post-training framework, dubbed textbfRSGround-R1, to progressively enhance spatial understanding.
arXiv Detail & Related papers (2026-01-29T12:35:57Z)
RangeSAM: Leveraging Visual Foundation Models for Range-View repesented LiDAR segmentation [6.513648249086729]
We present the first range-view framework that adapts SAM2 to 3D segmentation, coupling efficient 2D feature extraction with standard projection/back-projection to operate on point clouds.<n>Our approach achieves competitive performance on Semantic KITTI while benefiting from the speed, scalability, and deployment simplicity of 2D-centric pipelines.
arXiv Detail & Related papers (2025-09-19T11:33:10Z)
Cross-Modal Geometric Hierarchy Fusion: An Implicit-Submap Driven Framework for Resilient 3D Place Recognition [4.196626042312499]
We propose a novel framework that redefines 3D place recognition through density-agnostic geometric reasoning.<n>Specifically, we introduce an implicit 3D representation based on elastic points, which is immune to the interference of original scene point cloud density.<n>With the aid of these two types of information, we obtain descriptors that fuse geometric information from both bird's-eye view and 3D segment perspectives.
arXiv Detail & Related papers (2025-06-17T07:04:07Z)
econSG: Efficient and Multi-view Consistent Open-Vocabulary 3D Semantic Gaussians [56.85804719947]
We propose econSG for open-vocabulary semantic segmentation with 3DGS. Our econSG shows state-of-the-art performance on four benchmark datasets compared to the existing methods.
arXiv Detail & Related papers (2025-04-08T13:12:31Z)
Efficient Semantic Splatting for Remote Sensing Multi-view Segmentation [29.621022493810088]
We propose a novel semantic splatting approach based on Gaussian Splatting to achieve efficient and low-latency. Our method projects the RGB attributes and semantic features of point clouds onto the image plane, simultaneously rendering RGB images and semantic segmentation results.
arXiv Detail & Related papers (2024-12-08T15:28:30Z)
On Deep Learning for Geometric and Semantic Scene Understanding Using On-Vehicle 3D LiDAR [4.606106768645647]
3D LiDAR point cloud data is crucial for scene perception in computer vision, robotics, and autonomous driving. We present DurLAR, the first high-fidelity 128-channel 3D LiDAR dataset featuring panoramic ambient (near infrared) and reflectivity imagery. To improve the segmentation accuracy, we introduce Range-Aware Pointwise Distance Distribution (RAPiD) features and the associated RAPiD-Seg architecture.
arXiv Detail & Related papers (2024-11-01T14:01:54Z)
TraIL-Det: Transformation-Invariant Local Feature Networks for 3D LiDAR Object Detection with Unsupervised Pre-Training [21.56675189346088]
We introduce Transformation-Invariant Local (TraIL) features and the associated TraIL-Det architecture. TraIL features exhibit rigid transformation invariance and effectively adapt to variations in point density. They utilize the inherent isotropic radiation of LiDAR to enhance local representation. Our method outperforms contemporary self-supervised 3D object detection approaches in terms of mAP on KITTI.
arXiv Detail & Related papers (2024-08-25T17:59:17Z)
Gaussian Splatting with Localized Points Management [52.009874685460694]
Localized Point Management (LPM) is capable of identifying those error-contributing zones in the highest demand for both point addition and geometry calibration. LPM applies point densification in the identified zone, whilst resetting the opacity of those points residing in front of these regions so that a new opportunity is created to correct ill-conditioned points. Notably, LPM improves both vanilla 3DGS and SpaceTimeGS to achieve state-of-the-art rendering quality while retaining real-time speeds.
arXiv Detail & Related papers (2024-06-06T16:55:07Z)
Reflectivity Is All You Need!: Advancing LiDAR Semantic Segmentation [11.684330305297523]
This paper explores the advantages of employing calibrated intensity (also referred to as reflectivity) within learning-based LiDAR semantic segmentation frameworks. We show that replacing intensity with reflectivity results in a 4% improvement in mean Intersection over Union for off-road scenarios. We demonstrate the potential benefits of using calibrated intensity for semantic segmentation in urban environments.
arXiv Detail & Related papers (2024-03-19T22:57:03Z)
PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction [72.75478398447396]
We propose a cylindrical tri-perspective view to represent point clouds effectively and comprehensively. Considering the distance distribution of LiDAR point clouds, we construct the tri-perspective view in the cylindrical coordinate system. We employ spatial group pooling to maintain structural details during projection and adopt 2D backbones to efficiently process each TPV plane.
arXiv Detail & Related papers (2023-08-31T17:57:17Z)
Scene-Generalizable Interactive Segmentation of Radiance Fields [64.37093918762]
We make the first attempt at Scene-Generalizable Interactive in Radiance Fields (SGISRF) We propose a novel SGISRF method, which can perform 3D object segmentation for novel (unseen) scenes represented by radiance fields, guided by only a few interactive user clicks in a given set of multi-view 2D images. Experiments on two real-world challenging benchmarks covering diverse scenes demonstrate 1) effectiveness and scene-generalizability of the proposed method, 2) favorable performance compared to classical method requiring scene-specific optimization.
arXiv Detail & Related papers (2023-08-09T17:55:50Z)
CL3D: Unsupervised Domain Adaptation for Cross-LiDAR 3D Detection [16.021932740447966]
Domain adaptation for Cross-LiDAR 3D detection is challenging due to the large gap on the raw data representation. We present an unsupervised domain adaptation method that overcomes above difficulties.
arXiv Detail & Related papers (2022-12-01T03:22:55Z)
Ret3D: Rethinking Object Relations for Efficient 3D Object Detection in Driving Scenes [82.4186966781934]
We introduce a simple, efficient, and effective two-stage detector, termed as Ret3D. At the core of Ret3D is the utilization of novel intra-frame and inter-frame relation modules. With negligible extra overhead, Ret3D achieves the state-of-the-art performance.
arXiv Detail & Related papers (2022-08-18T03:48:58Z)
Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based Perception [122.53774221136193]
State-of-the-art methods for driving-scene LiDAR-based perception often project the point clouds to 2D space and then process them via 2D convolution. A natural remedy is to utilize the 3D voxelization and 3D convolution network. We propose a new framework for the outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
arXiv Detail & Related papers (2021-09-12T06:25:11Z)
Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation [81.02742110604161]
State-of-the-art methods for large-scale driving-scene LiDAR segmentation often project the point clouds to 2D space and then process them via 2D convolution. We propose a new framework for the outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pat-tern. Our method achieves the 1st place in the leaderboard of Semantic KITTI and outperforms existing methods on nuScenes with a noticeable margin, about 4%.
arXiv Detail & Related papers (2020-11-19T18:53:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.