Multimodal Point Cloud Semantic Segmentation With Virtual Point Enhancement
- URL: http://arxiv.org/abs/2504.01449v1
- Date: Wed, 02 Apr 2025 08:02:06 GMT
- Title: Multimodal Point Cloud Semantic Segmentation With Virtual Point Enhancement
- Authors: Zaipeng Duan, Xuzhong Hu, Pei An, Jie Ma
- Abstract summary: LiDAR-based 3D point cloud recognition has been proven beneficial in various applications. The sparsity and varying density pose a significant challenge in capturing intricate details of objects. We propose a multi-modal point cloud semantic segmentation method based on Virtual Point Enhancement.
- Score: 10.188196569056332
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LiDAR-based 3D point cloud recognition has proven beneficial in various applications. However, sparsity and varying density pose a significant challenge to capturing the intricate details of objects, particularly for medium-range and small targets. We therefore propose a multi-modal point cloud semantic segmentation method based on Virtual Point Enhancement (VPE), which integrates virtual points generated from images to address these issues. These virtual points are dense but noisy, and incorporating them directly can increase the computational burden and degrade performance. To cope with this, we introduce a spatial difference-driven adaptive filtering module that selectively extracts valuable pseudo points from the virtual points based on density and distance, enhancing the density of medium-range targets. Subsequently, we propose a noise-robust sparse feature encoder that combines noise-robust feature extraction with fine-grained feature enhancement. Noise-robust feature extraction exploits the 2D image space to reduce the impact of noisy points, while fine-grained feature enhancement boosts sparse geometric features through inner-voxel neighborhood point aggregation and downsampled voxel aggregation. Results on SemanticKITTI and nuScenes, two large-scale benchmark datasets, validate the method's effectiveness: introducing 7.7% virtual points on nuScenes improves mIoU by 2.89%.
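The abstract does not include code, but the adaptive filtering step can be illustrated with a minimal sketch. Everything below (function name, radius, density threshold, range band) is an illustrative assumption, not the authors' implementation: a virtual point is kept only if it lies in a medium-range band and falls where real LiDAR returns are sparse.

```python
import numpy as np
from scipy.spatial import cKDTree

def filter_virtual_points(lidar_xyz, virtual_xyz,
                          radius=0.5, max_density=8,
                          range_band=(15.0, 40.0)):
    """Keep virtual points that (a) fall in a medium-range distance
    band and (b) land where real LiDAR returns are sparse.
    Thresholds are illustrative, not the paper's values."""
    tree = cKDTree(lidar_xyz)
    # Distance of each virtual point from the sensor origin.
    dist = np.linalg.norm(virtual_xyz, axis=1)
    in_band = (dist >= range_band[0]) & (dist <= range_band[1])
    # Local density: number of real points within `radius`.
    counts = np.array([len(n) for n in
                       tree.query_ball_point(virtual_xyz, radius)])
    sparse = counts < max_density
    return virtual_xyz[in_band & sparse]
```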
Related papers
- PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
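As a rough illustration of what a multi-pooling step might look like (shapes, names, and the concatenation-based fusion are assumptions; PVAFN's actual module additionally uses attention and region-specific pooling):

```python
import torch

def multi_pool_fuse(roi_feats: torch.Tensor) -> torch.Tensor:
    """Toy multi-pooling: summarize per-RoI point features with both
    max- and average-pooling, then concatenate the two summaries."""
    # roi_feats: (num_rois, num_points, channels)
    max_pool = roi_feats.max(dim=1).values   # salient evidence
    avg_pool = roi_feats.mean(dim=1)         # contextual average
    return torch.cat([max_pool, avg_pool], dim=-1)
```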
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
- PIVOT-Net: Heterogeneous Point-Voxel-Tree-based Framework for Point Cloud Compression [8.778300313732027]
We propose a heterogeneous point cloud compression (PCC) framework.
We unify typical point cloud representations -- point-based, voxel-based, and tree-based representations -- and their associated backbones.
We augment the framework with context-aware upsampling for decoding and an enhanced voxel transformer for feature aggregation.
arXiv Detail & Related papers (2024-02-11T16:57:08Z)
- VirtualPainting: Addressing Sparsity with Virtual Points and Distance-Aware Data Augmentation for 3D Object Detection [3.5259183508202976]
We present an innovative approach that generates virtual LiDAR points from camera images.
We also enhance these virtual points with semantic labels obtained from image-based segmentation networks.
Our approach offers a versatile solution that can be seamlessly integrated into various 3D frameworks and 2D semantic segmentation methods.
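A minimal sketch of virtual point generation under a standard pinhole camera model, assuming a depth map (e.g., from depth completion) and per-pixel semantic labels are available; this is generic geometry, not the paper's code:

```python
import numpy as np

def pixels_to_virtual_points(depth, sem_labels, K):
    """Back-project every pixel with a depth estimate into a 3D
    'virtual point' carrying its 2D semantic label.
    depth: (H, W) metric depth; sem_labels: (H, W); K: 3x3 intrinsics."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts = np.stack([x, y, z], axis=1)   # camera-frame XYZ
    return pts, sem_labels[valid]       # per-point semantic labels
```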
arXiv Detail & Related papers (2023-12-26T18:03:05Z)
- MS23D: A 3D Object Detection Method Using Multi-Scale Semantic Feature Points to Construct 3D Feature Layer [4.644319899528183]
LiDAR point clouds can effectively depict the motion and posture of objects in three-dimensional space.
In autonomous driving scenarios, the sparsity and hollowness of point clouds create some difficulties for voxel-based methods.
We propose a two-stage 3D object detection framework, called MS23D.
arXiv Detail & Related papers (2023-08-31T08:03:25Z)
- Focus for Free in Density-Based Counting [56.961229110268036]
We introduce two methods that repurpose the available point annotations to enhance counting performance.
The first is a counting-specific augmentation that leverages point annotations to simulate occluded objects in both input and density images.
The second method, foreground distillation, generates foreground masks from the point annotations, from which we train an auxiliary network on images with blacked-out backgrounds.
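A toy version of the foreground-mask idea, assuming point annotations in pixel coordinates; the Gaussian width and threshold are illustrative, and the paper's distillation pipeline involves more than this masking step:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blackout_background(image, points, sigma=8.0, thresh=1e-4):
    """Smear point annotations into a soft foreground mask, then
    zero out background pixels. image: (H, W, C); points: (x, y) pairs."""
    H, W = image.shape[:2]
    mask = np.zeros((H, W), dtype=np.float32)
    for x, y in points:
        mask[int(y), int(x)] = 1.0
    mask = gaussian_filter(mask, sigma) > thresh
    return image * mask[..., None]   # broadcast over channels
```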
arXiv Detail & Related papers (2023-06-08T11:54:37Z)
- TransUPR: A Transformer-based Uncertain Point Refiner for LiDAR Point Cloud Semantic Segmentation [6.587305905804226]
We propose a transformer-based uncertain point refiner, i.e., TransUPR, to refine selected uncertain points in a learnable manner.
Our TransUPR achieves state-of-the-art performance, i.e., 68.2% mean Intersection over Union (mIoU), on the SemanticKITTI benchmark.
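How "uncertain points" might be selected is not spelled out in this summary; a common choice is softmax entropy, sketched below (entropy as the criterion and top-k selection are our assumptions, not necessarily TransUPR's):

```python
import torch

def select_uncertain_points(logits: torch.Tensor, k: int):
    """Pick the k points whose class posteriors have the highest
    entropy; these are candidates a refiner would re-classify.
    logits: (N, num_classes) per-point class scores."""
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    return torch.topk(entropy, k).indices
```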
arXiv Detail & Related papers (2023-02-16T21:38:36Z)
- PV-RCNN++: Semantical Point-Voxel Feature Interaction for 3D Object Detection [22.6659359032306]
This paper proposes a novel object detection network by semantical point-voxel feature interaction, dubbed PV-RCNN++.
Experiments on the KITTI dataset show that PV-RCNN++ achieves 81.60%, 40.18%, and 68.21% 3D mAP on Car, Pedestrian, and Cyclist, respectively, comparable to or better than the state of the art.
arXiv Detail & Related papers (2022-08-29T08:14:00Z)
- PUFA-GAN: A Frequency-Aware Generative Adversarial Network for 3D Point Cloud Upsampling [56.463507980857216]
We propose a generative adversarial network for point cloud upsampling.
It not only makes the upsampled points evenly distributed on the underlying surface but also efficiently generates clean high-frequency regions.
arXiv Detail & Related papers (2022-03-02T07:47:46Z)
- SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection [78.90102636266276]
We propose a novel set abstraction method named Semantics-Augmented Set Abstraction (SASA), which augments set abstraction with point-wise foreground estimation.
Based on the estimated point-wise foreground scores, we then propose a semantics-guided point sampling algorithm to help retain more important foreground points during down-sampling.
In practice, SASA shows to be effective in identifying valuable points related to foreground objects and improving feature learning for point-based 3D detection.
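A simplified stand-in for semantics-guided sampling: draw points with probability proportional to their foreground scores, so foreground geometry survives down-sampling. SASA actually fuses the scores into farthest-point sampling, so this captures only the gist:

```python
import numpy as np

def semantics_guided_sample(points, fg_scores, k, rng=None):
    """Weighted down-sampling: keep k points, favoring those with
    high (non-negative) predicted foreground scores."""
    rng = rng or np.random.default_rng()
    p = fg_scores / fg_scores.sum()
    idx = rng.choice(len(points), size=k, replace=False, p=p)
    return points[idx], idx
```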
arXiv Detail & Related papers (2022-01-06T08:54:47Z)
- SPU-Net: Self-Supervised Point Cloud Upsampling by Coarse-to-Fine Reconstruction with Self-Projection Optimization [52.20602782690776]
It is expensive and tedious to obtain large-scale paired sparse-dense point sets for training from real scanned sparse data.
We propose a self-supervised point cloud upsampling network, named SPU-Net, to capture the inherent upsampling patterns of points lying on the underlying object surface.
We conduct various experiments on both synthetic and real-scanned datasets, and the results demonstrate that we achieve comparable performance to the state-of-the-art supervised methods.
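The self-supervised pairing trick can be sketched in a few lines: build a (sparse input, denser target) pair from a single scanned patch by subsampling, so no external dense ground truth is needed. The coarse-to-fine network and self-projection optimization of SPU-Net are omitted here:

```python
import numpy as np

def make_self_supervised_pair(patch, ratio=4, rng=None):
    """Randomly subsample one scanned patch to create a training
    pair: the network learns to upsample patch[keep] back to patch."""
    rng = rng or np.random.default_rng()
    n = len(patch)
    keep = rng.choice(n, size=n // ratio, replace=False)
    return patch[keep], patch
```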
arXiv Detail & Related papers (2020-12-08T14:14:09Z)
- PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation [111.7241018610573]
We present PointGroup, a new end-to-end bottom-up architecture for instance segmentation.
We design a two-branch network to extract point features and predict semantic labels and offsets, for shifting each point towards its respective instance centroid.
A clustering component then utilizes both the original and offset-shifted point coordinate sets, taking advantage of their complementary strengths.
We conduct extensive experiments on two challenging datasets, ScanNet v2 and S3DIS, on which our method achieves the highest performance, 63.6% and 64.0%, compared to 54.9% and 54.4% achieved by the former best solutions.
arXiv Detail & Related papers (2020-04-03T16:26:37Z)
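A rough sketch of the dual-set idea, with DBSCAN standing in for PointGroup's bespoke grouping (eps and min_samples are illustrative choices, not the paper's):

```python
import numpy as np
from sklearn.cluster import DBSCAN

def dual_set_grouping(xyz, offsets, eps=0.3, min_pts=50):
    """Cluster twice: once on raw coordinates (crisp boundaries) and
    once on centroid-shifted coordinates (fragmented objects pulled
    together), yielding two complementary sets of instance proposals."""
    raw = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(xyz)
    shifted = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(xyz + offsets)
    return raw, shifted
```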