Related papers: Data-Efficient Point Cloud Semantic Segmentation Pipeline for Unimproved Roads

URL: http://arxiv.org/abs/2508.20135v1
Date: Tue, 26 Aug 2025 20:00:36 GMT
Title: Data-Efficient Point Cloud Semantic Segmentation Pipeline for Unimproved Roads
Authors: Andrew Yarovoi, Christopher R. Valenta
Abstract summary: We present a data-efficient point cloud segmentation pipeline and training framework for robust segmentation of unimproved roads. Our method employs a two-stage training framework: first, a projection-based convolutional neural network is pre-trained on a mixture of public urban datasets and a small, curated in-domain dataset. Using only 50 labeled point clouds from our target domain, we show that our proposed training approach improves mean Intersection-over-Union from 33.5% to 51.8% and the overall accuracy from 85.5% to 90.8%.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this case study, we present a data-efficient point cloud segmentation pipeline and training framework for robust segmentation of unimproved roads and seven other classes. Our method employs a two-stage training framework: first, a projection-based convolutional neural network is pre-trained on a mixture of public urban datasets and a small, curated in-domain dataset; then, a lightweight prediction head is fine-tuned exclusively on in-domain data. Along the way, we explore the application of Point Prompt Training to batch normalization layers and the effects of Manifold Mixup as a regularizer within our pipeline. We also explore the effects of incorporating histogram-normalized ambients to further boost performance. Using only 50 labeled point clouds from our target domain, we show that our proposed training approach improves mean Intersection-over-Union from 33.5% to 51.8% and the overall accuracy from 85.5% to 90.8%, when compared to naive training on the in-domain data. Crucially, our results demonstrate that pre-training across multiple datasets is key to improving generalization and enabling robust segmentation under limited in-domain supervision. Overall, this study demonstrates a practical framework for robust 3D semantic segmentation in challenging, low-data scenarios. Our code is available at: https://github.com/andrewyarovoi/MD-FRNet.
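The abstract reports both mean Intersection-over-Union (mIoU) and overall accuracy, and the two differ widely (33.5% vs. 85.5% for the naive baseline). The following is a minimal sketch of how these two metrics are commonly computed from a confusion matrix. It is not taken from the authors' code; the function names and the toy labels are assumptions chosen for illustration, and the paper's exact evaluation protocol (for example, how ignored or absent classes are handled) may differ.

```python
# Illustrative sketch only: standard definitions of overall accuracy and mIoU.
# Function names and toy data are hypothetical, not from the MD-FRNet repository.

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows index the true class, columns index the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm):
    """Fraction of all points whose predicted class equals the true class."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def mean_iou(cm):
    """Average of per-class TP / (TP + FP + FN), skipping classes never seen."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true otherwise
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with one dominant class (0) and two rare classes (1, 2).
y_true = [0] * 90 + [1] * 5 + [2] * 5
y_pred = [0] * 90 + [0] * 5 + [2] * 3 + [0] * 2

cm = confusion_matrix(y_true, y_pred, num_classes=3)
oa = overall_accuracy(cm)
miou = mean_iou(cm)
print(f"overall accuracy = {oa:.3f}, mIoU = {miou:.3f}")
```

In this toy case the dominant class is predicted well, so overall accuracy stays high, while the completely missed rare class pulls the mIoU down. This is one plausible reason the two numbers in the abstract sit so far apart.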

Related papers

Improving Multimodal Distillation for 3D Semantic Segmentation under Domain Shift [62.50795372173394]
We conduct an exhaustive study to identify recipes for exploiting vision foundation models (VFMs) in unsupervised domain adaptation for semantic segmentation of lidar point clouds. The resulting pipeline achieves state-of-the-art results in four widely-recognized and challenging settings.
arXiv Detail & Related papers (2025-11-21T17:57:43Z)
BlendCLIP: Bridging Synthetic and Real Domains for Zero-Shot 3D Object Classification with Multimodal Pretraining [2.400704807305413]
Zero-shot 3D object classification is crucial for real-world applications like autonomous driving. It is often hindered by a significant domain gap between the synthetic data used for training and the sparse, noisy LiDAR scans encountered in the real world. We introduce BlendCLIP, a multimodal pretraining framework that bridges this synthetic-to-real gap by strategically combining the strengths of both domains.
arXiv Detail & Related papers (2025-10-21T03:08:27Z)
E$^3$-Net: Efficient E(3)-Equivariant Normal Estimation Network [47.77270862087191]
We propose E3-Net to achieve equivariance for normal estimation. We introduce an efficient random frame method, which significantly reduces the training resources required for this task to just 1/8 of previous work. Our method achieves superior results on both synthetic and real-world datasets, and outperforms current state-of-the-art techniques by a substantial margin.
arXiv Detail & Related papers (2024-06-01T07:53:36Z)
Point Cloud Pre-training with Diffusion Models [62.12279263217138]
We propose a novel pre-training method called Point cloud Diffusion pre-training (PointDif). PointDif achieves substantial improvement across various real-world datasets for diverse downstream tasks such as classification, segmentation and detection.
arXiv Detail & Related papers (2023-11-25T08:10:05Z)
Pseudo-keypoint RKHS Learning for Self-supervised 6DoF Pose Estimation [0.9208007322096533]
We address the simulation-to-real domain gap in six degree-of-freedom pose estimation (6DoF PE). We propose a novel self-supervised keypoint voting-based 6DoF PE framework, effectively narrowing this gap using a learnable kernel in RKHS. We propose an adapter network, which is pre-trained on purely synthetic data with synthetic ground truth poses, and which evolves the network parameters from this source synthetic domain to the target real domain.
arXiv Detail & Related papers (2023-11-16T01:52:24Z)
AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset [25.935496432142976]
It is a long-term vision of the Autonomous Driving (AD) community that perception models can learn from a large-scale point cloud dataset. We formulate the point-cloud pre-training task as a semi-supervised problem, which leverages few-shot labeled and massive unlabeled point-cloud data. We achieve significant performance gains on a series of downstream perception benchmarks, including nuScenes and KITTI, under different baseline models.
arXiv Detail & Related papers (2023-06-01T12:32:52Z)
Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation [26.94284739177754]
New pipeline requires fewer ground-truth annotations to achieve superior segmentation accuracy. New Sparse Depthwise Separable Convolution module significantly reduces the network parameter count. New Spatio-Temporal Redundant Frame Downsampling (ST-RFD) method extracts a more diverse subset of training data frame samples.
arXiv Detail & Related papers (2023-03-20T15:36:10Z)
Bi-level Alignment for Cross-Domain Crowd Counting [113.78303285148041]
Current methods rely on external data for training an auxiliary task or apply an expensive coarse-to-fine estimation. We develop a new adversarial learning based method, which is simple and efficient to apply. We evaluate our approach on five real-world crowd counting benchmarks, where we outperform existing approaches by a large margin.
arXiv Detail & Related papers (2022-05-12T02:23:25Z)
Open-Set Semi-Supervised Learning for 3D Point Cloud Understanding [62.17020485045456]
It is commonly assumed in semi-supervised learning (SSL) that the unlabeled data are drawn from the same distribution as that of the labeled ones. We propose to selectively utilize unlabeled data through sample weighting, so that only conducive unlabeled data would be prioritized.
arXiv Detail & Related papers (2022-05-02T16:09:17Z)
Learning Semantic Segmentation of Large-Scale Point Clouds with Random Sampling [52.464516118826765]
We introduce RandLA-Net, an efficient and lightweight neural architecture to infer per-point semantics for large-scale point clouds. The key to our approach is to use random point sampling instead of more complex point selection approaches. Our RandLA-Net can process 1 million points in a single pass up to 200x faster than existing approaches.
arXiv Detail & Related papers (2021-07-06T05:08:34Z)
Inception Convolution with Efficient Dilation Search [121.41030859447487]
Dilation convolution is a critical mutant of the standard convolutional neural network to control effective receptive fields and handle large scale variance of objects. We propose a new mutant of dilated convolution, namely inception (dilated) convolution, where the convolutions have independent dilation among different axes, channels and layers. To explore a practical method for fitting the complex inception convolution to the data, a simple yet effective dilation search algorithm (EDO) based on statistical optimization is developed.
arXiv Detail & Related papers (2020-12-25T14:58:35Z)
Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision [73.76277367528657]
Convolutional neural network-based approaches have achieved remarkable progress in semantic segmentation. To cope with this limitation, automatically annotated data generated from graphic engines are used to train segmentation models. We propose a two-step self-supervised domain adaptation approach to minimize the inter-domain and intra-domain gap together.
arXiv Detail & Related papers (2020-04-16T15:24:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.