DOS: Distilling Observable Softmaps of Zipfian Prototypes for Self-Supervised Point Representation
- URL: http://arxiv.org/abs/2512.11465v1
- Date: Fri, 12 Dec 2025 11:07:40 GMT
- Title: DOS: Distilling Observable Softmaps of Zipfian Prototypes for Self-Supervised Point Representation
- Authors: Mohamed Abdelsamad, Michael Ulrich, Bin Yang, Miao Zhang, Yakov Miron, Abhinav Valada,
- Abstract summary: DOS (Distilling Observable Softmaps) is a novel SSL framework that self-distills semantic relevance softmaps only at observable points. DOS outperforms current state-of-the-art methods on semantic segmentation and 3D object detection. Our results demonstrate that observable-point softmap distillation offers a scalable and effective paradigm for learning robust 3D representations.
- Score: 25.293422897925698
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in self-supervised learning (SSL) have shown tremendous potential for learning 3D point cloud representations without human annotations. However, SSL for 3D point clouds still faces critical challenges due to irregular geometry, shortcut-prone reconstruction, and unbalanced semantic distributions. In this work, we propose DOS (Distilling Observable Softmaps), a novel SSL framework that self-distills semantic relevance softmaps only at observable (unmasked) points. This strategy prevents information leakage from masked regions and provides richer supervision than discrete token-to-prototype assignments. To address the challenge of unbalanced semantics in an unsupervised setting, we introduce Zipfian prototypes and incorporate them using a modified Sinkhorn-Knopp algorithm, Zipf-Sinkhorn, which enforces a power-law prior over prototype usage and modulates the sharpness of the target softmap during training. DOS outperforms current state-of-the-art methods on semantic segmentation and 3D object detection across multiple benchmarks, including nuScenes, Waymo, SemanticKITTI, ScanNet, and ScanNet200, without relying on extra data or annotations. Our results demonstrate that observable-point softmap distillation offers a scalable and effective paradigm for learning robust 3D representations.
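The abstract names two mechanisms, Zipf-Sinkhorn targets and a distillation loss restricted to observable points, without giving their exact formulation. The following is a minimal NumPy sketch, assuming SwAV-style Sinkhorn-Knopp normalization in which the usual uniform prototype marginal is swapped for a Zipf prior p_k ∝ k^(-s), followed by a cross-entropy loss evaluated only at unmasked points. The function names `zipf_prior`, `zipf_sinkhorn`, and `observable_distillation_loss`, the rule that ranks prototypes by their current usage, and the parameters `eps`, `s`, and `tau_s` are hypothetical illustrations, not the authors' code. Here target sharpness is controlled only through `eps` and `s`; how DOS modulates it during training is not specified in the abstract.

```python
import numpy as np


def zipf_prior(num_prototypes: int, s: float = 1.0) -> np.ndarray:
    """Power-law marginal over prototypes: p_k proportional to k**(-s), k = 1..K."""
    ranks = np.arange(1, num_prototypes + 1, dtype=np.float64)
    weights = ranks ** (-s)
    return weights / weights.sum()


def zipf_sinkhorn(scores: np.ndarray, eps: float = 0.05, s: float = 1.0,
                  n_iters: int = 3) -> np.ndarray:
    """Sinkhorn-Knopp with a Zipfian (instead of uniform) prototype marginal.

    scores: (N, K) teacher point-to-prototype similarities.
    Returns an (N, K) soft target map whose rows sum to 1.
    """
    n, k = scores.shape
    # Subtract the global max before exponentiating for numerical stability.
    q = np.exp((scores - scores.max()) / eps).T  # (K, N)
    q /= q.sum()
    # HYPOTHETICAL rule: rank prototypes by current soft usage, so the
    # most-used prototype receives the largest share of the Zipf prior.
    order = np.argsort(-q.sum(axis=1))
    r = np.empty(k)
    r[order] = zipf_prior(k, s)  # target mass per prototype
    c = np.full(n, 1.0 / n)      # every point carries equal mass
    for _ in range(n_iters):
        q *= (r / q.sum(axis=1))[:, None]  # rescale rows toward the Zipf marginal
        q *= (c / q.sum(axis=0))[None, :]  # rescale columns toward uniform points
    # Normalize each point's column into a distribution over prototypes.
    return (q / q.sum(axis=0, keepdims=True)).T


def log_softmax(x: np.ndarray) -> np.ndarray:
    """Row-wise log-softmax, computed stably."""
    z = x - x.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def observable_distillation_loss(student_logits: np.ndarray,
                                 teacher_scores: np.ndarray,
                                 masked: np.ndarray,
                                 tau_s: float = 0.1,
                                 **sinkhorn_kwargs) -> float:
    """Cross-entropy between teacher softmaps and student predictions,
    averaged over observable (unmasked) points only."""
    visible = ~masked
    # Targets and predictions are taken only at points the student can see,
    # so masked points contribute nothing to the loss.
    targets = zipf_sinkhorn(teacher_scores[visible], **sinkhorn_kwargs)
    log_p = log_softmax(student_logits[visible] / tau_s)
    return float(-(targets * log_p).sum(axis=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(256, 16))  # 256 points, 16 prototypes
    student = rng.normal(size=(256, 16))
    masked = rng.random(256) < 0.6        # ~60% of points hidden from the student
    print(observable_distillation_loss(student, teacher, masked))
```

With enough iterations, the soft prototype usage of the returned targets (column sums divided by N) approaches the Zipf prior instead of a uniform split, which is the behavior the abstract attributes to Zipf-Sinkhorn. In this sketch, lowering `eps` or raising `s` sharpens the targets, so scheduling either one over training is one plausible reading of "modulates the sharpness", but that is an assumption.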
Related papers
- Unified Unsupervised and Sparsely-Supervised 3D Object Detection by Semantic Pseudo-Labeling and Prototype Learning [0.0]
3D object detection is essential for autonomous driving and robotic perception. To reduce annotation dependency, unsupervised and sparsely-supervised paradigms have emerged. This paper proposes SPL, a unified training framework for both Unsupervised and Sparsely-Supervised 3D Object Detection.
Date: 2026-02-25T01:26:34Z
- Masked Clustering Prediction for Unsupervised Point Cloud Pre-training [61.11226004056774]
MaskClu is a novel unsupervised pre-training method for ViTs on 3D point clouds. It integrates masked point modeling with clustering-based learning.
Date: 2025-08-12T12:58:44Z
- 3D-PointZshotS: Geometry-Aware 3D Point Cloud Zero-Shot Semantic Segmentation Narrowing the Visual-Semantic Gap [10.744510913722817]
3D-PointZshotS is a geometry-aware zero-shot segmentation framework. We integrate LGPs into a generator via a cross-attention mechanism, enriching semantic features with fine-grained geometric details. We re-present visual and semantic features in a shared space, bridging the semantic-visual gap and facilitating knowledge transfer to unseen classes.
Date: 2025-04-16T19:17:12Z
- Multi-Scale Neighborhood Occupancy Masked Autoencoder for Self-Supervised Learning in LiDAR Point Clouds [9.994719163112416]
Masked autoencoders (MAE) have shown tremendous potential for self-supervised learning (SSL) in vision and beyond. Point clouds from LiDARs used in automated driving are particularly challenging for MAEs since large areas of the 3D volume are empty. We propose the novel neighborhood occupancy MAE (NOMAE) that overcomes the aforementioned challenges by employing masked occupancy reconstruction only in the neighborhood of non-masked voxels.
Date: 2025-02-27T17:42:47Z
- LISNeRF Mapping: LiDAR-based Implicit Mapping via Semantic Neural Fields for Large-Scale 3D Scenes [2.822816116516042]
Large-scale semantic mapping is crucial for outdoor autonomous agents to fulfill high-level tasks such as planning and navigation.
This paper proposes a novel method for large-scale 3D semantic reconstruction through implicit representations from posed LiDAR measurements alone.
Date: 2023-11-04T03:55:38Z
- Neural Semantic Surface Maps [52.61017226479506]
We present an automated technique for computing a map between two genus-zero shapes, which matches semantically corresponding regions to one another.
Our approach can generate semantic surface-to-surface maps, eliminating the need for manual annotations or any 3D training data.
Date: 2023-09-09T16:21:56Z
- CPCM: Contextual Point Cloud Modeling for Weakly-supervised Point Cloud Semantic Segmentation [60.0893353960514]
We study the task of weakly-supervised point cloud semantic segmentation with sparse annotations.
We propose a Contextual Point Cloud Modeling (CPCM) method that consists of two parts: a region-wise masking (RegionMask) strategy and a contextual masked training (CMT) method.
Date: 2023-07-19T04:41:18Z
- PointDC: Unsupervised Semantic Segmentation of 3D Point Clouds via Cross-modal Distillation and Super-Voxel Clustering [32.18716273358168]
We make the first attempt at fully unsupervised semantic segmentation of point clouds.
We propose a novel framework, PointDC, consisting of two steps, cross-modal distillation and super-voxel clustering, that handle the problems identified in the paper.
PointDC yields a significant improvement over the prior state-of-the-art unsupervised methods.
Date: 2023-04-18T12:58:21Z
- MAELi: Masked Autoencoder for Large-Scale LiDAR Point Clouds [13.426810473131642]
Masked AutoEncoder for LiDAR point clouds (MAELi) intuitively leverages the sparsity of LiDAR point clouds in both the encoder and decoder during reconstruction.
In a novel reconstruction approach, MAELi distinguishes between empty and occluded space.
Thereby, without any ground truth whatsoever and trained on single frames only, MAELi obtains an understanding of the underlying 3D scene geometry and semantics.
Date: 2022-12-14T13:10:27Z
- MAPLE: Masked Pseudo-Labeling autoEncoder for Semi-supervised Point Cloud Action Recognition [160.49403075559158]
We propose a Masked Pseudo-Labeling autoEncoder (MAPLE) framework for point cloud action recognition.
In particular, we design a novel and efficient Decoupled spatial-temporal TransFormer (DestFormer) as the backbone of MAPLE.
MAPLE achieves superior results on three public benchmarks and outperforms the state-of-the-art method by 8.08% accuracy on the MSR-Action3
Date: 2022-09-01T12:32:40Z
- Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training [56.81809311892475]
Masked Autoencoders (MAE) have shown great potential in self-supervised pre-training for language and 2D image transformers.
We propose Point-M2AE, a strong Multi-scale MAE pre-training framework for hierarchical self-supervised learning of 3D point clouds.
Date: 2022-05-28T11:22:53Z
- Dense Supervision Propagation for Weakly Supervised Semantic Segmentation on 3D Point Clouds [59.63231842439687]
We train a semantic point cloud segmentation network with only a small portion of points being labeled.
We propose a cross-sample feature reallocating module to transfer similar features and therefore re-route the gradients across two samples.
Our weakly supervised method, with only 10% and 1% of labels, produces results comparable to its fully supervised counterpart.
Date: 2021-07-23T14:34:57Z