Unsupervised Online 3D Instance Segmentation with Synthetic Sequences and Dynamic Loss
- URL: http://arxiv.org/abs/2509.23194v1
- Date: Sat, 27 Sep 2025 08:53:27 GMT
- Title: Unsupervised Online 3D Instance Segmentation with Synthetic Sequences and Dynamic Loss
- Authors: Yifan Zhang, Wei Zhang, Chuangxin He, Zhonghua Miao, Junhui Hou,
- Abstract summary: Unsupervised online 3D instance segmentation is a fundamental yet challenging task.<n>Existing methods, such as UNIT, have made progress in this direction but remain constrained by limited training diversity.<n>We propose a new framework that enriches the training distribution through synthetic point cloud sequence generation.
- Score: 52.28880405119483
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised online 3D instance segmentation is a fundamental yet challenging task, as it requires maintaining consistent object identities across LiDAR scans without relying on annotated training data. Existing methods, such as UNIT, have made progress in this direction but remain constrained by limited training diversity, rigid temporal sampling, and heavy dependence on noisy pseudo-labels. We propose a new framework that enriches the training distribution through synthetic point cloud sequence generation, enabling greater diversity without relying on manual labels or simulation engines. To better capture temporal dynamics, our method incorporates a flexible sampling strategy that leverages both adjacent and non-adjacent frames, allowing the model to learn from long-range dependencies as well as short-term variations. In addition, a dynamic-weighting loss emphasizes confident and informative samples, guiding the network toward more robust representations. Through extensive experiments on SemanticKITTI, nuScenes, and PandaSet, our method consistently outperforms UNIT and other unsupervised baselines, achieving higher segmentation accuracy and more robust temporal associations. The code will be publicly available at github.com/Eaphan/SFT3D.
Related papers
- Learning with Preserving for Continual Multitask Learning [4.847042727427382]
We introduce Learning with Preserving (LwP), a novel framework that shifts the focus from preserving task outputs to maintaining the shared representation space.<n>LwP not only mitigates catastrophic forgetting but also consistently outperforms state-of-the-art baselines in CMTL tasks.
arXiv Detail & Related papers (2025-11-11T22:23:20Z) - rETF-semiSL: Semi-Supervised Learning for Neural Collapse in Temporal Data [44.17657834678967]
We propose a novel semi-supervised pre-training strategy to enforce latent representations that satisfy the Neural Collapse phenomenon.<n>We show that our method significantly outperforms previous pretext tasks when applied to LSTMs, transformers, and state-space models.
arXiv Detail & Related papers (2025-08-13T19:16:47Z) - CLIMB-3D: Continual Learning for Imbalanced 3D Instance Segmentation [67.36817440834251]
We propose a unified framework for textbfCLass-incremental textbfImbalance-aware textbf3DIS.<n>Our approach achieves state-of-the-art results, surpassing prior work by up to 16.76% mAP for instance segmentation and approximately 30% mIoU for semantic segmentation.
arXiv Detail & Related papers (2025-02-24T18:58:58Z) - Towards Modality-agnostic Label-efficient Segmentation with Entropy-Regularized Distribution Alignment [62.73503467108322]
This topic is widely studied in 3D point cloud segmentation due to the difficulty of annotating point clouds densely.
Until recently, pseudo-labels have been widely employed to facilitate training with limited ground-truth labels.
Existing pseudo-labeling approaches could suffer heavily from the noises and variations in unlabelled data.
We propose a novel learning strategy to regularize the pseudo-labels generated for training, thus effectively narrowing the gaps between pseudo-labels and model predictions.
arXiv Detail & Related papers (2024-08-29T13:31:15Z) - Trajectory Forecasting through Low-Rank Adaptation of Discrete Latent Codes [36.12653178844828]
Trajectory forecasting is crucial for video surveillance analytics, as it enables the anticipation of future movements for a set of agents.
We introduce Vector Quantized Variational Autoencoders (VQ-VAEs), which utilize a discrete latent space to tackle the issue of posterior collapse.
We show that such a two-fold framework, augmented with instance-level discretization, leads to accurate and diverse forecasts.
arXiv Detail & Related papers (2024-05-31T10:13:17Z) - Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training [44.790636524264]
Point Prompt Training is a novel framework for multi-dataset synergistic learning in the context of 3D representation learning.
It can overcome the negative transfer associated with synergistic learning and produce generalizable representations.
It achieves state-of-the-art performance on each dataset using a single weight-shared model with supervised multi-dataset training.
arXiv Detail & Related papers (2023-08-18T17:59:57Z) - CTP: Towards Vision-Language Continual Pretraining via Compatible
Momentum Contrast and Topology Preservation [128.00940554196976]
Vision-Language Continual Pretraining (VLCP) has shown impressive results on diverse downstream tasks by offline training on large-scale datasets.
To support the study of Vision-Language Continual Pretraining (VLCP), we first contribute a comprehensive and unified benchmark dataset P9D.
The data from each industry as an independent task supports continual learning and conforms to the real-world long-tail nature to simulate pretraining on web data.
arXiv Detail & Related papers (2023-08-14T13:53:18Z) - Self-Supervised Multi-Object Tracking For Autonomous Driving From
Consistency Across Timescales [53.55369862746357]
Self-supervised multi-object trackers have tremendous potential as they enable learning from raw domain-specific data.
However, their re-identification accuracy still falls short compared to their supervised counterparts.
We propose a training objective that enables self-supervised learning of re-identification features from multiple sequential frames.
arXiv Detail & Related papers (2023-04-25T20:47:29Z) - Hierarchical Supervision and Shuffle Data Augmentation for 3D
Semi-Supervised Object Detection [90.32180043449263]
State-of-the-art 3D object detectors are usually trained on large-scale datasets with high-quality 3D annotations.
A natural remedy is to adopt semi-supervised learning (SSL) by leveraging a limited amount of labeled samples and abundant unlabeled samples.
This paper introduces a novel approach of Hierarchical Supervision and Shuffle Data Augmentation (HSSDA), which is a simple yet effective teacher-student framework.
arXiv Detail & Related papers (2023-04-04T02:09:32Z) - Open-Set Semi-Supervised Learning for 3D Point Cloud Understanding [62.17020485045456]
It is commonly assumed in semi-supervised learning (SSL) that the unlabeled data are drawn from the same distribution as that of the labeled ones.
We propose to selectively utilize unlabeled data through sample weighting, so that only conducive unlabeled data would be prioritized.
arXiv Detail & Related papers (2022-05-02T16:09:17Z) - Enabling Continual Learning with Differentiable Hebbian Plasticity [18.12749708143404]
Continual learning is the problem of sequentially learning new tasks or knowledge while protecting previously acquired knowledge.
catastrophic forgetting poses a grand challenge for neural networks performing such learning process.
We propose a Differentiable Hebbian Consolidation model which is composed of a Differentiable Hebbian Plasticity.
arXiv Detail & Related papers (2020-06-30T06:42:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.