Temporal Overlapping Prediction: A Self-supervised Pre-training Method for LiDAR Moving Object Segmentation
- URL: http://arxiv.org/abs/2503.07167v1
- Date: Mon, 10 Mar 2025 10:44:11 GMT
- Title: Temporal Overlapping Prediction: A Self-supervised Pre-training Method for LiDAR Moving Object Segmentation
- Authors: Ziliang Miao, Runjian Chen, Yixi Cai, Buwei He, Wenquan Zhao, Wenqi Shao, Bo Zhang, Fu Zhang
- Abstract summary: Moving object segmentation (MOS) on LiDAR point clouds is crucial for autonomous systems like self-driving vehicles. We propose Temporal Overlapping Prediction (TOP), a self-supervised pre-training method.
- Score: 17.089686769826233
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Moving object segmentation (MOS) on LiDAR point clouds is crucial for autonomous systems like self-driving vehicles. Previous supervised approaches rely heavily on costly manual annotations, while LiDAR sequences naturally capture temporal motion cues that can be leveraged for self-supervised learning. In this paper, we propose \textbf{T}emporal \textbf{O}verlapping \textbf{P}rediction (\textbf{TOP}), a self-supervised pre-training method that alleviates the labeling burden for MOS. \textbf{TOP} exploits the temporal overlapping points that are commonly observed by the current and adjacent scans, and learns spatiotemporal representations by predicting the occupancy states of these temporal overlapping points. Moreover, we utilize current occupancy reconstruction as an auxiliary pre-training objective, which enhances the model's awareness of the current scene structure. We conduct extensive experiments and observe that the conventional metric Intersection-over-Union (IoU) shows a strong bias toward objects with more scanned points, which can neglect small or distant objects. To compensate for this bias, we introduce an additional metric called $\text{mIoU}_{\text{obj}}$ to evaluate object-level performance. Experiments on nuScenes and SemanticKITTI show that \textbf{TOP} outperforms both the supervised training-from-scratch baseline and other self-supervised pre-training baselines by up to 28.77\% relative improvement, demonstrating strong transferability across LiDAR setups and generalization to other tasks. Code and pre-trained models will be publicly available upon publication.
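The abstract describes the method only at a high level. As a rough, non-authoritative sketch of the pre-training idea, the loss below frames temporal overlapping prediction as binary occupancy classification at query points observed by both the current and adjacent scans, with current occupancy reconstruction as the auxiliary term; every name here (`encoder`, `occ_head`, the query/label tensors) is a hypothetical interface, not the authors' implementation:

```python
import torch.nn.functional as F

def top_pretraining_loss(encoder, occ_head, current_scan,
                         overlap_queries, overlap_occ,
                         recon_queries, recon_occ):
    """Sketch of a TOP-style objective (interfaces assumed, not the paper's API).

    overlap_queries: 3D points observed by both current and adjacent scans,
    with binary occupancy labels overlap_occ; recon_queries/recon_occ play
    the same role for the auxiliary current-occupancy reconstruction."""
    feats = encoder(current_scan)                      # latent scene representation
    overlap_logits = occ_head(feats, overlap_queries)  # occupancy logits at overlap points
    recon_logits = occ_head(feats, recon_queries)      # occupancy logits for the current scan
    loss_top = F.binary_cross_entropy_with_logits(overlap_logits, overlap_occ.float())
    loss_recon = F.binary_cross_entropy_with_logits(recon_logits, recon_occ.float())
    return loss_top + loss_recon                       # equal weighting is a guess
```

Similarly, the exact formulation of $\text{mIoU}_{\text{obj}}$ is not given in the abstract; a minimal sketch consistent with its stated purpose (averaging IoU per object instance so that small or distant objects count as much as densely scanned ones) might look like:

```python
import numpy as np

def miou_obj(pred_moving, gt_moving, instance_ids):
    """Hypothetical object-level mIoU: IoU of the moving-point prediction is
    computed within each object instance, then averaged over instances.
    Instance id 0 is assumed to mark background / unlabeled points."""
    ious = []
    for obj_id in np.unique(instance_ids):
        if obj_id == 0:
            continue
        mask = instance_ids == obj_id
        inter = np.logical_and(pred_moving[mask], gt_moving[mask]).sum()
        union = np.logical_or(pred_moving[mask], gt_moving[mask]).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0
```

Under a formulation like this, a missed pedestrian with 20 points lowers the score as much as a missed truck with 2,000 points, which matches the point-count bias the authors report for plain IoU.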
Related papers
- SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining [62.433137130087445]
SuperFlow++ is a novel framework that integrates pretraining and downstream tasks using consecutive LiDAR-camera pairs.
We show that SuperFlow++ outperforms state-of-the-art methods across diverse tasks and driving conditions.
With strong generalizability and computational efficiency, SuperFlow++ establishes a new benchmark for data-efficient LiDAR-based perception in autonomous driving.
arXiv Detail & Related papers (2025-03-25T17:59:57Z)
- UnO: Unsupervised Occupancy Fields for Perception and Forecasting [33.205064287409094]
Supervised approaches leverage annotated object labels to learn a model of the world.
We learn to perceive and forecast a continuous 4D occupancy field with self-supervision from LiDAR data.
This unsupervised world model can be easily and effectively transferred to downstream tasks.
arXiv Detail & Related papers (2024-06-12T23:22:23Z)
- Self-Supervised Class-Agnostic Motion Prediction with Spatial and Temporal Consistency Regularizations [53.797896854533384]
Class-agnostic motion prediction methods directly predict the motion of the entire point cloud.
While most existing methods rely on fully-supervised learning, the manual labeling of point cloud data is laborious and time-consuming.
We introduce three simple spatial and temporal regularization losses, which facilitate the self-supervised training process effectively.
arXiv Detail & Related papers (2024-03-20T02:58:45Z)
- Self-Supervised Multi-Object Tracking For Autonomous Driving From Consistency Across Timescales [53.55369862746357]
Self-supervised multi-object trackers have tremendous potential as they enable learning from raw domain-specific data.
However, their re-identification accuracy still falls short of that of their supervised counterparts.
We propose a training objective that enables self-supervised learning of re-identification features from multiple sequential frames.
arXiv Detail & Related papers (2023-04-25T20:47:29Z)
- ContrastMotion: Self-supervised Scene Motion Learning for Large-Scale LiDAR Point Clouds [21.6511040107249]
We propose a novel self-supervised motion estimator for LiDAR-based autonomous driving via BEV representation.
We predict scene motion via feature-level consistency between pillars in consecutive frames, which can eliminate the effect caused by noise points and view-changing point clouds in dynamic scenes.
arXiv Detail & Related papers (2023-04-25T05:46:24Z)
- Self-Supervised Representation Learning from Temporal Ordering of Automated Driving Sequences [49.91741677556553]
We propose TempO, a temporal ordering pretext task for pre-training region-level feature representations for perception tasks.
We embed each frame by an unordered set of proposal feature vectors, a representation that is natural for object detection or tracking systems.
Extensive evaluations on the BDD100K, nuImages, and MOT17 datasets show that our TempO pre-training approach outperforms single-frame self-supervised learning methods.
arXiv Detail & Related papers (2023-02-17T18:18:27Z)
- Bootstrap Motion Forecasting With Self-Consistent Constraints [52.88100002373369]
We present a novel framework to bootstrap Motion forecasting with Self-consistent Constraints.
The motion forecasting task aims at predicting future trajectories of vehicles by incorporating spatial and temporal information from the past.
We show that our proposed scheme consistently improves the prediction performance of several existing methods.
arXiv Detail & Related papers (2022-04-12T14:59:48Z)
- Self-Supervised Learning for Semi-Supervised Temporal Action Proposal [42.6254639252739]
We design an effective Self-supervised Semi-supervised Temporal Action Proposal (SSTAP) framework.
The SSTAP contains two crucial branches, i.e., a temporal-aware semi-supervised branch and a relation-aware self-supervised branch.
We extensively evaluate the proposed SSTAP on THUMOS14 and ActivityNet v1.3 datasets.
arXiv Detail & Related papers (2021-04-07T16:03:25Z)
- Self-Supervision by Prediction for Object Discovery in Videos [62.87145010885044]
In this paper, we use the prediction task as self-supervision and build a novel object-centric model for image sequence representation.
Our framework can be trained without the help of any manual annotation or pretrained network.
Initial experiments confirm that the proposed pipeline is a promising step towards object-centric video prediction.
arXiv Detail & Related papers (2021-03-09T19:14:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.