KDMOS: Knowledge Distillation for Motion Segmentation
- URL: http://arxiv.org/abs/2506.14130v1
- Date: Tue, 17 Jun 2025 02:47:49 GMT
- Title: KDMOS: Knowledge Distillation for Motion Segmentation
- Authors: Chunyu Cao, Jintao Cheng, Zeyu Chen, Linfan Zhan, Rui Fan, Zhijian He, Xiaoyu Tang,
- Abstract summary: Motion Object Segmentation (MOS) is crucial for autonomous driving, as it enhances localization, path planning, map construction, scene flow estimation, and future state prediction. We propose a logits-based knowledge distillation framework for MOS, aiming to improve accuracy while maintaining real-time efficiency.
- Score: 9.033251104271585
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motion Object Segmentation (MOS) is crucial for autonomous driving, as it enhances localization, path planning, map construction, scene flow estimation, and future state prediction. While existing methods achieve strong performance, balancing accuracy and real-time inference remains a challenge. To address this, we propose a logits-based knowledge distillation framework for MOS, aiming to improve accuracy while maintaining real-time efficiency. Specifically, we adopt a Bird's Eye View (BEV) projection-based model as the student and a non-projection model as the teacher. To handle the severe imbalance between moving and non-moving classes, we decouple them and apply tailored distillation strategies, allowing the student model to better learn key motion-related features. This approach significantly reduces false positives and false negatives. Additionally, we introduce dynamic upsampling, optimize the network architecture, and achieve a 7.69% reduction in parameter count, mitigating overfitting. Our method achieves a notable IoU of 78.8% on the hidden test set of the SemanticKITTI-MOS dataset and delivers competitive results on the Apollo dataset. The KDMOS implementation is available at https://github.com/SCNU-RISLAB/KDMOS.
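The decoupled distillation idea from the abstract can be sketched as follows. This is an illustrative re-implementation, not the authors' code: the per-class temperatures and weights (`t_moving`, `t_static`, `w_moving`, `w_static`) and the use of the teacher's hard label to pick the strategy are hypothetical choices made for the sketch.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_div(p, q):
    """KL divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def decoupled_kd_loss(teacher_logits, student_logits, moving_idx,
                      t_moving=2.0, t_static=4.0, w_moving=3.0, w_static=1.0):
    """Per-point logits distillation with a separate temperature and
    weight for the rare moving class vs. the dominant static class,
    averaged over all points."""
    loss = 0.0
    for t, s in zip(teacher_logits, student_logits):
        # Choose the strategy from the teacher's hard prediction (illustrative).
        is_moving = max(range(len(t)), key=t.__getitem__) == moving_idx
        T = t_moving if is_moving else t_static
        w = w_moving if is_moving else w_static
        # T^2 rescaling keeps gradient magnitudes comparable across temperatures.
        loss += w * (T ** 2) * kl_div(softmax(t, T), softmax(s, T))
    return loss / len(teacher_logits)
```

Up-weighting the moving class in the loss is one plausible way to address the class imbalance the abstract describes; the paper's exact strategy may differ.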
Related papers
- Enhancing Knowledge Distillation for LLMs with Response-Priming Prompting [1.9461727843485295]
We propose a set of novel response-priming prompting strategies to enhance the performance of student models. Our approach fine-tunes a smaller Llama 3.1 8B Instruct model by distilling knowledge from a quantized Llama 3.1 405B Instruct teacher model. We find that Ground Truth prompting results in a 55% performance increase on GSM8K for a distilled Llama 3.1 8B Instruct.
arXiv Detail & Related papers (2024-12-18T20:41:44Z)
- MOS: Model Surgery for Pre-Trained Model-Based Class-Incremental Learning [62.78292142632335]
Class-Incremental Learning (CIL) requires models to continually acquire knowledge of new classes without forgetting old ones. Existing work seeks to utilize lightweight components to adjust the model. We propose MOdel Surgery (MOS) to rescue the model from forgetting previous knowledge.
arXiv Detail & Related papers (2024-12-12T16:57:20Z)
- Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models [62.5501109475725]
Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them.
This paper introduces Online Knowledge Distillation (OKD), where the teacher network integrates small online modules to concurrently train with the student model.
OKD achieves or exceeds the performance of leading methods in various model architectures and sizes, reducing training time by up to fourfold.
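The baseline that these distillation methods build on is classic logits distillation: the student is trained to match the teacher's temperature-softened class distribution. A minimal sketch (the temperature `T=4.0` is an illustrative choice, not a value from any of the papers above):

```python
import math

def _softmax(z, T):
    """Numerically stable temperature-softened softmax."""
    m = max(z)
    e = [math.exp((x - m) / T) for x in z]
    s = sum(e)
    return [x / s for x in e]

def kd_loss(teacher_logits, student_logits, T=4.0):
    """Hinton-style distillation loss: KL divergence between the
    temperature-softened teacher and student distributions, scaled
    by T^2 so gradients keep a comparable magnitude as T varies."""
    p = _softmax(teacher_logits, T)
    q = _softmax(student_logits, T)
    return (T ** 2) * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

Online distillation variants such as OKD change *when* this loss is applied (teacher and student train concurrently) rather than the loss itself.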
arXiv Detail & Related papers (2024-09-19T07:05:26Z)
- CV-MOS: A Cross-View Model for Motion Segmentation [13.378850442525945]
We introduce CV-MOS, a cross-view model for moving object segmentation.
We decouple spatial-temporal information by capturing the motion from BEV and RV residual maps.
Our method achieved leading IoU scores of 77.5% and 79.2% on the validation and test sets of the SemanticKITTI dataset.
arXiv Detail & Related papers (2024-08-25T09:39:26Z)
- TSCM: A Teacher-Student Model for Vision Place Recognition Using Cross-Metric Knowledge Distillation [6.856317526681759]
Visual place recognition plays a pivotal role in autonomous exploration and navigation of mobile robots. Existing methods overcome this by exploiting powerful yet large networks. We propose a high-performance teacher and lightweight student distillation framework called TSCM.
arXiv Detail & Related papers (2024-04-02T02:29:41Z)
- BootsTAP: Bootstrapped Training for Tracking-Any-Point [62.585297341343505]
Tracking-Any-Point (TAP) can be formalized as an algorithm to track any point on solid surfaces in a video.
We show how large-scale, unlabeled, uncurated real-world data can improve a TAP model with minimal architectural changes.
We demonstrate state-of-the-art performance on the TAP-Vid benchmark surpassing previous results by a wide margin.
arXiv Detail & Related papers (2024-02-01T18:38:55Z)
- MF-MOS: A Motion-Focused Model for Moving Object Segmentation [10.533968185642415]
Moving object segmentation (MOS) provides a reliable solution for detecting traffic participants.
Previous methods capture motion features from the range images directly.
We propose MF-MOS, a novel motion-focused model with a dual-branch structure for LiDAR moving object segmentation.
arXiv Detail & Related papers (2024-01-30T13:55:56Z)
- Learning Objective-Specific Active Learning Strategies with Attentive Neural Processes [72.75421975804132]
Learning Active Learning (LAL) suggests to learn the active learning strategy itself, allowing it to adapt to the given setting.
We propose a novel LAL method for classification that exploits symmetry and independence properties of the active learning problem.
Our approach is based on learning from a myopic oracle, which gives our model the ability to adapt to non-standard objectives.
arXiv Detail & Related papers (2023-09-11T14:16:37Z)
- Re-Evaluating LiDAR Scene Flow for Autonomous Driving [80.37947791534985]
Popular benchmarks for self-supervised LiDAR scene flow have unrealistic rates of dynamic motion, unrealistic correspondences, and unrealistic sampling patterns.
We evaluate a suite of top methods on a suite of real-world datasets.
We show that despite the emphasis placed on learning, most performance gains are caused by pre- and post-processing steps.
arXiv Detail & Related papers (2023-04-04T22:45:50Z)
- TRAIL: Near-Optimal Imitation Learning with Suboptimal Data [100.83688818427915]
We present training objectives that use offline datasets to learn a factored transition model.
Our theoretical analysis shows that the learned latent action space can boost the sample-efficiency of downstream imitation learning.
To learn the latent action space in practice, we propose TRAIL (Transition-Reparametrized Actions for Imitation Learning), an algorithm that learns an energy-based transition model.
arXiv Detail & Related papers (2021-10-27T21:05:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.