JointDistill: Adaptive Multi-Task Distillation for Joint Depth Estimation and Scene Segmentation
- URL: http://arxiv.org/abs/2505.10057v1
- Date: Thu, 15 May 2025 08:00:48 GMT
- Title: JointDistill: Adaptive Multi-Task Distillation for Joint Depth Estimation and Scene Segmentation
- Authors: Tiancong Cheng, Ying Zhang, Yuxuan Liang, Roger Zimmermann, Zhiwen Yu, Bin Guo
- Abstract summary: This work explores how multi-task distillation can be used to improve unified modeling. We propose a self-adaptive distillation method that adjusts the amount of knowledge taken from each teacher according to the student's current learning ability. We evaluate our method on multiple benchmarking datasets, including Cityscapes and NYU-v2.
- Score: 31.89422375115854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth estimation and scene segmentation are two important tasks in intelligent transportation systems. Jointly modeling these two tasks reduces both storage and training requirements. This work explores how multi-task distillation can be used to improve such unified modeling. While existing solutions transfer multiple teachers' knowledge in a static way, we propose a self-adaptive distillation method that dynamically adjusts the amount of knowledge taken from each teacher according to the student's current learning ability. Furthermore, with multiple teachers, the student's gradient update direction during distillation is more prone to errors, and knowledge forgetting may occur. To avoid this, we propose a knowledge trajectory that records the most essential information the model has learnt in the past, based on which a trajectory-based distillation loss is designed to guide the student to follow a similar learning curve in a cost-effective way. We evaluate our method on multiple benchmarking datasets including Cityscapes and NYU-v2. Compared to state-of-the-art solutions, our method achieves a clear improvement. The code is provided in the supplementary materials.
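The abstract describes weighting each teacher's contribution by the student's current learning ability. The paper's exact weighting rule is not given here, so the sketch below is only one plausible instantiation: a softmax over the student's current per-teacher distillation losses, so teachers the student still lags behind the most receive the largest weight. The function names and the softmax rule are assumptions, not the authors' implementation.

```python
import math

def adaptive_teacher_weights(per_teacher_losses, temperature=1.0):
    """Hypothetical adaptive weighting: softmax over the student's current
    per-teacher distillation losses. A larger loss against a teacher means
    the student has more to learn from it, so that teacher gets more weight."""
    exps = [math.exp(l / temperature) for l in per_teacher_losses]
    total = sum(exps)
    return [e / total for e in exps]

def combined_distill_loss(per_teacher_losses, temperature=1.0):
    """Weighted sum of the per-teacher distillation losses."""
    weights = adaptive_teacher_weights(per_teacher_losses, temperature)
    return sum(w * l for w, l in zip(weights, per_teacher_losses))
```

In this sketch, raising `temperature` flattens the weights toward a uniform (static) multi-teacher scheme; lowering it concentrates the update on the teacher the student trails most.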
Related papers
- Cross-Modal Distillation For Widely Differing Modalities [31.049823782188437]
We conduct multi-modal learning by introducing a teacher model to transfer discriminative knowledge to a student model during training. This knowledge transfer via distillation is not trivial because the large domain gap between widely differing modalities can easily lead to overfitting. We propose two soft constrained knowledge distillation strategies at the feature level and a quality-based adaptive weights module to weight input samples.
arXiv Detail & Related papers (2025-07-22T07:34:00Z) - MST-Distill: Mixture of Specialized Teachers for Cross-Modal Knowledge Distillation [8.68486556125022]
MST-Distill is a novel cross-modal knowledge distillation framework featuring a mixture of specialized teachers. This paper empirically reveals two critical issues in existing approaches: distillation path selection and knowledge drift. Our approach employs a diverse ensemble of teacher models across both cross-modal and multimodal configurations, integrated with an instance-level routing network.
arXiv Detail & Related papers (2025-07-09T16:45:28Z) - Smooth-Distill: A Self-distillation Framework for Multitask Learning with Wearable Sensor Data [0.0]
This paper introduces Smooth-Distill, a novel self-distillation framework designed to simultaneously perform human activity recognition (HAR) and sensor placement detection. Unlike conventional distillation methods that require separate teacher and student models, the proposed framework utilizes a smoothed, historical version of the model itself as the teacher. Experimental results show that Smooth-Distill consistently outperforms alternative approaches across different evaluation scenarios.
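The summary describes using a "smoothed, historical version of the model itself" as the teacher. A common way to realize such a teacher is an exponential moving average (EMA) of the student's parameters; the sketch below illustrates that idea on plain parameter lists. Whether Smooth-Distill uses exactly this update is an assumption here.

```python
def ema_teacher_update(teacher_params, student_params, momentum=0.999):
    """One EMA step: the teacher is a slowly moving, smoothed history of the
    student. With momentum close to 1, the teacher changes only slightly per
    step, which stabilizes the distillation target."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]
```

Called once per training step, this keeps the teacher a lagged average of the student, so no separate teacher network needs to be stored or pretrained.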
arXiv Detail & Related papers (2025-06-27T06:51:51Z) - Learning from Stochastic Teacher Representations Using Student-Guided Knowledge Distillation [64.15918654558816]
A self-distillation (SSD) training strategy is introduced for filtering and weighting teacher representations so that the student distills only from task-relevant representations. Experimental results on real-world affective computing, wearable/biosignal datasets from the UCR Archive, the HAR dataset, and image classification datasets show that the proposed SSD method can outperform state-of-the-art methods.
arXiv Detail & Related papers (2025-04-19T14:08:56Z) - Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning [79.46570165281084]
We propose a Multi-Stage Knowledge Integration network (MulKI) to emulate the human learning process in distillation methods.
MulKI achieves this through four stages, including Eliciting Ideas, Adding New Ideas, Distinguishing Ideas, and Making Connections.
Our method demonstrates significant improvements in maintaining zero-shot capabilities while supporting continual learning across diverse downstream tasks.
arXiv Detail & Related papers (2024-11-11T07:36:19Z) - Learning to Maximize Mutual Information for Chain-of-Thought Distillation [13.660167848386806]
Distilling Step-by-Step(DSS) has demonstrated promise by imbuing smaller models with the superior reasoning capabilities of their larger counterparts.
However, DSS overlooks the intrinsic relationship between the two training tasks, leading to ineffective integration of CoT knowledge with the task of label prediction.
We propose a variational approach to solve this problem using a learning-based method.
arXiv Detail & Related papers (2024-03-05T22:21:45Z) - EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
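EmbedDistill's summary says the student is trained to preserve the relative geometry among queries and documents learned by the teacher. One simple way to express that is to match the pairwise cosine-similarity matrices of teacher and student embeddings; the sketch below shows this idea on plain Python vectors. The specific loss form is an illustrative assumption, not the paper's exact objective.

```python
import math

def pairwise_cosine(embs):
    """Cosine-similarity matrix over a list of embedding vectors."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    def norm(a):
        return math.sqrt(dot(a, a))
    n = len(embs)
    return [[dot(embs[i], embs[j]) / (norm(embs[i]) * norm(embs[j]))
             for j in range(n)] for i in range(n)]

def geometry_distill_loss(student_embs, teacher_embs):
    """Mean squared difference between the student's and the teacher's
    pairwise similarity matrices: zero when the student reproduces the
    teacher's relative geometry exactly."""
    S = pairwise_cosine(student_embs)
    T = pairwise_cosine(teacher_embs)
    n = len(S)
    return sum((S[i][j] - T[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)
```

Because the loss depends only on relative angles between embeddings, a much smaller student can match the teacher's geometry without matching its embedding dimension.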
arXiv Detail & Related papers (2023-01-27T22:04:37Z) - CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation [130.08432609780374]
In 3D action recognition, there exists rich complementary information between skeleton modalities.
We propose a new Cross-modal Mutual Distillation (CMD) framework with the following designs.
Our approach outperforms existing self-supervised methods and sets a series of new records.
arXiv Detail & Related papers (2022-08-26T06:06:09Z) - Cross-Task Knowledge Distillation in Multi-Task Recommendation [41.62428191434233]
Multi-task learning has been widely used in real-world recommenders to predict different types of user feedback.
We propose a Cross-Task Knowledge Distillation framework in recommendation, which consists of three procedures.
arXiv Detail & Related papers (2022-02-20T16:15:19Z) - Information Theoretic Representation Distillation [20.802135299032308]
We forge an alternative connection between information theory and knowledge distillation using a recently proposed entropy-like functional.
Our method achieves performance competitive with the state of the art on knowledge distillation and cross-model transfer tasks.
We shed light on a new state of the art for binary quantisation.
arXiv Detail & Related papers (2021-12-01T12:39:50Z) - Multi-Scale Aligned Distillation for Low-Resolution Detection [68.96325141432078]
This paper focuses on boosting the performance of low-resolution models by distilling knowledge from a high- or multi-resolution model.
On several instance-level detection tasks and datasets, the low-resolution models trained via our approach perform competitively with high-resolution models trained via conventional multi-scale training.
arXiv Detail & Related papers (2021-09-14T12:53:35Z) - Heterogeneous Knowledge Distillation using Information Flow Modeling [82.83891707250926]
We propose a novel KD method that works by modeling the information flow through the various layers of the teacher model.
The proposed method is capable of overcoming the aforementioned limitations by using an appropriate supervision scheme during the different phases of the training process.
arXiv Detail & Related papers (2020-05-02T06:56:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.