Agglomerating Large Vision Encoders via Distillation for VFSS Segmentation
- URL: http://arxiv.org/abs/2504.02351v1
- Date: Thu, 03 Apr 2025 07:38:09 GMT
- Title: Agglomerating Large Vision Encoders via Distillation for VFSS Segmentation
- Authors: Chengxi Zeng, Yuxuan Jiang, Fan Zhang, Alberto Gambaruto, Tilo Burghardt
- Abstract summary: We propose a new framework to improve the performance of low-complexity models for medical image segmentation tasks. The agglomerated model demonstrates superior generalization across 12 segmentation tasks, whereas specialized models require explicit training for each task.
- Score: 3.8945524993645106
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The deployment of foundation models for medical imaging has demonstrated considerable success. However, their training overheads associated with downstream tasks remain substantial due to the size of the image encoders employed, and the inference complexity is also significantly high. Although lightweight variants have been obtained for these foundation models, their performance is constrained by their limited model capacity and suboptimal training strategies. In order to achieve an improved tradeoff between complexity and performance, we propose a new framework to improve the performance of low-complexity models via knowledge distillation from multiple large medical foundation models (e.g., MedSAM, RAD-DINO, MedCLIP), each specializing in different vision tasks, with the goal of effectively bridging the performance gap for medical image segmentation tasks. The agglomerated model demonstrates superior generalization across 12 segmentation tasks, whereas specialized models require explicit training for each task. Our approach achieved an average performance gain of 2% in Dice coefficient compared to simple distillation.
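To make the agglomeration idea concrete, below is a minimal PyTorch sketch of multi-teacher feature distillation, assuming the teachers are frozen image encoders standing in for MedSAM, RAD-DINO and MedCLIP (treated here as producing spatial feature maps) and the student is a lightweight segmentation backbone. The class, projection heads, and loss weighting are illustrative assumptions, not the authors' implementation.

```python
# Minimal multi-teacher feature-distillation sketch (illustrative; not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTeacherDistiller(nn.Module):
    """Distills features from several frozen teacher encoders into one small student."""

    def __init__(self, student: nn.Module, teachers: dict, student_dim: int, teacher_dims: dict):
        super().__init__()
        self.student = student                       # small, trainable encoder
        self.teachers = nn.ModuleDict(teachers)      # large, frozen encoders
        for teacher in self.teachers.values():
            teacher.requires_grad_(False)
        # One 1x1 projection head per teacher maps student features into that
        # teacher's embedding space so the feature maps can be compared.
        self.heads = nn.ModuleDict({
            name: nn.Conv2d(student_dim, dim, kernel_size=1)
            for name, dim in teacher_dims.items()
        })

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        s_feat = self.student(images)                # (B, C_s, H, W)
        loss = images.new_zeros(())
        for name, teacher in self.teachers.items():
            with torch.no_grad():
                t_feat = teacher(images)             # (B, C_t, H_t, W_t)
            proj = self.heads[name](s_feat)
            proj = F.interpolate(proj, size=t_feat.shape[-2:],
                                 mode="bilinear", align_corners=False)
            # Cosine feature matching per spatial location; an L2 term works too.
            loss = loss + (1.0 - F.cosine_similarity(proj, t_feat, dim=1)).mean()
        return loss / len(self.teachers)
```

During student training, such a distillation term would typically be added to a supervised segmentation loss (e.g., Dice) on the labelled data.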
Related papers
- Task-Specific Knowledge Distillation from the Vision Foundation Model for Enhanced Medical Image Segmentation [13.018234326432964]
We propose a novel and generalizable task-specific knowledge distillation framework for medical image segmentation. Our method fine-tunes the VFM on the target segmentation task to capture task-specific features before distilling the knowledge to smaller models. Experimental results across five medical image datasets demonstrate that our method consistently outperforms task-agnostic knowledge distillation.
arXiv Detail & Related papers (2025-03-10T06:39:53Z)
- Active Data Curation Effectively Distills Large-Scale Multimodal Models [66.23057263509027]
Knowledge distillation (KD) is the de facto standard for compressing large-scale models into smaller ones. In this work we explore an alternative, yet simple approach -- active data curation as effective distillation for contrastive multimodal pretraining. Our simple online batch selection method, ACID, outperforms strong KD baselines across various model-, data- and compute-configurations.
arXiv Detail & Related papers (2024-11-27T18:50:15Z)
- KA$^2$ER: Knowledge Adaptive Amalgamation of ExpeRts for Medical Images Segmentation [5.807887214293438]
We propose an adaptive amalgamation knowledge framework that aims to train a versatile foundation model to handle the joint goals of multiple expert models.
In particular, we first train an nnUNet-based expert model for each task, and reuse the pre-trained SwinUNETR as the target foundation model.
Within the hidden layers, hierarchical attention mechanisms are designed to adaptively merge the hidden-layer feature knowledge of all experts into the target model (a minimal sketch of this kind of attention-based merging appears after the list below).
arXiv Detail & Related papers (2024-10-28T14:49:17Z)
- SurgeryV2: Bridging the Gap Between Model Merging and Multi-Task Learning with Deep Representation Surgery [54.866490321241905]
Model merging-based multitask learning (MTL) offers a promising approach for performing MTL by merging multiple expert models.
In this paper, we examine the merged model's representation distribution and uncover a critical issue of "representation bias".
This bias arises from a significant distribution gap between the representations of the merged and expert models, leading to the suboptimal performance of the merged MTL model.
arXiv Detail & Related papers (2024-10-18T11:49:40Z)
- LoRKD: Low-Rank Knowledge Decomposition for Medical Foundation Models [59.961172635689664]
"Knowledge Decomposition" aims to improve the performance on specific medical tasks.
We propose a novel framework named Low-Rank Knowledge Decomposition (LoRKD)
LoRKD explicitly separates gradients from different tasks by incorporating low-rank expert modules and efficient knowledge separation convolution.
arXiv Detail & Related papers (2024-09-29T03:56:21Z)
- PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation [51.509573838103854]
We propose a semi-supervised learning framework, termed Progressive Mean Teachers (PMT), for medical image segmentation.
Our PMT generates high-fidelity pseudo labels by learning robust and diverse features in the training process.
Experimental results on two datasets with different modalities, i.e., CT and MRI, demonstrate that our method outperforms the state-of-the-art medical image segmentation approaches.
arXiv Detail & Related papers (2024-09-08T15:02:25Z)
- Few-Shot Airway-Tree Modeling using Data-Driven Sparse Priors [0.0]
Few-shot learning approaches are cost-effective for transferring pre-trained models using only limited annotated data.
We train a data-driven sparsification module to enhance airways efficiently in lung CT scans.
We then incorporate these sparse representations in a standard supervised segmentation pipeline as a pretraining step to enhance the performance of the DL models.
arXiv Detail & Related papers (2024-07-05T13:46:11Z)
- DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception [66.88792390480343]
We propose DEEM, a simple but effective approach that utilizes the generative feedback of diffusion models to align the semantic distributions of the image encoder. DEEM exhibits enhanced robustness and a superior capacity to alleviate model hallucinations while utilizing fewer trainable parameters, less pre-training data, and a smaller base model size.
arXiv Detail & Related papers (2024-05-24T05:46:04Z)
- One-stop Training of Multiple Capacity Models [74.87789190840527]
We propose a novel one-stop training framework to jointly train high-capacity and low-capacity models.
Unlike knowledge distillation, where models of different capacities are trained from scratch separately, our approach integrates supervision from different-capacity models simultaneously.
arXiv Detail & Related papers (2023-05-23T13:44:09Z)
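As noted in the KA$^2$ER entry above, the adaptive merging of expert features can be illustrated with a short sketch. The layer below uses standard multi-head attention, with the target model's token features as queries and the projected, concatenated expert features as keys and values; the class name, shapes, and training use are assumptions for illustration, not that paper's actual implementation.

```python
# Illustrative attention-based amalgamation of expert features (not the KA^2ER code).
import torch
import torch.nn as nn


class ExpertAmalgamation(nn.Module):
    """Merges token features from several expert models into a target model's space."""

    def __init__(self, target_dim: int, expert_dims: list, num_heads: int = 4):
        super().__init__()
        # Project every expert's features into the target model's hidden dimension.
        self.proj = nn.ModuleList([nn.Linear(d, target_dim) for d in expert_dims])
        self.attn = nn.MultiheadAttention(target_dim, num_heads, batch_first=True)

    def forward(self, target_feat: torch.Tensor, expert_feats: list) -> torch.Tensor:
        # target_feat: (B, N, D) tokens from the target model's hidden layer.
        # expert_feats[i]: (B, N_i, D_i) tokens from the i-th expert.
        experts = torch.cat([p(f) for p, f in zip(self.proj, expert_feats)], dim=1)
        merged, _ = self.attn(query=target_feat, key=experts, value=experts)
        # During amalgamation training, `merged` could supervise the target model's
        # hidden features via an auxiliary regression loss alongside the task loss.
        return merged
```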