MAFM^3: Modular Adaptation of Foundation Models for Multi-Modal Medical AI
- URL: http://arxiv.org/abs/2511.11212v1
- Date: Fri, 14 Nov 2025 12:10:59 GMT
- Title: MAFM^3: Modular Adaptation of Foundation Models for Multi-Modal Medical AI
- Authors: Mohammad Areeb Qazi, Munachiso S Nwadike, Ibrahim Almakky, Mohammad Yaqub, Numan Saeed
- Abstract summary: We propose MAFM^3, a framework that enables a single foundation model to expand into diverse domains, tasks, and modalities. Unlike conventional adaptation methods that treat each new task or modality in isolation, MAFM^3 provides a unified and expandable framework for efficient multitask and multimodality adaptation.
- Score: 3.1920084309415007
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Foundation models are trained on extensive datasets to capture the general trends of a domain. However, in medical imaging, the scarcity of data makes pre-training for every domain, modality, or task challenging. Instead of building separate models, we propose MAFM^3 (Modular Adaptation of Foundation Models for Multi-Modal Medical AI), a framework that enables a single foundation model to expand into diverse domains, tasks, and modalities through lightweight modular components. These components serve as specialized skill sets that allow the system to flexibly activate the appropriate capability at inference time, depending on the input type or clinical objective. Unlike conventional adaptation methods that treat each new task or modality in isolation, MAFM^3 provides a unified and expandable framework for efficient multitask and multimodality adaptation. Empirically, we validate our approach by adapting a chest CT foundation model, initially trained for classification, with prognosis and segmentation modules. Our results show improved performance on both tasks. Furthermore, by incorporating PET scans, MAFM^3 achieved a 5% improvement in Dice score over the respective baselines. These findings establish that foundation models, when equipped with modular components, are not inherently constrained to their initial training scope but can evolve into multitask, multimodality systems for medical imaging. The code for this work is available at https://github.com/Areeb2735/CTscan_prognosis_VLM
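The following is a minimal PyTorch sketch of the modular-adaptation idea described in the abstract: one frozen foundation backbone plus lightweight task modules, with the appropriate module activated at inference time by a task key. All names here (FoundationEncoder, MAFM3Adapter, the task keys) are illustrative assumptions, not the authors' actual implementation; see the linked repository for that.

```python
# Hypothetical sketch of modular adaptation: a frozen backbone with
# lightweight task modules selected at inference time by a string key.
import torch
import torch.nn as nn

class FoundationEncoder(nn.Module):
    """Stand-in for a pre-trained chest-CT foundation backbone."""
    def __init__(self, in_ch: int = 1, dim: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_ch, 32, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv3d(32, dim, 3, stride=2, padding=1), nn.GELU(),
        )

    def forward(self, x):
        return self.features(x)  # (B, dim, D/4, H/4, W/4)

class MAFM3Adapter(nn.Module):
    """Shared frozen backbone + a dict of lightweight task modules."""
    def __init__(self, backbone: nn.Module, dim: int = 64):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # only the modules are trained
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.task_modules = nn.ModuleDict({
            "classification": nn.Linear(dim, 2),               # original task
            "prognosis": nn.Linear(dim, 1),                    # added module
            "segmentation": nn.Conv3d(dim, 2, kernel_size=1),  # added module
        })

    def forward(self, x: torch.Tensor, task: str) -> torch.Tensor:
        feats = self.backbone(x)
        if task == "segmentation":
            return self.task_modules[task](feats)  # dense (low-res) logits
        pooled = self.pool(feats).flatten(1)       # (B, dim)
        return self.task_modules[task](pooled)

model = MAFM3Adapter(FoundationEncoder())
ct = torch.randn(1, 1, 32, 64, 64)        # dummy CT volume
print(model(ct, "classification").shape)  # torch.Size([1, 2])
print(model(ct, "segmentation").shape)    # torch.Size([1, 2, 8, 16, 16])
```

A real system would use proper decoders and adapter layers rather than single linear or 1x1 heads, but the control flow, one shared frozen backbone with skill modules keyed at inference, matches the abstract's description.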
Related papers
- TokaMind: A Multi-Modal Transformer Foundation Model for Tokamak Plasma Dynamics [56.073642366268764]
TokaMind is an open-source foundation model framework for fusion plasma modeling. It is trained on heterogeneous tokamak diagnostics from the publicly available MAST dataset. We evaluate TokaMind on the recently introduced MAST benchmark TokaMark.
arXiv Detail & Related papers (2026-02-16T12:26:07Z)
- MedDINOv3: How to adapt vision foundation models for medical image segmentation? [16.256590269050367]
We introduce MedDINOv3, a simple and effective framework for adapting DINOv3 to medical segmentation. We perform domain-adaptive pretraining on CT-3M, a curated collection of 3.87M axial CT slices, using a multi-stage DINOv3 recipe. MedDINOv3 matches or exceeds state-of-the-art performance across four segmentation benchmarks.
arXiv Detail & Related papers (2025-09-02T14:44:43Z)
- UNICON: UNIfied CONtinual Learning for Medical Foundational Models [0.8672882547905405]
In medical imaging, the scarcity of data makes pre-training for every domain, modality, or task challenging. Continual learning offers a solution by fine-tuning a model sequentially on different domains or tasks. We propose UNIfied CONtinual Learning for Medical Foundational Models (UNICON), a framework that enables seamless adaptation of foundation models.
arXiv Detail & Related papers (2025-08-19T17:31:32Z)
- Med-LEGO: Editing and Adapting toward Generalist Medical Image Diagnosis [17.10843389390131]
Med-LEGO is a training-free framework that enables the seamless integration or updating of a generalist CAD model. Our experiments demonstrate that Med-LEGO outperforms existing methods in both cross-domain and in-domain medical tasks.
arXiv Detail & Related papers (2025-03-03T04:27:11Z)
- MIND: Modality-Informed Knowledge Distillation Framework for Multimodal Clinical Prediction Tasks [50.98856172702256]
We propose the Modality-INformed knowledge Distillation (MIND) framework, a multimodal model compression approach. MIND transfers knowledge from ensembles of pre-trained deep neural networks of varying sizes into a smaller multimodal student. We evaluate MIND on binary and multilabel clinical prediction tasks using time series data and chest X-ray images.
arXiv Detail & Related papers (2025-02-03T08:50:00Z)
- KA$^2$ER: Knowledge Adaptive Amalgamation of ExpeRts for Medical Images Segmentation [5.807887214293438]
We propose an adaptive knowledge amalgamation framework that trains a versatile foundation model to handle the joint goals of multiple expert models.
In particular, we first train an nnUNet-based expert model for each task and reuse the pre-trained SwinUNETR as the target foundation model.
Hierarchical attention mechanisms are designed to adaptively merge the target model's hidden-layer features with the hidden-layer feature knowledge of all experts.
arXiv Detail & Related papers (2024-10-28T14:49:17Z)
- PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation [51.509573838103854]
We propose a semi-supervised learning framework, termed Progressive Mean Teachers (PMT), for medical image segmentation.
Our PMT generates high-fidelity pseudo labels by learning robust and diverse features in the training process.
Experimental results on two datasets with different modalities, i.e., CT and MRI, demonstrate that our method outperforms the state-of-the-art medical image segmentation approaches.
arXiv Detail & Related papers (2024-09-08T15:02:25Z)
- FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models [54.09244105445476]
This study introduces a novel knowledge injection approach, FedKIM, to scale the medical foundation model within a federated learning framework. FedKIM leverages lightweight local models to extract healthcare knowledge from private data and integrates this knowledge into a centralized foundation model. Our experiments across twelve tasks in seven modalities demonstrate the effectiveness of FedKIM in various settings.
arXiv Detail & Related papers (2024-08-17T15:42:29Z)
- Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models [17.643421997037514]
We propose a novel framework that tackles both discriminative and generative multimodal medical tasks.
The learning of Med-MoE consists of three steps: multimodal medical alignment, instruction tuning and routing, and domain-specific MoE tuning.
Our model can achieve performance superior to or on par with state-of-the-art baselines.
arXiv Detail & Related papers (2024-04-16T02:35:17Z)
- Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
We train open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
Inference with LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
- Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models [51.5543321122664]
This paper investigates how to better leverage large-scale pre-trained uni-modal models to enhance discriminative multi-modal learning.
We introduce Multi-Modal Low-Rank Adaptation learning (MMLoRA).
arXiv Detail & Related papers (2023-10-08T15:01:54Z)
- Specialty-Oriented Generalist Medical AI for Chest CT Screening [14.31187762890342]
We propose the first-of-its-kind medical multimodal-multitask foundation model (M3FM) with application in lung cancer screening and related tasks.
M3FM consistently outperforms state-of-the-art single-modal, task-specific models.
As a specialty-oriented generalist medical AI model, M3FM paves the way for similar breakthroughs in other areas of medicine.
arXiv Detail & Related papers (2023-04-03T20:19:56Z)