CLIMD: A Curriculum Learning Framework for Imbalanced Multimodal Diagnosis
- URL: http://arxiv.org/abs/2508.01594v1
- Date: Sun, 03 Aug 2025 05:25:12 GMT
- Title: CLIMD: A Curriculum Learning Framework for Imbalanced Multimodal Diagnosis
- Authors: Kai Han, Chongwen Lyu, Lele Ma, Chengxuan Qian, Siqi Ma, Zheng Pang, Jun Chen, Zhe Liu
- Abstract summary: We propose a Curriculum Learning framework for Imbalanced Multimodal Diagnosis (CLIMD). Specifically, we first design a multimodal curriculum measurer that combines intra-modal confidence and inter-modal complementarity, to enable the model to focus on key samples. As a plug-and-play CL framework, CLIMD can be easily integrated into other models, offering a promising path for improving multimodal disease diagnosis accuracy.
- Score: 21.001994821490644
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clinicians usually combine information from multiple sources to achieve the most accurate diagnosis, and this has sparked increasing interest in leveraging multimodal deep learning for diagnosis. However, in real clinical scenarios, due to differences in incidence rates, multimodal medical data commonly face the issue of class imbalance, which makes it difficult to adequately learn the features of minority classes. Most existing methods tackle this issue with resampling or loss reweighting, but they are prone to overfitting or underfitting and fail to capture cross-modal interactions. Therefore, we propose a Curriculum Learning framework for Imbalanced Multimodal Diagnosis (CLIMD). Specifically, we first design a multimodal curriculum measurer that combines two indicators, intra-modal confidence and inter-modal complementarity, enabling the model to focus on key samples and gradually adapt to complex category distributions. Additionally, a class distribution-guided training scheduler is introduced, which enables the model to progressively adapt to the imbalanced class distribution during training. Extensive experiments on multiple multimodal medical datasets demonstrate that the proposed method outperforms state-of-the-art approaches across various metrics and excels in handling imbalanced multimodal medical data. Furthermore, as a plug-and-play CL framework, CLIMD can be easily integrated into other models, offering a promising path for improving multimodal disease diagnosis accuracy. Code is publicly available at https://github.com/KHan-UJS/CLIMD.
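The abstract's two components can be pictured with a minimal sketch: a measurer that scores each sample from two modalities' predictions, and a pacing scheduler that grows the training pool from easy samples to the full set. The scoring formula (confidence plus agreement) and the linear schedule are illustrative assumptions, not the paper's exact method, which additionally conditions the scheduler on the class distribution.

```python
import numpy as np

def curriculum_scores(probs_a, probs_b):
    """Per-sample easiness from two modalities' softmax outputs, shape (N, C).
    Combines intra-modal confidence (mean max probability) with inter-modal
    agreement (1 minus total-variation distance between the two predictive
    distributions); higher means easier, so confident, agreeing samples are
    scheduled first."""
    conf = 0.5 * (probs_a.max(axis=1) + probs_b.max(axis=1))
    agree = 1.0 - 0.5 * np.abs(probs_a - probs_b).sum(axis=1)
    return 0.5 * (conf + agree)

def pacing_subset(scores, epoch, total_epochs, min_frac=0.3):
    """Linear pacing schedule: train on the easiest min_frac of samples at
    epoch 0 and grow the pool to the full dataset by the final epoch."""
    frac = min_frac + (1.0 - min_frac) * min(epoch / max(total_epochs - 1, 1), 1.0)
    k = max(1, int(round(frac * len(scores))))
    return np.argsort(-scores)[:k]  # indices of the k easiest samples
```

In this sketch, a minority-class sample with low confidence or strong cross-modal disagreement enters training only in later epochs, once the model has stabilized on easier majority-class examples.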
Related papers
- Continual Multimodal Contrastive Learning [70.60542106731813]
Multimodal contrastive learning (MCL) advances in aligning different modalities and generating multimodal representations in a joint space. However, a critical yet often overlooked challenge remains: multimodal data is rarely collected in a single process, and training from scratch is computationally expensive. In this paper, we formulate CMCL through two specialized principles of stability and plasticity. We theoretically derive a novel optimization-based method, which projects updated gradients from dual sides onto subspaces where any gradient is prevented from interfering with the previously learned knowledge.
arXiv Detail & Related papers (2025-03-19T07:57:08Z) - Partially Supervised Unpaired Multi-Modal Learning for Label-Efficient Medical Image Segmentation [53.723234136550055]
We term the new learning paradigm Partially Supervised Unpaired Multi-Modal Learning (PSUMML). We propose a novel Decomposed partial class adaptation with snapshot Ensembled Self-Training (DEST) framework for it. Our framework consists of a compact segmentation network with modality-specific normalization layers for learning with partially labeled unpaired multi-modal data.
arXiv Detail & Related papers (2025-03-07T07:22:42Z) - Long-tailed Medical Diagnosis with Relation-aware Representation Learning and Iterative Classifier Calibration [14.556686415877602]
We propose a new Long-tailed Medical Diagnosis (LMD) framework for balanced medical image classification on long-tailed datasets. Our framework significantly surpasses state-of-the-art approaches.
arXiv Detail & Related papers (2025-02-05T14:57:23Z) - MIND: Modality-Informed Knowledge Distillation Framework for Multimodal Clinical Prediction Tasks [50.98856172702256]
We propose the Modality-INformed knowledge Distillation (MIND) framework, a multimodal model compression approach. MIND transfers knowledge from ensembles of pre-trained deep neural networks of varying sizes into a smaller multimodal student. We evaluate MIND on binary and multilabel clinical prediction tasks using time series data and chest X-ray images.
arXiv Detail & Related papers (2025-02-03T08:50:00Z) - Continually Evolved Multimodal Foundation Models for Cancer Prognosis [50.43145292874533]
Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates. Previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information. Existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals. Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities.
arXiv Detail & Related papers (2025-01-30T06:49:57Z) - A Novel Perspective for Multi-modal Multi-label Skin Lesion Classification [20.05980794886644]
This paper introduces a Multi-modal Multi-label TransFormer-based model (SkinM2Former) for skin lesion classification.
SkinM2Former achieves a mean average accuracy of 77.27% and a mean diagnostic accuracy of 77.85% on the public Derm7pt dataset, outperforming state-of-the-art (SOTA) methods.
arXiv Detail & Related papers (2024-09-19T01:31:38Z) - Multi-modal Learning with Missing Modality in Predicting Axillary Lymph Node Metastasis [7.207158973042472]
Multi-modal data, combining whole slide images (WSIs) and clinical information, can improve the performance of deep learning models in the diagnosis of axillary lymph node metastasis.
We propose a bidirectional distillation framework consisting of a multi-modal branch and a single-modal branch.
Our approach not only achieves state-of-the-art performance with an AUC of 0.861 on the test set without missing data, but also yields an AUC of 0.842 when the missing-modality rate is 80%.
arXiv Detail & Related papers (2024-01-03T05:59:48Z) - Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training [75.40980802817349]
Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area.
We introduce a Decoupled Hard Label Correction (DHLC) strategy that considers the imbalanced distribution and fuzzy boundaries of MS lesions.
We also introduce a Centrally Enhanced Label Correction (CELC) strategy, which leverages the aggregated central model as a correction teacher for all sites.
arXiv Detail & Related papers (2023-08-31T00:36:10Z) - Medical Diagnosis with Large Scale Multimodal Transformers: Leveraging Diverse Data for More Accurate Diagnosis [0.15776842283814416]
We present a new technical approach of "learnable synergies."
Our approach is easily scalable and naturally adapts to multimodal data inputs from clinical routine.
It outperforms state-of-the-art models in clinically relevant diagnosis tasks.
arXiv Detail & Related papers (2022-12-18T20:43:37Z) - Deep Reinforcement Learning for Multi-class Imbalanced Training [64.9100301614621]
We introduce an imbalanced classification framework, based on reinforcement learning, for training on extremely imbalanced data sets.
We formulate a custom reward function and episode-training procedure, specifically with the added capability of handling multi-class imbalanced training.
Using real-world clinical case studies, we demonstrate that our proposed framework outperforms current state-of-the-art imbalanced learning methods.
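The summary above does not specify the custom reward function. A common choice in reinforcement-learning formulations of imbalanced classification is a reward scaled inversely to class frequency; the function below is a hedged sketch of that idea, not the paper's actual formulation.

```python
def imbalance_reward(pred, label, class_counts):
    """Return +w for a correct prediction and -w for a wrong one, where w is
    inversely proportional to the true class's frequency. Correct decisions on
    rare classes earn large rewards and mistakes on them incur large
    penalties, pushing the agent to attend to minority classes."""
    total = sum(class_counts.values())
    w = total / (len(class_counts) * class_counts[label])
    return w if pred == label else -w
```

With 90 majority and 10 minority samples, a correct minority prediction earns a reward of 5.0, roughly nine times the 0.56 earned for a correct majority prediction.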
arXiv Detail & Related papers (2022-05-24T13:39:59Z) - MMLN: Leveraging Domain Knowledge for Multimodal Diagnosis [10.133715767542386]
We propose a knowledge-driven and data-driven framework for lung disease diagnosis.
We formulate diagnosis rules according to authoritative clinical medicine guidelines and learn the weights of rules from text data.
A multimodal fusion consisting of text and image data is designed to infer the marginal probability of lung disease.
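One simple way to realize such a fusion, sketched here as an assumption rather than the paper's exact design, is a log-linear model: the weights of the clinical rules that fire for a patient are added to an image-model logit, and a sigmoid yields the disease probability.

```python
import math

def rule_fusion(rule_fired, rule_weights, image_logit):
    """Log-linear fusion of knowledge rules with an image model: sum the
    learned weights of the diagnosis rules that fired for this patient,
    add the image-model logit, and squash with a sigmoid to obtain a
    disease probability."""
    z = image_logit + sum(w for f, w in zip(rule_fired, rule_weights) if f)
    return 1.0 / (1.0 + math.exp(-z))
```

Under this sketch, each fired rule shifts the evidence additively, so a strongly weighted guideline rule can raise the probability even when the image model alone is uncertain.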
arXiv Detail & Related papers (2022-02-09T04:12:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.