Related papers: MMLNB: Multi-Modal Learning for Neuroblastoma Subtyping Classification Assisted with Textual Description Generation

MMLNB: Multi-Modal Learning for Neuroblastoma Subtyping Classification Assisted with Textual Description Generation

URL: http://arxiv.org/abs/2503.12927v2
Date: Wed, 19 Mar 2025 09:27:16 GMT
Title: MMLNB: Multi-Modal Learning for Neuroblastoma Subtyping Classification Assisted with Textual Description Generation
Authors: Huangwei Chen, Yifei Chen, Zhenyu Yan, Mingyang Ding, Chenlei Li, Zhu Zhu, Feiwei Qin,
Abstract summary: We introduce MMLNB, a multi-modal learning model that integrates pathological images with generated textual descriptions to improve classification accuracy and interpretability.<n>This research creates a scalable AI-driven framework for digital pathology, enhancing reliability and interpretability in Neuroblastoma subtyping classification.
Score: 1.8947479010393964
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Neuroblastoma (NB), a leading cause of childhood cancer mortality, exhibits significant histopathological variability, necessitating precise subtyping for accurate prognosis and treatment. Traditional diagnostic methods rely on subjective evaluations that are time-consuming and inconsistent. To address these challenges, we introduce MMLNB, a multi-modal learning (MML) model that integrates pathological images with generated textual descriptions to improve classification accuracy and interpretability. The approach follows a two-stage process. First, we fine-tune a Vision-Language Model (VLM) to enhance pathology-aware text generation. Second, the fine-tuned VLM generates textual descriptions, using a dual-branch architecture to independently extract visual and textual features. These features are fused via Progressive Robust Multi-Modal Fusion (PRMF) Block for stable training. Experimental results show that the MMLNB model is more accurate than the single modal model. Ablation studies demonstrate the importance of multi-modal fusion, fine-tuning, and the PRMF mechanism. This research creates a scalable AI-driven framework for digital pathology, enhancing reliability and interpretability in NB subtyping classification. Our source code is available at https://github.com/HovChen/MMLNB.

Related papers

CLEAR-Mamba:Towards Accurate, Adaptive and Trustworthy Multi-Sequence Ophthalmic Angiography Classification [18.95392587947337]
Medical image classification is a core task in computer-aided diagnosis (CAD)<n>In ophthalmic practice, fluorescein fundus angiography (FFA) and indocyanine green angiography (ICGA) provide hemodynamic and lesion-structural information.<n>We propose CLEAR-Mamba, an enhanced framework built upon MedMamba with optimizations in both architecture and training strategy.
arXiv Detail & Related papers (2026-01-28T13:40:10Z)
A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis [82.01597026329158]
We introduce a Correlation-Regulated Alignment Framework for Tissue Synthesis (CRAFTS) for pathology-specific text-to-image synthesis.<n>CRAFTS incorporates a novel alignment mechanism that suppresses semantic drift to ensure biological accuracy.<n>This model generates diverse pathological images spanning 30 cancer types, with quality rigorously validated by objective metrics and pathologist evaluations.
arXiv Detail & Related papers (2025-12-15T10:22:43Z)
Explainable Parkinsons Disease Gait Recognition Using Multimodal RGB-D Fusion and Large Language Models [6.2676602262188625]
This paper presents an explainable multimodal framework that integrates RGB and Depth (RGB-D) data to recognize Parkinsonian gait patterns.<n>By combining multimodal feature learning with language-based interpretability, this study bridges the gap between visual recognition and clinical understanding.
arXiv Detail & Related papers (2025-12-04T03:43:43Z)
A Multimodal Deep Learning Framework for Early Diagnosis of Liver Cancer via Optimized BiLSTM-AM-VMD Architecture [7.6708641505004005]
This paper proposes a novel multimodal deep learning framework integrating bidirectional LSTM, multi-head attention mechanism, and variational mode decomposition (BiLSTM-AM-VMD) for early liver cancer diagnosis.<n> Experimental results on real-world datasets demonstrate superior performance over traditional machine learning and baseline deep learning models.
arXiv Detail & Related papers (2025-09-01T06:37:20Z)
A Hybrid CNN-VSSM model for Multi-View, Multi-Task Mammography Analysis: Robust Diagnosis with Attention-Based Fusion [5.15423063632115]
Early and accurate interpretation of screening mammograms is essential for effective breast cancer detection.<n>Existing AI approaches fall short by focusing on single view inputs or single-task outputs.<n>We propose a novel multi-view, multitask hybrid deep learning framework that processes all four standard mammography views.
arXiv Detail & Related papers (2025-07-22T18:52:18Z)
Comparison of ConvNeXt and Vision-Language Models for Breast Density Assessment in Screening Mammography [39.58317527488534]
This study compares multimodal and CNN-based methods for automated classification using the BI-RADS system.<n>Zero-shot classification achieved modest performance, while the fine-tuned ConvNeXt model outperformed the BioMedCLIP linear probe.<n>These findings suggest that despite the promise of multimodal learning, CNN-based models with end-to-end fine-tuning provide stronger performance for specialized medical imaging.
arXiv Detail & Related papers (2025-06-16T20:14:37Z)
Prompt to Polyp: Medical Text-Conditioned Image Synthesis with Diffusion Models [5.12801085802078]
The generation of realistic medical images from text descriptions has significant potential to address data scarcity challenges in healthcare AI.<n>This paper presents a comprehensive study of text-to-image synthesis in the medical domain, comparing two distinct approaches.<n>We introduce a novel model named MSDM, an optimized architecture that integrates a clinical text encoder, variational autoencoder, and cross-attention mechanisms.
arXiv Detail & Related papers (2025-05-08T18:07:16Z)
Advancing AI Research Assistants with Expert-Involved Learning [84.30323604785646]
Large language models (LLMs) and large multimodal models (LMMs) promise to accelerate biomedical discovery, yet their reliability remains unclear.<n>We introduce ARIEL (AI Research Assistant for Expert-in-the-Loop Learning), an open-source evaluation and optimization framework.<n>We find that state-of-the-art models generate fluent but incomplete summaries, whereas LMMs struggle with detailed visual reasoning.
arXiv Detail & Related papers (2025-05-03T14:21:48Z)
Towards a Multimodal MRI-Based Foundation Model for Multi-Level Feature Exploration in Segmentation, Molecular Subtyping, and Grading of Glioma [0.2796197251957244]
Multi-Task S-UNETR (MTSUNET) model is a novel foundation-based framework built on the BrainSegFounder model. It simultaneously performs glioma segmentation, histological subtyping and neuroimaging subtyping. It shows significant potential for advancing noninvasive, personalized glioma management by improving predictive accuracy and interpretability.
arXiv Detail & Related papers (2025-03-10T01:27:09Z)
PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation [51.509573838103854]
We propose a semi-supervised learning framework, termed Progressive Mean Teachers (PMT), for medical image segmentation. Our PMT generates high-fidelity pseudo labels by learning robust and diverse features in the training process. Experimental results on two datasets with different modalities, i.e., CT and MRI, demonstrate that our method outperforms the state-of-the-art medical image segmentation approaches.
arXiv Detail & Related papers (2024-09-08T15:02:25Z)
Enhanced Masked Image Modeling to Avoid Model Collapse on Multi-modal MRI Datasets [6.3467517115551875]
Masked image modeling (MIM) has shown promise in utilizing unlabeled data.<n>We analyze and address model collapse in two types: complete collapse and dimensional collapse.<n>We construct the enhanced MIM (E-MIM) with HMP and PBT module to avoid model collapse multi-modal MRI.
arXiv Detail & Related papers (2024-07-15T01:11:30Z)
An interpretable generative multimodal neuroimaging-genomics framework for decoding Alzheimer's disease [13.213387075528017]
Alzheimer's disease (AD) is the most prevalent form of dementia worldwide, encompassing a prodromal stage known as Mild Cognitive Impairment (MCI)<n>The objective of the work was to capture structural and functional modulations of brain structure and function relying on multimodal MRI data and Single Nucleotide Polymorphisms.
arXiv Detail & Related papers (2024-06-19T07:31:47Z)
Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models [49.95603725998561]
We propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts. Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model.
arXiv Detail & Related papers (2023-10-04T21:57:09Z)
Customizing General-Purpose Foundation Models for Medical Report Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks. We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z)
Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities. We explicitly account for prior images and reports when available during both training and fine-tuning. Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z)
MMLN: Leveraging Domain Knowledge for Multimodal Diagnosis [10.133715767542386]
We propose a knowledge-driven and data-driven framework for lung disease diagnosis. We formulate diagnosis rules according to authoritative clinical medicine guidelines and learn the weights of rules from text data. A multimodal fusion consisting of text and image data is designed to infer the marginal probability of lung disease.
arXiv Detail & Related papers (2022-02-09T04:12:30Z)
MCUa: Multi-level Context and Uncertainty aware Dynamic Deep Ensemble for Breast Cancer Histology Image Classification [18.833782238355386]
We propose a novel CNN called Multi-level Context and Uncertainty aware (MCUa) dynamic deep learning ensemble model. MCUamodel has achieved a high accuracy of 98.11% on a breast cancer histology image dataset.
arXiv Detail & Related papers (2021-08-24T13:18:57Z)
A multi-stage machine learning model on diagnosis of esophageal manometry [50.591267188664666]
The framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage. This is the first artificial-intelligence-style model to automatically predict CC diagnosis of HRM study from raw multi-swallow data.
arXiv Detail & Related papers (2021-06-25T20:09:23Z)
G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers. We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
Learning Interpretable Microscopic Features of Tumor by Multi-task Adversarial CNNs To Improve Generalization [1.7371375427784381]
Existing CNN models act as black boxes, not ensuring to the physicians that important diagnostic features are used by the model. Here we show that our architecture, by learning end-to-end an uncertainty-based weighting combination of multi-task and adversarial losses, is encouraged to focus on pathology features. Our results on breast lymph node tissue show significantly improved generalization in the detection of tumorous tissue, with best average AUC 0.89 (0.01) against the baseline AUC 0.86 (0.005)
arXiv Detail & Related papers (2020-08-04T12:10:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.