FMT: A Multimodal Pneumonia Detection Model Based on Stacking MOE Framework
- URL: http://arxiv.org/abs/2503.05626v1
- Date: Fri, 07 Mar 2025 17:52:12 GMT
- Title: FMT: A Multimodal Pneumonia Detection Model Based on Stacking MOE Framework
- Authors: Jingyu Xu, Yang Wang,
- Abstract summary: A Flexible Multimodal Transformer (FMT) was proposed, which uses ResNet-50 and BERT for joint representation learning. After evaluation on a small multimodal pneumonia dataset, FMT achieved state-of-the-art performance with 94% accuracy, 95% recall, and 93% F1 score.
- Score: 4.429093762434193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial intelligence has shown the potential to improve diagnostic accuracy through medical image analysis for pneumonia diagnosis. However, traditional multimodal approaches often fail to address real-world challenges such as incomplete data and modality loss. In this study, a Flexible Multimodal Transformer (FMT) was proposed, which uses ResNet-50 and BERT for joint representation learning, followed by a dynamic masked attention strategy that simulates clinical modality loss to improve robustness; finally, a sequential mixture of experts (MOE) architecture was used to achieve multi-level decision refinement. After evaluation on a small multimodal pneumonia dataset, FMT achieved state-of-the-art performance with 94% accuracy, 95% recall, and 93% F1 score, outperforming single-modal baselines (ResNet: 89%; BERT: 79%) and the medical benchmark CheXMed (90%), providing a scalable solution for multimodal diagnosis of pneumonia in resource-constrained medical settings.
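The abstract's "dynamic masked attention strategy that simulates clinical modality loss" can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `simulate_modality_loss` and `masked_softmax`, the drop probability `p_drop`, and the rule of dropping at most one modality per sample are all assumptions made for illustration. The idea shown is that, during training, one modality's tokens are sometimes hidden from attention so the model learns to cope with incomplete inputs.

```python
import math
import random


def simulate_modality_loss(image_tokens, text_tokens, p_drop=0.3, rng=None):
    """Build an attention mask over [image tokens + text tokens].

    Hypothetical sketch: with probability p_drop, one modality (chosen
    uniformly) is masked out entirely; both are never dropped together.
    Returns a list of 1 (visible) / 0 (masked), one entry per token.
    """
    rng = rng or random.Random()
    keep_image = keep_text = True
    if rng.random() < p_drop:
        if rng.random() < 0.5:
            keep_image = False
        else:
            keep_text = False
    return [int(keep_image)] * len(image_tokens) + [int(keep_text)] * len(text_tokens)


def masked_softmax(scores, mask):
    """Softmax over attention scores that gives zero weight to masked positions."""
    exps = [math.exp(s) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    if total == 0.0:  # every position masked (e.g. the kept modality is empty)
        return [0.0] * len(scores)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)
    mask = simulate_modality_loss(["img1", "img2"], ["tok1", "tok2", "tok3"],
                                  p_drop=1.0, rng=rng)
    weights = masked_softmax([0.1, 0.2, 0.3, 0.4, 0.5], mask)
    print(mask, weights)
```

In this sketch the surviving modality receives all of the attention weight, which is one simple way to expose a fusion model to missing-modality cases at training time.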
Related papers
- Structure-Accurate Medical Image Translation based on Dynamic Frequency Balance and Knowledge Guidance [60.33892654669606]
Diffusion models are a powerful strategy for synthesizing the required medical images.
Existing approaches still suffer from the problem of anatomical structure distortion due to the overfitting of high-frequency information.
We propose a novel method based on dynamic frequency balance and knowledge guidance.
arXiv Detail & Related papers (2025-04-13T05:48:13Z)
- Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions.
We propose a novel approach utilizing structured medical reasoning.
Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
arXiv Detail & Related papers (2025-03-05T05:24:55Z)
- DiaMond: Dementia Diagnosis with Multi-Modal Vision Transformers Using MRI and PET [9.229658208994675]
We propose a novel framework, DiaMond, to integrate MRI and PET.
DiaMond is equipped with self-attention and a novel bi-attention mechanism that synergistically combine MRI and PET.
It significantly outperforms existing multi-modal methods across various datasets.
arXiv Detail & Related papers (2024-10-30T17:11:00Z)
- Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
We explore training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
Inference with LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
- PE-MVCNet: Multi-view and Cross-modal Fusion Network for Pulmonary Embolism Prediction [4.659998272408215]
Early detection of a pulmonary embolism (PE) is critical for enhancing patient survival rates.
We suggest a multimodal fusion methodology, termed PE-MVCNet, which capitalizes on Computed Tomography Pulmonary Angiography imaging and EMR data.
Our proposed model outperforms existing methodologies, corroborating that our multimodal fusion model excels compared to models that use a single data modality.
arXiv Detail & Related papers (2024-02-27T03:53:27Z)
- Multi-modal Learning with Missing Modality in Predicting Axillary Lymph Node Metastasis [7.207158973042472]
Multi-modal data, whole slide images (WSIs) and clinical information, can improve the performance of deep learning models in the diagnosis of axillary lymph node metastasis.
We propose a bidirectional distillation framework consisting of a multi-modal branch and a single-modal branch.
Our approach not only achieves state-of-the-art performance with an AUC of 0.861 on the test set without missing data, but also yields an AUC of 0.842 when the rate of missing modality is 80%.
arXiv Detail & Related papers (2024-01-03T05:59:48Z)
- Interpretable 3D Multi-Modal Residual Convolutional Neural Network for Mild Traumatic Brain Injury Diagnosis [1.0621519762024807]
We introduce an interpretable 3D Multi-Modal Residual Convolutional Neural Network (MRCNN) for mTBI diagnosis, enhanced with Occlusion Sensitivity Maps (OSM).
Our MRCNN model exhibits promising performance in mTBI diagnosis, demonstrating an average accuracy of 82.4%, sensitivity of 82.6%, and specificity of 81.6%, as validated by a five-fold cross-validation process.
arXiv Detail & Related papers (2023-09-22T01:58:27Z)
- A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics [63.106382317917344]
We report a Transformer-based representation-learning model as a clinical diagnostic aid that processes multimodal input in a unified manner.
The unified model outperformed an image-only model and non-unified multimodal diagnosis models in the identification of pulmonary diseases.
arXiv Detail & Related papers (2023-06-01T16:23:47Z)
- HGT: A Hierarchical GCN-Based Transformer for Multimodal Periprosthetic Joint Infection Diagnosis Using CT Images and Text [0.0]
Prosthetic Joint Infection (PJI) is a prevalent and severe complication.
Currently, a unified diagnostic standard incorporating both computed tomography (CT) images and numerical text data for PJI remains unestablished.
This study introduces a diagnostic method, HGT, based on deep learning and multimodal techniques.
arXiv Detail & Related papers (2023-05-29T11:25:57Z)
- MMLN: Leveraging Domain Knowledge for Multimodal Diagnosis [10.133715767542386]
We propose a knowledge-driven and data-driven framework for lung disease diagnosis.
We formulate diagnosis rules according to authoritative clinical medicine guidelines and learn the weights of rules from text data.
A multimodal fusion consisting of text and image data is designed to infer the marginal probability of lung disease.
arXiv Detail & Related papers (2022-02-09T04:12:30Z)
- Cross-Modal Information Maximization for Medical Imaging: CMIM [62.28852442561818]
In hospitals, data are siloed to specific information systems that make the same information available under different modalities.
This offers unique opportunities to obtain and use at train-time those multiple views of the same information that might not always be available at test-time.
We propose an innovative framework that makes the most of available data by learning good representations of a multi-modal input that are resilient to modality dropping at test-time.
arXiv Detail & Related papers (2020-10-20T20:05:35Z)
- Robust Multimodal Brain Tumor Segmentation via Feature Disentanglement and Gated Fusion [71.87627318863612]
We propose a novel multimodal segmentation framework which is robust to the absence of imaging modalities.
Our network uses feature disentanglement to decompose the input modalities into the modality-specific appearance code.
We validate our method on the important yet challenging multimodal brain tumor segmentation task with the BRATS challenge dataset.
arXiv Detail & Related papers (2020-02-22T14:32:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.