AliFuse: Aligning and Fusing Multi-modal Medical Data for Computer-Aided Diagnosis
 - URL: http://arxiv.org/abs/2401.01074v3
 - Date: Fri, 31 Jan 2025 15:04:02 GMT
 - Title: AliFuse: Aligning and Fusing Multi-modal Medical Data for Computer-Aided Diagnosis
 - Authors: Qiuhui Chen, Yi Hong
 - Abstract summary: We propose a transformer-based framework, called Alifuse, for aligning and fusing multimodal medical data. We convert medical images and both unstructured and structured clinical records into vision and language tokens. We apply Alifuse to classify Alzheimer's disease, achieving state-of-the-art performance on five public datasets and outperforming eight baselines.
 - Score: 1.64647940449869
 - License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
 - Abstract:   Medical data collected for diagnostic decisions are typically multimodal, providing comprehensive information on a subject. While computer-aided diagnosis systems can benefit from multimodal inputs, effectively fusing such data remains a challenging task and a key focus in medical research. In this paper, we propose a transformer-based framework, called Alifuse, for aligning and fusing multimodal medical data. Specifically, we convert medical images and both unstructured and structured clinical records into vision and language tokens, employing intramodal and intermodal attention mechanisms to learn unified representations of all imaging and non-imaging data for classification. Additionally, we integrate restoration modeling with contrastive learning frameworks, jointly learning the high-level semantic alignment between images and texts and the low-level understanding of one modality with the help of another. We apply Alifuse to classify Alzheimer's disease, achieving state-of-the-art performance on five public datasets and outperforming eight baselines. 
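The intramodal/intermodal attention design described in the abstract can be pictured with a small PyTorch sketch. This is not the released Alifuse implementation; the module names, token dimensions, and pooling choice below are illustrative assumptions only.

```python
# Minimal sketch (not the official Alifuse code): intramodal self-attention per
# modality, then intermodal cross-attention, then a classification head.
# All module names, dimensions, and the mean-pooling choice are assumptions.
import torch
import torch.nn as nn


class ToyAlignFuse(nn.Module):
    def __init__(self, dim=256, heads=4, num_classes=2):
        super().__init__()
        # intramodal attention: each modality attends to its own tokens
        self.intra_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.intra_txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        # intermodal attention: image tokens query text tokens and vice versa
        self.img2txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.txt2img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(2 * dim, num_classes)

    def forward(self, img_tokens, txt_tokens):
        # img_tokens: (B, Ni, dim) vision tokens; txt_tokens: (B, Nt, dim) language tokens
        v, _ = self.intra_img(img_tokens, img_tokens, img_tokens)
        t, _ = self.intra_txt(txt_tokens, txt_tokens, txt_tokens)
        v2, _ = self.img2txt(v, t, t)   # image queries attend to text
        t2, _ = self.txt2img(t, v, v)   # text queries attend to image
        # mean-pool each fused stream and classify from the concatenation
        fused = torch.cat([self.norm(v + v2).mean(1), self.norm(t + t2).mean(1)], dim=-1)
        return self.head(fused)


if __name__ == "__main__":
    model = ToyAlignFuse()
    logits = model(torch.randn(2, 49, 256), torch.randn(2, 32, 256))
    print(logits.shape)  # torch.Size([2, 2])
```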
 
       
      
        Related papers
- SMFusion: Semantic-Preserving Fusion of Multimodal Medical Images for Enhanced Clinical Diagnosis [11.356721356096564]
We propose a novel semantic-guided medical image fusion approach that incorporates medical prior knowledge into the fusion process. We generate diagnostic reports from the fused images to assess the preservation of medical information. Experimental results on test datasets demonstrate that the proposed method achieves superior performance in both qualitative and quantitative evaluations.
arXiv Detail & Related papers (2025-05-18T06:15:00Z)
- iMedImage Technical Report [5.0953390013898705]
Chromosome karyotype analysis is crucial for diagnosing hereditary diseases, yet detecting structural abnormalities remains challenging.
We developed iMedImage, an end-to-end model for general medical image recognition, demonstrating strong performance across multiple imaging tasks.
arXiv Detail & Related papers (2025-03-27T03:25:28Z)
- A Survey of Medical Vision-and-Language Applications and Their Techniques [48.268198631277315]
Medical vision-and-language models (MVLMs) have attracted substantial interest due to their capability to offer a natural language interface for interpreting complex medical data.
Here, we provide a comprehensive overview of MVLMs and the various medical tasks to which they have been applied.
We also examine the datasets used for these tasks and compare the performance of different models based on standardized evaluation metrics.
arXiv Detail & Related papers (2024-11-19T03:27:05Z)
- ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features [54.37042005469384]
We announce MVKL, the first multimodal mammography dataset encompassing multi-view images, detailed manifestations and reports.
Based on this dataset, we focus on the challenging task of unsupervised pretraining.
We propose ViKL, a framework that synergizes Visual, Knowledge, and Linguistic features.
arXiv Detail & Related papers (2024-09-24T05:01:23Z)
- MOSMOS: Multi-organ segmentation facilitated by medical report supervision [10.396987980136602]
We propose a novel pre-training & fine-tuning framework for Multi-Organ Supervision (MOS).
Specifically, we first introduce global contrastive learning to align medical image-report pairs in the pre-training stage.
To remedy the discrepancy between this image-level alignment and the pixel-level correspondence needed for segmentation, we further leverage multi-label recognition to implicitly learn the semantic correspondence between image pixels and organ tags.
arXiv Detail & Related papers (2024-09-04T03:46:17Z)
- Automated Ensemble Multimodal Machine Learning for Healthcare [52.500923923797835]
We introduce a multimodal framework, AutoPrognosis-M, that enables the integration of structured clinical (tabular) data and medical imaging using automated machine learning.
AutoPrognosis-M incorporates 17 imaging models, including convolutional neural networks and vision transformers, and three distinct multimodal fusion strategies.
arXiv Detail & Related papers (2024-07-25T17:46:38Z)
- Integrating Medical Imaging and Clinical Reports Using Multimodal Deep Learning for Advanced Disease Analysis [3.8758525789991896]
An innovative multi-modal deep learning model is proposed to deeply integrate heterogeneous information from medical images and clinical reports.
For medical images, convolutional neural networks are used to extract high-dimensional features and capture key visual information.
For clinical report text, a bidirectional long short-term memory (BiLSTM) network combined with an attention mechanism is used for deep semantic understanding.
arXiv Detail & Related papers (2024-05-23T02:22:10Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- HyperFusion: A Hypernetwork Approach to Multimodal Integration of Tabular and Medical Imaging Data for Predictive Modeling [4.44283662576491]
We present a novel framework based on hypernetworks to fuse clinical imaging and tabular data by conditioning the image processing on the EHR's values and measurements.
We show that our framework outperforms both single-modality models and state-of-the-art MRI-tabular data fusion methods.
arXiv Detail & Related papers (2024-03-20T05:50:04Z)
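The HyperFusion entry above conditions image processing on tabular EHR values via hypernetworks. The toy sketch below illustrates the general idea of a hypernetwork that generates a per-sample linear layer from a tabular record; it is not the authors' architecture, and all names and sizes are assumptions.

```python
# Toy illustration of hypernetwork-style conditioning (not the HyperFusion code):
# a small MLP maps tabular EHR values to the weights and bias of a linear layer
# that is then applied to image features. Sizes and names are assumptions.
import torch
import torch.nn as nn


class TabularHyperLayer(nn.Module):
    def __init__(self, tab_dim=8, img_dim=64, out_dim=32):
        super().__init__()
        self.img_dim, self.out_dim = img_dim, out_dim
        # hypernetwork: tabular record -> parameters of the image-feature layer
        self.hyper = nn.Sequential(
            nn.Linear(tab_dim, 128), nn.ReLU(),
            nn.Linear(128, img_dim * out_dim + out_dim),
        )

    def forward(self, img_feat, tab):
        # img_feat: (B, img_dim) image features; tab: (B, tab_dim) tabular values
        params = self.hyper(tab)
        w = params[:, : self.img_dim * self.out_dim].view(-1, self.out_dim, self.img_dim)
        b = params[:, self.img_dim * self.out_dim :]
        # per-sample linear layer whose weights depend on the tabular record
        return torch.bmm(w, img_feat.unsqueeze(-1)).squeeze(-1) + b


if __name__ == "__main__":
    layer = TabularHyperLayer()
    out = layer(torch.randn(4, 64), torch.randn(4, 8))
    print(out.shape)  # torch.Size([4, 32])
```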
- Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
The Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features.
We conduct downstream tasks of image classification and image-text retrieval on four medical datasets.
arXiv Detail & Related papers (2024-03-19T03:59:14Z)
- MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
 Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
arXiv Detail & Related papers (2024-02-03T05:48:50Z)
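Several entries above (MLIP, MOSMOS, and Alifuse itself) rely on image-text contrastive learning to align the two modalities. The snippet below is a generic symmetric InfoNCE-style loss on paired embeddings, shown only to illustrate that alignment objective; it is not any of these papers' exact formulation.

```python
# Generic symmetric image-text contrastive (CLIP-style) loss on paired embeddings,
# shown only to illustrate the alignment objective referenced above.
import torch
import torch.nn.functional as F


def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    # img_emb, txt_emb: (B, D) embeddings of matched image-report pairs
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    # matched pairs lie on the diagonal; pull them together, push the rest apart
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    loss = contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
    print(float(loss))
```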
- Building RadiologyNET: Unsupervised annotation of a large-scale multimodal medical database [0.4915744683251151]
The usage of machine learning in medical diagnosis and treatment has witnessed significant growth in recent years.
However, the availability of large annotated image datasets remains a major obstacle since the process of annotation is time-consuming and costly.
This paper explores how to automatically annotate a database of medical radiology images with regard to their semantic similarity.
arXiv Detail & Related papers (2023-07-27T13:00:33Z)
- Multi-task Paired Masking with Alignment Modeling for Medical Vision-Language Pre-training [55.56609500764344]
We propose a unified framework based on Multi-task Paired Masking with Alignment (MPMA) to integrate the cross-modal alignment task into the joint image-text reconstruction framework.
We also introduce a Memory-Augmented Cross-Modal Fusion (MA-CMF) module to fully integrate visual information to assist report reconstruction.
arXiv Detail & Related papers (2023-05-13T13:53:48Z)
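The MPMA entry above combines cross-modal alignment with joint image-text reconstruction. The sketch below covers only the masking-and-reconstruction half, using a tiny placeholder encoder on image patch tokens; it is a schematic assumption, not the MPMA model.

```python
# Schematic masked-reconstruction step (illustrative, not the MPMA implementation):
# randomly mask image patch tokens, encode, and regress the original patches.
# The tiny encoder/decoder here are placeholders for real transformer blocks.
import torch
import torch.nn as nn


class MaskedPatchAutoencoder(nn.Module):
    def __init__(self, patch_dim=128, mask_ratio=0.5):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.mask_token = nn.Parameter(torch.zeros(patch_dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(patch_dim, nhead=4, batch_first=True), num_layers=2)
        self.decoder = nn.Linear(patch_dim, patch_dim)

    def forward(self, patches):
        # patches: (B, N, patch_dim) patch tokens (text tokens would be handled analogously)
        mask = torch.rand(patches.shape[:2], device=patches.device) < self.mask_ratio
        corrupted = torch.where(mask.unsqueeze(-1), self.mask_token, patches)
        recon = self.decoder(self.encoder(corrupted))
        # reconstruction loss only on the masked positions
        return ((recon - patches) ** 2)[mask].mean()


if __name__ == "__main__":
    model = MaskedPatchAutoencoder()
    print(float(model(torch.randn(2, 16, 128))))
```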
- AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation [50.21065317817769]
We propose an AlignTransformer framework, which includes the Align Hierarchical Attention (AHA) and the Multi-Grained Transformer (MGT) modules.
Experiments on the public IU-Xray and MIMIC-CXR datasets show that the AlignTransformer can achieve results competitive with state-of-the-art methods on the two datasets.
arXiv Detail & Related papers (2022-03-18T13:43:53Z)
- Cross-Modal Information Maximization for Medical Imaging: CMIM [62.28852442561818]
In hospitals, data are siloed in specific information systems that make the same information available under different modalities.
This offers unique opportunities to obtain and use, at training time, multiple views of the same information that might not always be available at test time.
We propose an innovative framework that makes the most of available data by learning good representations of a multi-modal input that are resilient to modality dropping at test-time.
arXiv Detail & Related papers (2020-10-20T20:05:35Z)
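CMIM's goal of representations that remain usable when a modality is missing at test time can be approximated with a simple training-time modality dropout, sketched below. The fusion network and drop probabilities are illustrative assumptions rather than CMIM's actual information-maximization objective.

```python
# Illustrative training-time modality dropout (not CMIM's actual method): randomly
# zero out one modality per sample so the fused representation stays usable when a
# modality is missing at test time. Model and probabilities are assumptions.
import torch
import torch.nn as nn


class DropoutFusion(nn.Module):
    def __init__(self, img_dim=64, txt_dim=64, hidden=128, num_classes=2, p_drop=0.3):
        super().__init__()
        self.p_drop = p_drop
        self.net = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden), nn.ReLU(), nn.Linear(hidden, num_classes))

    def forward(self, img_feat, txt_feat):
        if self.training:
            batch = img_feat.size(0)
            # with probability p_drop, blank out one randomly chosen modality per sample
            drop = torch.rand(batch, 1, device=img_feat.device) < self.p_drop
            pick_img = torch.rand(batch, 1, device=img_feat.device) < 0.5
            img_feat = torch.where(drop & pick_img, torch.zeros_like(img_feat), img_feat)
            txt_feat = torch.where(drop & ~pick_img, torch.zeros_like(txt_feat), txt_feat)
        return self.net(torch.cat([img_feat, txt_feat], dim=-1))


if __name__ == "__main__":
    model = DropoutFusion().train()
    print(model(torch.randn(4, 64), torch.randn(4, 64)).shape)  # torch.Size([4, 2])
```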
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     