Enhancing Representation in Radiography-Reports Foundation Model: A Granular Alignment Algorithm Using Masked Contrastive Learning
- URL: http://arxiv.org/abs/2309.05904v3
- Date: Tue, 3 Sep 2024 01:40:52 GMT
- Title: Enhancing Representation in Radiography-Reports Foundation Model: A Granular Alignment Algorithm Using Masked Contrastive Learning
- Authors: Weijian Huang, Cheng Li, Hong-Yu Zhou, Hao Yang, Jiarun Liu, Yong Liang, Hairong Zheng, Shaoting Zhang, Shanshan Wang
- Abstract summary: MaCo is a masked contrastive chest X-ray foundation model.
It simultaneously achieves fine-grained image understanding and zero-shot learning for a variety of medical imaging tasks.
It is shown to be superior over 10 state-of-the-art approaches across tasks such as classification, segmentation, detection, and phrase grounding.
- Score: 26.425784890859738
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, multi-modal vision-language foundation models have gained significant attention in the medical field. While these models offer great opportunities, they still face crucial challenges, such as the requirement for fine-grained knowledge understanding in computer-aided diagnosis and the capability of utilizing very limited or even no task-specific labeled data in real-world clinical applications. In this study, we present MaCo, a masked contrastive chest X-ray foundation model that tackles these challenges. MaCo explores masked contrastive learning to simultaneously achieve fine-grained image understanding and zero-shot learning for a variety of medical imaging tasks. It designs a correlation weighting mechanism to adjust the correlation between masked chest X-ray image patches and their corresponding reports, thereby enhancing the model's representation learning capabilities. To evaluate the performance of MaCo, we conducted extensive experiments using 6 well-known open-source X-ray datasets. The experimental results demonstrate the superiority of MaCo over 10 state-of-the-art approaches across tasks such as classification, segmentation, detection, and phrase grounding. These findings highlight the significant potential of MaCo in advancing a wide range of medical image analysis tasks.
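The core mechanism described in the abstract, masking image patches and weighting how strongly each visible patch aligns with its paired report before a contrastive objective, can be illustrated with a minimal PyTorch-style sketch. This is not the authors' implementation: the encoders, masking ratio, and the specific form of the correlation-weighting module here are simplified placeholders.
```python
# Minimal sketch of masked contrastive image-report alignment with a
# per-patch correlation weighting (simplified placeholder, not MaCo's code).
import torch
import torch.nn.functional as F

def masked_contrastive_loss(patch_tokens, report_emb, weight_head, temperature=0.07):
    """
    patch_tokens: (B, N, D) embeddings of the visible (unmasked) image patches.
    report_emb:   (B, D) embedding of the paired radiology report.
    weight_head:  small module giving one correlation weight per patch (assumed here).
    """
    # Per-patch correlation weights: how strongly each visible patch
    # should contribute to alignment with the report.
    w = torch.softmax(weight_head(patch_tokens).squeeze(-1), dim=-1)   # (B, N)

    # Weighted pooling of patch tokens into one image-level embedding.
    img_emb = torch.einsum('bn,bnd->bd', w, patch_tokens)              # (B, D)

    # Symmetric InfoNCE between image and report embeddings.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(report_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature                       # (B, B)
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage with random tensors standing in for encoder outputs
# (e.g. a ViT over ~25% visible patches and a text encoder for the report).
B, N, D = 8, 49, 256
weight_head = torch.nn.Linear(D, 1)
loss = masked_contrastive_loss(torch.randn(B, N, D), torch.randn(B, D), weight_head)
```
The weighted pooling is the point of the sketch: patches judged more relevant to the report dominate the image embedding, which is what the paper's correlation weighting aims to encourage.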
Related papers
- Towards a vision foundation model for comprehensive assessment of Cardiac MRI [11.838157772803282]
We introduce a vision foundation model trained for cardiac magnetic resonance imaging (CMR) assessment.
We fine-tune the model in a supervised way for 9 clinical tasks typical of a CMR workflow.
We demonstrate improved accuracy and robustness across all tasks, over a range of available labeled dataset sizes.
arXiv Detail & Related papers (2024-10-02T15:32:01Z) - ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features [54.37042005469384]
We announce MVKL, the first multimodal mammography dataset encompassing multi-view images, detailed manifestations and reports.
Based on this dataset, we focus on the challenging task of unsupervised pretraining.
We propose ViKL, a framework that synergizes Visual, Knowledge, and Linguistic features.
arXiv Detail & Related papers (2024-09-24T05:01:23Z) - COIN: Counterfactual inpainting for weakly supervised semantic segmentation for medical images [3.5418498524791766]
This work develops a novel counterfactual inpainting approach (COIN).
COIN flips the predicted classification label from abnormal to normal by using a generative model.
The effectiveness of the method is demonstrated by segmenting synthetic targets and actual kidney tumors from CT images acquired from Tartu University Hospital in Estonia.
arXiv Detail & Related papers (2024-04-19T12:09:49Z) - Joint chest X-ray diagnosis and clinical visual attention prediction with multi-stage cooperative learning: enhancing interpretability [2.64700310378485]
We introduce a novel deep-learning framework for joint disease diagnosis and prediction of corresponding visual saliency maps for chest X-ray scans.
Specifically, we designed a novel dual-encoder multi-task UNet, which leverages both a DenseNet201 backbone and a Residual and Squeeze-and-Excitation block-based encoder.
Experiments show that our proposed method outperforms existing techniques in chest X-ray diagnosis and in the quality of visual saliency map prediction.
arXiv Detail & Related papers (2024-03-25T17:31:12Z) - Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
The Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features.
We conduct downstream tasks of image classification and image-text retrieval on four medical datasets.
arXiv Detail & Related papers (2024-03-19T03:59:14Z) - MLVICX: Multi-Level Variance-Covariance Exploration for Chest X-ray Self-Supervised Representation Learning [6.4136876268620115]
MLVICX is an approach to capture rich representations in the form of embeddings from chest X-ray images.
We demonstrate the performance of MLVICX in advancing self-supervised chest X-ray representation learning.
arXiv Detail & Related papers (2024-03-18T06:19:37Z) - MUSCLE: Multi-task Self-supervised Continual Learning to Pre-train Deep Models for X-ray Images of Multiple Body Parts [63.30352394004674]
Multi-task Self-supervised Continual Learning (MUSCLE) is a novel self-supervised pre-training pipeline for medical imaging tasks.
MUSCLE aggregates X-rays collected from multiple body parts for representation learning, and adopts a well-designed continual learning procedure.
We evaluate MUSCLE using 9 real-world X-ray datasets with various tasks, including pneumonia classification, skeletal abnormality classification, lung segmentation, and tuberculosis (TB) detection.
arXiv Detail & Related papers (2023-10-03T12:19:19Z) - LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z) - Explainable Deep Learning Methods in Medical Image Classification: A Survey [0.0]
State-of-the-art deep learning models have achieved human-level accuracy on the classification of different types of medical data.
These models are rarely adopted in clinical practice, mainly due to their lack of interpretability.
The black-box nature of deep learning models has raised the need for strategies that explain their decision process.
arXiv Detail & Related papers (2022-05-10T09:28:14Z) - Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z) - Weakly supervised multiple instance learning histopathological tumor segmentation [51.085268272912415]
We propose a weakly supervised framework for whole slide imaging segmentation.
We exploit a multiple instance learning scheme for training the models (a generic sketch of this idea follows the list).
The proposed framework has been evaluated on multi-locations and multi-centric public data from The Cancer Genome Atlas and the PatchCamelyon dataset.
arXiv Detail & Related papers (2020-04-10T13:12:47Z)
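The multiple instance learning scheme mentioned in the last entry can be illustrated with a generic max-pooling MIL head, where only a slide-level label supervises per-tile predictions. This is a simplified illustration under assumed tile features and dimensions, not the framework from that paper.
```python
# Generic sketch of the multiple-instance-learning idea behind weakly
# supervised whole-slide segmentation: a slide-level label supervises
# per-tile predictions. Illustration only, not the cited paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MILHead(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.tile_scorer = nn.Linear(feat_dim, 1)   # one tumor score per tile

    def forward(self, tile_feats):
        """tile_feats: (num_tiles, feat_dim) features of the tiles of one slide."""
        tile_logits = self.tile_scorer(tile_feats).squeeze(-1)   # (num_tiles,)
        slide_logit = tile_logits.max()                          # max-pool over the bag
        return slide_logit, tile_logits

# Training step using only the slide-level label (1 = tumor present, 0 = normal).
head = MILHead()
tile_feats = torch.randn(1000, 512)      # stand-in for CNN features of 1000 tiles
slide_label = torch.tensor(1.0)
slide_logit, tile_logits = head(tile_feats)
loss = F.binary_cross_entropy_with_logits(slide_logit, slide_label)
# After training, sigmoid(tile_logits) mapped back to tile positions yields a
# coarse tumor segmentation of the slide without any pixel-level annotation.
```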