Related papers: Enhancing Representation in Radiography-Reports Foundation Model: A Granular Alignment Algorithm Using Masked Contrastive Learning

Enhancing Representation in Radiography-Reports Foundation Model: A Granular Alignment Algorithm Using Masked Contrastive Learning

URL: http://arxiv.org/abs/2309.05904v2
Date: Mon, 18 Sep 2023 01:23:52 GMT
Title: Enhancing Representation in Radiography-Reports Foundation Model: A Granular Alignment Algorithm Using Masked Contrastive Learning
Authors: Weijian Huang and Cheng Li and Hao Yang and Jiarun Liu and Shanshan Wang
Abstract summary: MaCo is a novel multi-modal medical foundation model that explores masked contrastive learning to achieve granular alignment and zero-shot learning for a variety of medical imaging tasks. We evaluate MaCo on six well-known open-source X-ray datasets, and the experimental results show it outperforms seven state-of-the-art approaches for classification, segmentation, and zero-shot phase grounding.
Score: 8.717599327516822
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, multi-modal vision-language foundation models have gained significant attention in the medical field. While these models offer great opportunities, they still face a number of challenges, such as the requirement for fine-grained knowledge understanding in computer-aided diagnosis and capability of utilizing very limited or no task-specific labeled data in real-world clinical applications. In this study, we present MaCo, a novel multi-modal medical foundation model that explores masked contrastive learning to achieve granular alignment and zero-shot learning for a variety of medical imaging tasks. MaCo incorporates a correlation weighting mechanism to adjust the correlation between masked image patches and their corresponding reports, thereby enhancing the representation learning capabilities. We evaluate MaCo on six well-known open-source X-ray datasets, and the experimental results show it outperforms seven state-of-the-art approaches for classification, segmentation, and zero-shot phase grounding, demonstrating its great potential to promote a wide range of medical image analysis tasks.

Related papers

Knowledge-Augmented Language Models Interpreting Structured Chest X-Ray Findings [44.99833362998488]
This paper introduces CXR-TextInter, a novel framework that repurposes powerful text-centric language models for chest X-rays interpretation.<n>We augment this LLM-centric approach with an integrated medical knowledge module to enhance clinical reasoning.<n>Our work validates an alternative paradigm for medical image AI, showcasing the potential of harnessing advanced LLM capabilities.
arXiv Detail & Related papers (2025-05-03T06:18:12Z)
A Generative Framework for Bidirectional Image-Report Understanding in Chest Radiography [1.2289361708127877]
Multi-Stage Adaptive Vision-Language Tuning (MAViLT) is a novel framework designed to enhance multimodal reasoning and generation for vision-based understanding. MAViLT incorporates a clinical gradient-weighted tokenization process and a hierarchical fine-tuning strategy, enabling it to generate accurate radiology reports, synthesize realistic CXRs from text, and answer vision-based clinical questions. We evaluate MAViLT on two benchmark datasets, MIMIC-CXR and Indiana University CXR, achieving state-of-the-art results across all tasks.
arXiv Detail & Related papers (2025-02-09T15:02:57Z)
MRGen: Segmentation Data Engine For Underrepresented MRI Modalities [59.61465292965639]
Training medical image segmentation models for rare yet clinically significant imaging modalities is challenging due to the scarcity of annotated data. This paper investigates leveraging generative models to synthesize training data, to train segmentation models for underrepresented modalities.
arXiv Detail & Related papers (2024-12-04T16:34:22Z)
Towards a vision foundation model for comprehensive assessment of Cardiac MRI [11.838157772803282]
We introduce a vision foundation model trained for cardiac magnetic resonance imaging (CMR) assessment. We finetune the model in supervised way for 9 clinical tasks typical to a CMR workflow. We demonstrate improved accuracy and robustness across all tasks, over a range of available labeled dataset sizes.
arXiv Detail & Related papers (2024-10-02T15:32:01Z)
ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features [54.37042005469384]
We announce MVKL, the first multimodal mammography dataset encompassing multi-view images, detailed manifestations and reports. Based on this dataset, we focus on the challanging task of unsupervised pretraining. We propose ViKL, a framework that synergizes Visual, Knowledge, and Linguistic features.
arXiv Detail & Related papers (2024-09-24T05:01:23Z)
COIN: Counterfactual inpainting for weakly supervised semantic segmentation for medical images [3.5418498524791766]
This research is development of a novel counterfactual inpainting approach (COIN) COIN flips the predicted classification label from abnormal to normal by using a generative model. The effectiveness of the method is demonstrated by segmenting synthetic targets and actual kidney tumors from CT images acquired from Tartu University Hospital in Estonia.
arXiv Detail & Related papers (2024-04-19T12:09:49Z)
Joint chest X-ray diagnosis and clinical visual attention prediction with multi-stage cooperative learning: enhancing interpretability [2.64700310378485]
We introduce a novel deep-learning framework for joint disease diagnosis and prediction of corresponding visual saliency maps for chest X-ray scans. Specifically, we designed a novel dual-encoder multi-task UNet, which leverages both a DenseNet201 backbone and a Residual and Squeeze-and-Excitation block-based encoder. Experiments show that our proposed method outperformed existing techniques for chest X-ray diagnosis and the quality of visual saliency map prediction.
arXiv Detail & Related papers (2024-03-25T17:31:12Z)
Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features. We conduct downstream tasks of image classification and image-text retrieval on four medical datasets.
arXiv Detail & Related papers (2024-03-19T03:59:14Z)
MLVICX: Multi-Level Variance-Covariance Exploration for Chest X-ray Self-Supervised Representation Learning [6.4136876268620115]
MLVICX is an approach to capture rich representations in the form of embeddings from chest X-ray images. We demonstrate the performance of MLVICX in advancing self-supervised chest X-ray representation learning.
arXiv Detail & Related papers (2024-03-18T06:19:37Z)
MUSCLE: Multi-task Self-supervised Continual Learning to Pre-train Deep Models for X-ray Images of Multiple Body Parts [63.30352394004674]
Multi-task Self-super-vised Continual Learning (MUSCLE) is a novel self-supervised pre-training pipeline for medical imaging tasks. MUSCLE aggregates X-rays collected from multiple body parts for representation learning, and adopts a well-designed continual learning procedure. We evaluate MUSCLE using 9 real-world X-ray datasets with various tasks, including pneumonia classification, skeletal abnormality classification, lung segmentation, and tuberculosis (TB) detection.
arXiv Detail & Related papers (2023-10-03T12:19:19Z)
LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets. We have collected approximately 1.3 million medical images from 55 publicly available datasets. LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
Explainable Deep Learning Methods in Medical Image Classification: A Survey [0.0]
State-of-the-art deep learning models have achieved human-level accuracy on the classification of different types of medical data. These models are hardly adopted in clinical, mainly due to their lack of interpretability. The black-box-ness of deep learning models has raised the need for devising strategies to explain the decision process of these models.
arXiv Detail & Related papers (2022-05-10T09:28:14Z)
Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance. For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming. In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
Weakly supervised multiple instance learning histopathological tumor segmentation [51.085268272912415]
We propose a weakly supervised framework for whole slide imaging segmentation. We exploit a multiple instance learning scheme for training models. The proposed framework has been evaluated on multi-locations and multi-centric public data from The Cancer Genome Atlas and the PatchCamelyon dataset.
arXiv Detail & Related papers (2020-04-10T13:12:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.