Self-supervised Multi-modal Training from Uncurated Image and Reports
Enables Zero-shot Oversight Artificial Intelligence in Radiology
- URL: http://arxiv.org/abs/2208.05140v4
- Date: Wed, 12 Apr 2023 10:58:04 GMT
- Title: Self-supervised Multi-modal Training from Uncurated Image and Reports
Enables Zero-shot Oversight Artificial Intelligence in Radiology
- Authors: Sangjoon Park, Eun Sun Lee, Kyung Sook Shin, Jeong Eun Lee, and Jong
Chul Ye
- Abstract summary: We present a model dubbed Medical Cross-attention Vision-Language model (Medical X-VL)
Our model enables various zero-shot tasks for oversight AI, ranging from zero-shot classification to zero-shot error correction.
Our method was especially successful in the data-limited setting, suggesting potential widespread applicability in the medical domain.
- Score: 31.045221580446963
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Oversight AI is an emerging concept in radiology in which the AI forms a
symbiosis with radiologists by continuously supporting them in their
decision-making. Recent advances in vision-language models shed light on the
long-standing problems of oversight AI by understanding both visual and
textual concepts and their semantic correspondences. However, there have been
limited successes in applying vision-language models to the medical domain,
because the current vision-language models and learning strategies for
photographic images and captions require a web-scale corpus of image-text
pairs, which is often not feasible in the medical domain. To address this, we
present a model dubbed Medical Cross-attention Vision-Language model (Medical
X-VL), which leverages key components tailored for the medical domain. Medical
X-VL is built on the following components: self-supervised uni-modal models
for the medical domain and a fusion encoder to bridge them, momentum
distillation, sentence-wise contrastive learning for medical reports, and
sentence similarity-adjusted hard negative mining. We experimentally
demonstrate that our model enables various zero-shot tasks for oversight AI,
ranging from zero-shot classification to zero-shot error correction. Our model
outperformed the current state-of-the-art models on two different medical
image databases, suggesting a novel clinical use of our oversight AI model for
monitoring human errors. Our method was especially successful in the
data-limited setting frequently encountered in clinics, suggesting its
potential for widespread applicability in the medical domain.
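As a concrete illustration of the zero-shot classification task mentioned in the abstract, the sketch below scores a chest X-ray against textual prompts for candidate findings in a shared embedding space. The encoders, prompt template, and temperature are illustrative assumptions, not the authors' Medical X-VL implementation (which additionally uses a cross-attention fusion encoder).

```python
# Minimal, hedged sketch of CLIP-style zero-shot classification; `image_encoder` and
# `text_encoder` are hypothetical uni-modal encoders returning embedding tensors.
import torch
import torch.nn.functional as F

@torch.no_grad()
def zero_shot_classify(image, labels, image_encoder, text_encoder, tau=0.07):
    """Score one image (C, H, W) against a textual prompt per candidate finding."""
    prompts = [f"There is {lbl}." for lbl in labels]              # e.g. "There is pneumonia."
    img = F.normalize(image_encoder(image.unsqueeze(0)), dim=-1)  # (1, D)
    txt = F.normalize(text_encoder(prompts), dim=-1)              # (num_labels, D), assumed API
    probs = (img @ txt.t() / tau).softmax(dim=-1)                 # similarity -> probabilities
    return dict(zip(labels, probs.squeeze(0).tolist()))
```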
Related papers
- The Era of Foundation Models in Medical Imaging is Approaching : A Scoping Review of the Clinical Value of Large-Scale Generative AI Applications in Radiology [0.0]
Social problems stemming from the shortage of radiologists are intensifying, and artificial intelligence is being highlighted as a potential solution.
Recently emerging large-scale generative AI has expanded from large language models (LLMs) to multi-modal models.
This scoping review systematically organizes existing literature on the clinical value of large-scale generative AI applications.
arXiv Detail & Related papers (2024-09-03T00:48:50Z)
- Advancing human-centric AI for robust X-ray analysis through holistic self-supervised learning [33.9544297423474]
We present RayDINO, a large visual encoder trained by self-supervision on 873k chest X-rays.
We compare RayDINO to previous state-of-the-art models across nine radiology tasks, from classification and dense segmentation to text generation.
Our findings suggest that self-supervision enables patient-centric AI that proves useful in clinical workflows and in interpreting X-rays holistically.
arXiv Detail & Related papers (2024-05-02T16:59:10Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
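For readers unfamiliar with the adapter pattern referenced in the entry above, the following sketch shows the generic idea of inserting lightweight residual adapters after the blocks of a frozen pre-trained visual encoder so that multi-level features can be refined for a downstream task. Module names, dimensions, and the block structure are illustrative assumptions, not the paper's exact framework.

```python
# Generic "residual adapters on a frozen backbone" sketch (assumed shapes: tokens (B, N, D)).
import torch
import torch.nn as nn

class ResidualAdapter(nn.Module):
    """Lightweight bottleneck added to a frozen feature map: y = x + up(relu(down(x)))."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down, self.up = nn.Linear(dim, bottleneck), nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight); nn.init.zeros_(self.up.bias)  # start as identity

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

class AdaptedEncoder(nn.Module):
    """Frozen backbone stages whose intermediate features are refined by trainable adapters."""
    def __init__(self, blocks, dims):
        super().__init__()
        self.blocks = blocks                       # nn.ModuleList of frozen encoder stages
        for p in self.blocks.parameters():
            p.requires_grad_(False)
        self.adapters = nn.ModuleList(ResidualAdapter(d) for d in dims)

    def forward(self, tokens):
        feats = []
        for block, adapter in zip(self.blocks, self.adapters):
            tokens = adapter(block(tokens))        # adapt features at each level
            feats.append(tokens)
        return feats                               # multi-level features for anomaly scoring
```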
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
- RAD-DINO: Exploring Scalable Medical Image Encoders Beyond Text Supervision [44.00149519249467]
Language-supervised pre-training has proven to be a valuable method for extracting semantically meaningful features from images.
We introduce RAD-DINO, a biomedical image encoder pre-trained solely on unimodal biomedical imaging data.
arXiv Detail & Related papers (2024-01-19T17:02:17Z)
- Application Of Vision-Language Models For Assessing Osteoarthritis Disease Severity [0.43431539537721414]
Osteoarthritis (OA) poses a global health challenge, demanding precise diagnostic methods.
Existing deep learning models for OA assessment are unimodal single task systems.
This study investigates employing Vision Language Processing models to predict OA severity using X-ray images and corresponding reports.
arXiv Detail & Related papers (2024-01-12T02:43:58Z)
- Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models [49.95603725998561]
We propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts.
Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model.
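A minimal sketch of the concept-bottleneck idea summarized above: image features from a vision-language model are projected onto a fixed bank of natural-language concept embeddings, and a linear head classifies from the resulting interpretable concept scores. The tensor names and shapes are assumptions for illustration, not the authors' code.

```python
# Concept bottleneck sketch: concept_embeddings is assumed to be a (num_concepts, D) tensor
# of text embeddings for concept phrases (e.g., phrases generated by GPT-4).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptBottleneckClassifier(nn.Module):
    def __init__(self, concept_embeddings, num_classes):
        super().__init__()
        self.register_buffer("concepts", F.normalize(concept_embeddings, dim=-1))  # fixed bank
        self.head = nn.Linear(concept_embeddings.size(0), num_classes)

    def forward(self, image_features):
        img = F.normalize(image_features, dim=-1)          # (B, D) from a vision-language model
        concept_scores = img @ self.concepts.t()           # (B, num_concepts), interpretable
        return self.head(concept_scores), concept_scores   # class logits plus per-concept evidence
```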
arXiv Detail & Related papers (2023-10-04T21:57:09Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model.
It can analyze and answer open-ended questions about chest radiographs.
We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
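A minimal sketch of the kind of text-to-image retrieval evaluation described above, assuming paired report and image embeddings in a shared space (row i of each tensor belongs to the same study); this is illustrative and not the paper's evaluation code.

```python
# Recall@k for text-to-image retrieval over a shared embedding space.
import torch
import torch.nn.functional as F

@torch.no_grad()
def recall_at_k(text_emb, image_emb, ks=(1, 5, 10)):
    """text_emb, image_emb: (N, D) tensors; row i of each corresponds to the same study."""
    sims = F.normalize(text_emb, dim=-1) @ F.normalize(image_emb, dim=-1).t()   # (N, N)
    ranks = sims.argsort(dim=-1, descending=True)                               # ranked images per report
    correct = ranks == torch.arange(len(sims)).unsqueeze(1)                     # where the true image sits
    hit_rank = correct.float().argmax(dim=-1)                                   # rank of the true match
    return {f"R@{k}": (hit_rank < k).float().mean().item() for k in ks}
```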
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- Adapting Pretrained Vision-Language Foundational Models to Medical Imaging Domains [3.8137985834223502]
Building generative models for medical images that faithfully depict clinical context may help alleviate the paucity of healthcare datasets.
We explore the sub-components of the Stable Diffusion pipeline to fine-tune the model to generate medical images.
Our best-performing model improves upon the stable diffusion baseline and can be conditioned to insert a realistic-looking abnormality on a synthetic radiology image.
arXiv Detail & Related papers (2022-10-09T01:43:08Z)
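To make the Stable Diffusion fine-tuning idea in the last entry more concrete, here is a heavily simplified sketch using the Hugging Face diffusers API that fine-tunes only the U-Net on image/report pairs while keeping the VAE and text encoder frozen. The checkpoint name, data loader, and hyperparameters are placeholder assumptions, and this is only one of the sub-components the paper explores, not its exact recipe.

```python
# Sketch: fine-tune the U-Net of a Stable Diffusion pipeline on paired images and reports.
import torch
import torch.nn.functional as F
from diffusers import StableDiffusionPipeline

def finetune_unet(loader, model_name="runwayml/stable-diffusion-v1-5", lr=1e-5):
    """`loader` is a hypothetical DataLoader yielding (pixel_values in [-1, 1], tokenized report ids)."""
    pipe = StableDiffusionPipeline.from_pretrained(model_name)
    vae, text_encoder, unet, scheduler = pipe.vae, pipe.text_encoder, pipe.unet, pipe.scheduler
    vae.requires_grad_(False)
    text_encoder.requires_grad_(False)              # keep VAE and text encoder frozen
    optimizer = torch.optim.AdamW(unet.parameters(), lr=lr)

    for pixel_values, input_ids in loader:
        latents = vae.encode(pixel_values).latent_dist.sample() * vae.config.scaling_factor
        noise = torch.randn_like(latents)
        t = torch.randint(0, scheduler.config.num_train_timesteps, (latents.shape[0],))
        noisy_latents = scheduler.add_noise(latents, noise, t)
        text_emb = text_encoder(input_ids)[0]       # report text as conditioning
        pred = unet(noisy_latents, t, encoder_hidden_states=text_emb).sample
        loss = F.mse_loss(pred, noise)              # standard denoising objective
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    return pipe
```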
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.