X-TRA: Improving Chest X-ray Tasks with Cross-Modal Retrieval Augmentation
- URL: http://arxiv.org/abs/2302.11352v1
- Date: Wed, 22 Feb 2023 12:53:33 GMT
- Title: X-TRA: Improving Chest X-ray Tasks with Cross-Modal Retrieval Augmentation
- Authors: Tom van Sonsbeek and Marcel Worring
- Abstract summary: We apply multi-modal retrieval augmentation to several tasks in chest X-ray analysis.
Vision and language modalities are aligned using a pre-trained CLIP model.
Non-parametric retrieval index reaches state-of-the-art retrieval levels.
- Score: 14.375693586801338
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An important component of human analysis of medical images and their context
is the ability to relate newly seen things to related instances in our memory.
In this paper we mimic this ability by using multi-modal retrieval augmentation
and apply it to several tasks in chest X-ray analysis. By retrieving similar
images and/or radiology reports we expand and regularize the case at hand with
additional knowledge, while maintaining factual knowledge consistency. The
method consists of two components. First, vision and language modalities are
aligned using a pre-trained CLIP model. To ensure that retrieval focuses on
detailed disease-related content rather than global visual appearance, the
model is fine-tuned using disease class information. Second, we construct a
non-parametric retrieval index, which reaches state-of-the-art retrieval
levels. We use this index in our downstream tasks to augment image
representations through multi-head attention for disease classification and
report retrieval. We show that retrieval augmentation gives considerable
improvements on these tasks. Our downstream report retrieval even proves
competitive with dedicated report generation methods, paving the way for this
method in medical imaging.
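To make the pipeline concrete, below is a minimal sketch of the retrieval-augmentation idea under stated assumptions; it is not the authors' implementation. Random vectors stand in for the disease-fine-tuned CLIP embeddings, the non-parametric index is a plain matrix queried by cosine similarity, and nn.MultiheadAttention fuses the retrieved neighbours into the image representation before a 14-label disease classifier. All names and dimensions are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d, n_index, k = 512, 1000, 8   # embedding dim, index size, neighbours to retrieve

# Non-parametric retrieval index: a matrix of normalised stand-in embeddings.
index_emb = F.normalize(torch.randn(n_index, d), dim=-1)

def retrieve(query: torch.Tensor, k: int) -> torch.Tensor:
    """Return the k most similar index entries by cosine similarity."""
    sims = F.normalize(query, dim=-1) @ index_emb.T   # (B, n_index)
    top = sims.topk(k, dim=-1).indices                # (B, k)
    return index_emb[top]                             # (B, k, d)

# Augment each image embedding with its neighbours via multi-head attention.
mha = nn.MultiheadAttention(embed_dim=d, num_heads=8, batch_first=True)
classifier = nn.Linear(d, 14)                         # e.g. 14 disease labels

x = F.normalize(torch.randn(4, d), dim=-1)            # batch of image embeddings
neighbours = retrieve(x, k)                           # (4, k, d)
fused, _ = mha(x.unsqueeze(1), neighbours, neighbours)  # query attends to retrievals
logits = classifier(x + fused.squeeze(1))             # residual fusion, then classify
print(logits.shape)                                   # torch.Size([4, 14])
```

The residual fusion (adding the attended retrieval context back onto the query embedding) is one simple design choice; the paper's exact fusion and index construction may differ.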
Related papers
- Self-supervised vision-language alignment of deep learning representations for bone X-ray analysis [53.809054774037214]
This paper proposes leveraging vision-language pretraining on bone X-rays paired with French reports.
It is the first study to use French reports to shape the embedding space for bone X-ray representations.
arXiv Detail & Related papers (2024-05-14T19:53:20Z)
- Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation [47.250147322130545]
Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images.
Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists.
We present a novel multi-modal deep neural network framework for generating chest X-ray reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes.
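For intuition, here is a minimal sketch of this kind of multi-modal fusion, assuming an image embedding, a small vector of vital signs, and a clinical-note embedding; the encoders, dimensions, and the MultiModalFusion name are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiModalFusion(nn.Module):
    """Fuse image features, structured vitals, and a note embedding into one
    conditioning vector for a downstream report decoder (illustrative)."""
    def __init__(self, d_img=512, d_vitals=16, d_note=512, d_out=512):
        super().__init__()
        self.vitals_mlp = nn.Sequential(nn.Linear(d_vitals, 64), nn.ReLU(),
                                        nn.Linear(64, 64))
        self.proj = nn.Linear(d_img + 64 + d_note, d_out)

    def forward(self, img_feat, vitals, note_emb):
        fused = torch.cat([img_feat, self.vitals_mlp(vitals), note_emb], dim=-1)
        return self.proj(fused)

fusion = MultiModalFusion()
ctx = fusion(torch.randn(2, 512), torch.randn(2, 16), torch.randn(2, 512))
print(ctx.shape)  # torch.Size([2, 512])
```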
arXiv Detail & Related papers (2023-11-18T14:37:53Z)
- MVC: A Multi-Task Vision Transformer Network for COVID-19 Diagnosis from Chest X-ray Images [10.616065108433798]
We propose a new method, namely Multi-task Vision Transformer (MVC) for simultaneously classifying chest X-ray images and identifying affected regions from the input data.
Our method is built upon the Vision Transformer but extends its learning capability in a multi-task setting.
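A rough sketch of what a multi-task head on a ViT backbone can look like: one head classifies from the CLS token while another scores patch tokens to localize affected regions. The head design and dimensions below are illustrative assumptions, not MVC's actual architecture.

```python
import torch
import torch.nn as nn

class MultiTaskViTHead(nn.Module):
    """Two task heads sharing one ViT token sequence (illustrative)."""
    def __init__(self, d=768, n_classes=3):
        super().__init__()
        self.cls_head = nn.Linear(d, n_classes)   # image-level diagnosis
        self.region_head = nn.Linear(d, 1)        # per-patch relevance score

    def forward(self, tokens):                    # tokens: (B, 1 + N_patches, d)
        logits = self.cls_head(tokens[:, 0])      # classify from the CLS token
        regions = self.region_head(tokens[:, 1:]).squeeze(-1)  # patch saliency
        return logits, regions

head = MultiTaskViTHead()
logits, regions = head(torch.randn(2, 1 + 196, 768))   # 14x14 patch grid
print(logits.shape, regions.shape)  # torch.Size([2, 3]) torch.Size([2, 196])
```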
arXiv Detail & Related papers (2023-09-30T15:52:18Z)
- XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model.
It can analyze and answer open-ended questions about chest radiographs.
We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z)
- Learning Better Contrastive View from Radiologist's Gaze [45.55702035003462]
We propose a novel augmentation method, i.e., FocusContrast, to learn from radiologists' gaze in diagnosis and generate contrastive views for medical images.
Specifically, we track the gaze movement of radiologists and model their visual attention when reading to diagnose X-ray images.
As a plug-and-play module, FocusContrast consistently improves the state-of-the-art contrastive learning methods SimCLR, MoCo, and BYOL by 4.0-7.0% in classification accuracy on a knee X-ray dataset.
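The paper's exact augmentation is not reproduced here, but a gaze-aware masking view can be sketched as follows: pixels with high radiologist attention are spared, while unattended pixels are randomly dropped. The function and inputs are hypothetical.

```python
import torch

def gaze_masked_view(image: torch.Tensor, gaze_map: torch.Tensor,
                     mask_ratio: float = 0.4) -> torch.Tensor:
    """image: (C, H, W); gaze_map: (H, W) attention values in [0, 1].
    Drops unattended pixels to build a diagnosis-preserving contrastive view."""
    keep_prob = gaze_map.clamp(0, 1)
    drop = torch.rand_like(keep_prob) < mask_ratio * (1 - keep_prob)
    return image * (~drop).float()

view = gaze_masked_view(torch.rand(1, 224, 224), torch.rand(224, 224))
```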
arXiv Detail & Related papers (2023-05-15T17:34:49Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
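For reference, text-to-image retrieval with paired data is typically scored by recall@K; a minimal sketch, assuming row-matched text and image embedding matrices:

```python
import torch
import torch.nn.functional as F

def recall_at_k(txt_emb: torch.Tensor, img_emb: torch.Tensor, k: int = 5) -> float:
    """Fraction of reports whose true image is among the top-k retrievals."""
    sims = F.normalize(txt_emb, dim=-1) @ F.normalize(img_emb, dim=-1).T
    topk = sims.topk(k, dim=-1).indices                 # (N, k) image ids per report
    targets = torch.arange(len(txt_emb)).unsqueeze(-1)  # i-th report pairs i-th image
    return (topk == targets).any(dim=-1).float().mean().item()

print(recall_at_k(torch.randn(100, 512), torch.randn(100, 512), k=5))
```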
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation [92.73584302508907]
We propose a knowledge graph with a dynamic structure and dynamic nodes to facilitate medical report generation with contrastive learning.
In detail, the fundamental structure of our graph is pre-constructed from general knowledge.
Each image feature is integrated with its own updated graph before being fed into the decoder module for report generation.
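One way to picture the integration step is an attention read over learned node embeddings; the sketch below is a loose illustration under assumed shapes, not the paper's dynamic-graph mechanism.

```python
import torch
import torch.nn as nn

d, n_nodes = 512, 20
node_emb = nn.Parameter(torch.randn(n_nodes, d))        # stand-in graph node features
attn = nn.MultiheadAttention(d, num_heads=8, batch_first=True)

def graph_enhance(img_feat: torch.Tensor) -> torch.Tensor:
    """img_feat: (B, d) -> graph-contextualised feature for the report decoder."""
    nodes = node_emb.unsqueeze(0).expand(img_feat.size(0), -1, -1)
    out, _ = attn(img_feat.unsqueeze(1), nodes, nodes)  # image attends to the graph
    return img_feat + out.squeeze(1)                    # residual integration

print(graph_enhance(torch.randn(4, d)).shape)           # torch.Size([4, 512])
```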
arXiv Detail & Related papers (2023-03-18T03:53:43Z)
- Representative Image Feature Extraction via Contrastive Learning Pretraining for Chest X-ray Report Generation [19.69560434388278]
The goal of medical report generation is to accurately capture and describe the image findings.
Previous works pretrain their visual encoders on large datasets from other domains.
We propose a framework that pretrains the visual encoder with a contrastive learning approach and requires no additional meta information.
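A compact NT-Xent loss illustrates contrastive pretraining from two augmented views of the same X-ray batch; this is the generic SimCLR-style formulation, not necessarily the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """z1, z2: (B, d) projected features of two views of the same images."""
    b = len(z1)
    z = F.normalize(torch.cat([z1, z2]), dim=-1)          # (2B, d)
    sim = z @ z.T / tau
    sim = sim.masked_fill(torch.eye(2 * b, dtype=torch.bool), float('-inf'))
    targets = torch.cat([torch.arange(b, 2 * b), torch.arange(b)])  # view pairs
    return F.cross_entropy(sim, targets)

print(nt_xent(torch.randn(8, 128), torch.randn(8, 128)))
```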
arXiv Detail & Related papers (2022-09-04T12:07:19Z)
- Cross-Modal Contrastive Learning for Abnormality Classification and Localization in Chest X-rays with Radiomics using a Feedback Loop [63.81818077092879]
We propose an end-to-end semi-supervised cross-modal contrastive learning framework for medical images.
We first apply an image encoder to classify the chest X-rays and to generate the image features.
The radiomic features are then passed through another dedicated encoder to act as the positive sample for the image features generated from the same chest X-ray.
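The positive-pair construction can be sketched as a cross-modal objective in which the radiomic embedding of an X-ray is the positive for its image embedding; the linear stand-in encoders and the 107-dimensional radiomic input below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

image_encoder = nn.Linear(1024, 256)    # stand-in for a CNN image encoder
radiomic_encoder = nn.Linear(107, 256)  # dedicated encoder for radiomic features

def cross_modal_loss(img_x: torch.Tensor, rad_x: torch.Tensor, tau: float = 0.1):
    """Same-row (image, radiomics) pairs from one X-ray act as positives."""
    zi = F.normalize(image_encoder(img_x), dim=-1)
    zr = F.normalize(radiomic_encoder(rad_x), dim=-1)
    logits = zi @ zr.T / tau
    return F.cross_entropy(logits, torch.arange(len(zi)))

print(cross_modal_loss(torch.randn(8, 1024), torch.randn(8, 107)))
```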
arXiv Detail & Related papers (2021-04-11T09:16:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.