Weakly supervised one-stage vision and language disease detection using
large scale pneumonia and pneumothorax studies
- URL: http://arxiv.org/abs/2007.15778v1
- Date: Fri, 31 Jul 2020 00:04:14 GMT
- Title: Weakly supervised one-stage vision and language disease detection using
large scale pneumonia and pneumothorax studies
- Authors: Leo K. Tam, Xiaosong Wang, Evrim Turkbey, Kevin Lu, Yuhong Wen, and
Daguang Xu
- Abstract summary: We present a new set of radiologist-paired bounding box and natural language annotations on the publicly available MIMIC-CXR dataset.
We also present a joint vision language weakly supervised transformer layer-selected one-stage dual head detection architecture (LITERATI).
The architectural modifications address three obstacles -- implementing a supervised vision and language detection method in a weakly supervised fashion, incorporating clinical referring expression natural language information, and generating high fidelity detections with map probabilities.
- Score: 9.34633748515622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting clinically relevant objects in medical images is a challenge
despite large datasets due to the lack of detailed labels. To address the label
issue, we utilize the scene-level labels with a detection architecture that
incorporates natural language information. We present a challenging new set of
radiologist-paired bounding box and natural language annotations on the
publicly available MIMIC-CXR dataset, focused especially on pneumonia and
pneumothorax. Along with the dataset, we present a joint vision language weakly
supervised transformer layer-selected one-stage dual head detection
architecture (LITERATI) alongside strong baseline comparisons with class
activation mapping (CAM), gradient CAM, and relevant implementations on the NIH
ChestXray-14 and MIMIC-CXR datasets. Borrowing from advances in vision language
architectures, the LITERATI method demonstrates joint image and referring
expression (objects localized in the image using natural language) input for
detection that scales in a purely weakly supervised fashion. The architectural
modifications address three obstacles -- implementing a supervised vision and
language detection method in a weakly supervised fashion, incorporating
clinical referring expression natural language information, and generating high
fidelity detections with map probabilities. Nevertheless, the challenging
clinical nature of the radiologist annotations, including subtle references,
multi-instance specifications, and relatively verbose underlying medical
reports, ensures that the vision language detection task at scale remains
stimulating for future investigation.
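The CAM and gradient CAM baselines mentioned in the abstract localize disease by reading evidence out of a trained classifier rather than training a detector. As a point of reference only, here is a minimal Grad-CAM sketch; the DenseNet-121 backbone, the target layer, and the box-from-threshold step are illustrative assumptions, not the paper's implementation:

```python
# Minimal Grad-CAM sketch for weakly supervised localization on a chest X-ray.
# Assumptions (not from the paper): a torchvision DenseNet-121 classifier and
# its final dense block as the target layer.
import torch
import torch.nn.functional as F
from torchvision.models import densenet121

model = densenet121(num_classes=2).eval()  # e.g. pneumonia vs. no finding
feats, grads = {}, {}

target_layer = model.features.denseblock4
target_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

def grad_cam(image: torch.Tensor, class_idx: int) -> torch.Tensor:
    """Return a heatmap in [0, 1] localizing evidence for class_idx."""
    logits = model(image)                    # image: (1, 3, H, W)
    model.zero_grad()
    logits[0, class_idx].backward()
    # Channel weights are the gradients, globally average-pooled over space.
    weights = grads["a"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                        align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```

Thresholding the resulting map (for instance at half its maximum) and taking the bounding box of the largest connected region yields the kind of weak detection these baselines provide, which the one-stage dual head of LITERATI is compared against.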
Related papers
- ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features [54.37042005469384]
We announce MVKL, the first multimodal mammography dataset encompassing multi-view images, detailed manifestations and reports.
Based on this dataset, we focus on the challenging task of unsupervised pretraining.
We propose ViKL, a framework that synergizes Visual, Knowledge, and Linguistic features.
arXiv Detail & Related papers (2024-09-24T05:01:23Z)
- SGSeg: Enabling Text-free Inference in Language-guided Segmentation of Chest X-rays via Self-guidance [10.075820470715374]
We propose a self-guided segmentation framework (SGSeg) that leverages language guidance for training (multi-modal) while enabling text-free inference (uni-modal).
We exploit the critical location information of both pulmonary and pathological structures depicted in the text reports and introduce a novel localization-enhanced report generation (LERG) module to generate clinical reports for self-guidance.
Our LERG integrates an object detector and a location-based attention aggregator, weakly-supervised by a location-aware pseudo-label extraction module.
arXiv Detail & Related papers (2024-09-07T08:16:00Z)
- CXR-Agent: Vision-language models for chest X-ray interpretation with uncertainty aware radiology reporting [0.0]
We evaluate the publicly available, state-of-the-art foundational vision-language models for chest X-ray interpretation.
We find that vision-language models often hallucinate with confident language, which slows down clinical interpretation.
We develop an agent-based vision-language approach for report generation using CheXagent's linear probes and BioViL-T's phrase grounding tools.
arXiv Detail & Related papers (2024-07-11T18:39:19Z)
- Self-supervised vision-langage alignment of deep learning representations for bone X-rays analysis [53.809054774037214]
This paper proposes leveraging vision-language pretraining on bone X-rays paired with French reports.
It is the first study to integrate French reports to shape the embedding space devoted to bone X-ray representations.
arXiv Detail & Related papers (2024-05-14T19:53:20Z)
- Exploring scalable medical image encoders beyond text supervision [42.86944965225041]
Language-supervised pre-training has proven to be a valuable method for extracting semantically meaningful features from images.
We introduce RAD-DINO, a biomedical image encoder pre-trained solely on unimodal biomedical imaging data.
arXiv Detail & Related papers (2024-01-19T17:02:17Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable-sized training datasets of paired chest X-rays and radiological reports; a minimal recall@K sketch follows this entry.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
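For context on the benchmark used in the entry above, a minimal text-to-image recall@K computation over paired embeddings looks like the following; the cosine similarity and the shared embedding space are standard assumptions, not details taken from that paper:

```python
# Minimal text-to-image retrieval recall@K, assuming N paired examples whose
# image and text embeddings live in a shared space (shape: N x D each).
import numpy as np

def recall_at_k(img_emb: np.ndarray, txt_emb: np.ndarray, k: int) -> float:
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sims = txt @ img.T                 # row i: query report i vs. all images
    ranks = np.argsort(-sims, axis=1)  # best-matching images first
    # A hit means the true image (index i) appears in the top K for report i.
    hits = (ranks[:, :k] == np.arange(len(txt))[:, None]).any(axis=1)
    return float(hits.mean())

# Example: 128 random pairs; recall@5 should be near 5/128 for random vectors.
rng = np.random.default_rng(0)
print(recall_at_k(rng.normal(size=(128, 64)), rng.normal(size=(128, 64)), k=5))
```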
- Cross-Modal Causal Intervention for Medical Report Generation [109.83549148448469]
Medical report generation (MRG) is essential for computer-aided diagnosis and medication guidance.
Due to the spurious correlations within image-text data induced by visual and linguistic biases, it is challenging to generate accurate reports reliably describing lesion areas.
We propose a novel Visual-Linguistic Causal Intervention (VLCI) framework for MRG, which consists of a visual deconfounding module (VDM) and a linguistic deconfounding module (LDM).
arXiv Detail & Related papers (2023-03-16T07:23:55Z)
- Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning [24.215619918283462]
We present a novel framework for learning medical visual representations directly from paired radiology reports.
Our framework harnesses the naturally exhibited semantic correspondences between medical image and radiology reports at three different levels.
arXiv Detail & Related papers (2022-10-12T09:31:39Z)
- Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
We combine two language models, Show-Attend-Tell and GPT-3, to generate comprehensive and descriptive radiology records.
The proposed model is tested on two medical datasets, Open-I and MIMIC-CXR, and on the general-purpose MS-COCO.
arXiv Detail & Related papers (2022-09-28T10:27:10Z)
- Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG).
CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure.
Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
arXiv Detail & Related papers (2022-06-04T13:16:30Z)
- Breaking with Fixed Set Pathology Recognition through Report-Guided Contrastive Training [23.506879497561712]
We employ a contrastive global-local dual-encoder architecture to learn concepts directly from unstructured medical reports; a generic contrastive-loss sketch follows this entry.
We evaluate our approach on the large-scale chest X-ray datasets MIMIC-CXR, CheXpert, and ChestX-Ray14 for disease classification.
arXiv Detail & Related papers (2022-05-14T21:44:05Z)
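The report-guided contrastive training in the last entry builds on the symmetric image-text objective popularized by CLIP-style models. A minimal sketch of that generic loss follows; the temperature value and the single global embedding per modality are assumptions here, and the paper's global-local dual-encoder detail is not reproduced:

```python
# Minimal symmetric InfoNCE loss for paired image/report embeddings, the
# generic objective behind CLIP-style contrastive training (illustrative only).
import torch
import torch.nn.functional as F

def clip_style_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    img = F.normalize(img_emb, dim=1)
    txt = F.normalize(txt_emb, dim=1)
    logits = img @ txt.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(len(img))       # matched pairs sit on the diagonal
    # Average the image-to-text and text-to-image cross-entropies.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Example with a batch of 8 random 256-d embeddings:
print(clip_style_loss(torch.randn(8, 256), torch.randn(8, 256)).item())
```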
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.