BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease
Diagnosis
- URL: http://arxiv.org/abs/2108.04938v1
- Date: Tue, 10 Aug 2021 21:51:25 GMT
- Authors: Masoud Monajatipoor, Mozhdeh Rouhsedaghat, Liunian Harold Li, Aichi
Chien, C.-C. Jay Kuo, Fabien Scalzo, and Kai-Wei Chang
- Abstract summary: Vision-and-language (V&L) models take image and text as input and learn to capture the associations between them.
BERTHop is a transformer-based model based on PixelHop++ and VisualBERT, for better capturing the associations between the two modalities.
- Score: 42.917164607812886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-and-language (V&L) models take image and text as input and learn to
capture the associations between them. Prior studies show that pre-trained V&L
models can significantly improve the model performance for downstream tasks
such as Visual Question Answering (VQA). However, V&L models are less effective
when applied in the medical domain (e.g., on X-ray images and clinical notes)
due to the domain gap. In this paper, we investigate the challenges of applying
pre-trained V&L models in medical applications. In particular, we identify that
the visual representation in general V&L models is not suitable for processing
medical data. To overcome this limitation, we propose BERTHop, a
transformer-based model based on PixelHop++ and VisualBERT, for better
capturing the associations between the two modalities. Experiments on the OpenI
dataset, a commonly used thoracic disease diagnosis benchmark, show that
BERTHop achieves an average Area Under the Curve (AUC) of 98.12%, which is 1.62%
higher than the state of the art (SOTA), while being trained on a dataset 9 times
smaller.
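The abstract reports a single "average AUC" for a multi-label thoracic disease benchmark. The abstract does not say how that average is formed; the sketch below assumes the common reading, an unweighted mean of per-disease AUCs. The function names (`binary_auc`, `mean_auc`) and the toy labels and scores are hypothetical illustrations, not values or code from the paper.

```python
from typing import List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC for one disease label, computed as the probability that a random
    positive example is scored above a random negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(label_matrix: List[List[int]], score_matrix: List[List[float]]) -> float:
    """Unweighted mean of per-label AUCs (assumed averaging scheme).
    Row i is one study; column k is one disease label."""
    num_labels = len(label_matrix[0])
    per_label = []
    for k in range(num_labels):
        col_labels = [row[k] for row in label_matrix]
        col_scores = [row[k] for row in score_matrix]
        per_label.append(binary_auc(col_labels, col_scores))
    return sum(per_label) / len(per_label)


# Toy data (hypothetical): 4 studies, 2 disease labels.
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_scores = [[0.9, 0.2], [0.1, 0.8], [0.4, 0.3], [0.6, 0.1]]

print(mean_auc(toy_labels, toy_scores))  # 0.875 (per-label AUCs: 0.75 and 1.0)
```

The pairwise comparison is quadratic in the number of examples, which is fine for illustration; a rank-based implementation would be preferable on a full dataset.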
Related papers
- LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model [55.80651780294357]
State-of-the-art medical multi-modal large language models (med-MLLM) leverage instruction-following data in pre-training.
LoGra-Med is a new multi-graph alignment algorithm that enforces triplet correlations across image modalities, conversation-based descriptions, and extended captions.
Our results show LoGra-Med matches LLaVA-Med performance on 600K image-text pairs for Medical VQA and significantly outperforms it when trained on 10% of the data.
(2024-10-03T15:52:03Z)
- MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis [1.2903829793534272]
Chest X-ray images are commonly used for predicting acute and chronic cardiopulmonary conditions.
Efforts to integrate them with structured clinical data face challenges due to incomplete electronic health records.
This paper introduces MedPromptX, the first model to integrate multimodal large language models (MLLMs), few-shot prompting (FP) and visual grounding (VG).
Results demonstrate the SOTA performance of MedPromptX, achieving an 11% improvement in F1-score compared to the baselines.
(2024-03-22T19:19:51Z)
- Application Of Vision-Language Models For Assessing Osteoarthritis Disease Severity [0.43431539537721414]
Osteoarthritis (OA) poses a global health challenge, demanding precise diagnostic methods.
Existing deep learning models for OA assessment are unimodal, single-task systems.
This study investigates employing Vision Language Processing models to predict OA severity using X-ray images and corresponding reports.
(2024-01-12T02:43:58Z)
- CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training [6.292642131180376]
In this paper, we tackle the lack of image-text data in chest X-ray by expanding image-label pairs into image-text pairs via a general prompt.
We also design two contrastive losses, named ICL and TCL, for learning study-level characteristics of medical images and reports.
Our model outperforms the state-of-the-art models trained under the same conditions.
(2023-10-20T05:44:55Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
(2023-06-20T22:21:34Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable-sized training datasets of paired chest X-rays and radiological reports.
(2023-03-30T18:20:00Z)
- Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
(2021-02-26T02:29:30Z)
- Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
(2020-07-15T09:22:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.