CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training
- URL: http://arxiv.org/abs/2310.13292v1
- Date: Fri, 20 Oct 2023 05:44:55 GMT
- Title: CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training
- Authors: Kihyun You, Jawook Gu, Jiyeon Ham, Beomhee Park, Jiho Kim, Eun Kyoung
Hong, Woonhyunk Baek, Byungseok Roh
- Abstract summary: In this paper, we tackle the lack of image-text data in chest X-ray by expanding image-label pairs into image-text pairs via general prompts.
We also design two contrastive losses, named ICL and TCL, for learning study-level characteristics of medical images and reports.
Our model outperforms the state-of-the-art models trained under the same conditions.
- Score: 6.292642131180376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A large-scale image-text pair dataset has greatly contributed to the
development of vision-language pre-training (VLP) models, which enable
zero-shot or few-shot classification without costly annotation. However, in the
medical domain, the scarcity of data remains a significant challenge for
developing a powerful VLP model. In this paper, we tackle the lack of
image-text data in chest X-ray by expanding image-label pairs into image-text
pairs via general prompts and utilizing multiple images and multiple sections in a
radiologic report. We also design two contrastive losses, named ICL and TCL,
for learning study-level characteristics of medical images and reports,
respectively. Our model outperforms the state-of-the-art models trained under
the same conditions. Also, the enlarged dataset improves the discriminative
power of our pre-trained model for classification, at the cost of a marginal
drop in retrieval performance. Code is available at https://github.com/kakaobrain/cxr-clip.
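The abstract names two contrastive losses, ICL and TCL, but does not define them here. As a rough orientation only, CLIP-style objectives are typically variants of the symmetric InfoNCE loss; the sketch below is a generic NumPy illustration of that loss, not the paper's exact ICL/TCL formulation:

```python
import numpy as np

def info_nce_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    img_emb, txt_emb: (N, D) arrays; row i of each is a matched pair.
    """
    # L2-normalize so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature   # (N, N) similarity matrix
    labels = np.arange(len(img))         # positives lie on the diagonal

    def cross_entropy(lg, lb):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(lb)), lb].mean()

    # average the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

With perfectly matched, well-separated embeddings the loss approaches zero; mismatched pairs drive it up, which is what pushes paired images and texts together during pre-training.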
Related papers
- XLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training [29.02600107837688]
Vision-and-language pretraining uses contrastive learning on image-text pairs to achieve effective transfer across tasks.
Current models struggle to accurately reconstruct key pathological features due to the scarcity of medical data.
This paper proposes an XLIP (masked modelling for medical Language-Image Pre-training) framework to enhance pathological learning and feature learning via unpaired data.
arXiv Detail & Related papers (2024-07-28T17:38:21Z)
- Self-supervised vision-language alignment of deep learning representations for bone X-rays analysis [53.809054774037214]
This paper proposes leveraging vision-language pretraining on bone X-rays paired with French reports.
It is the first study to integrate French reports to shape the embedding space devoted to bone X-Rays representations.
arXiv Detail & Related papers (2023-04-18T15:39:58Z)
- Performance of GAN-based augmentation for deep learning COVID-19 image classification [57.1795052451257]
The biggest challenge in the application of deep learning to the medical domain is the availability of training data.
Data augmentation is a typical methodology used in machine learning when confronted with a limited data set.
In this work, a StyleGAN2-ADA model of Generative Adversarial Networks is trained on the limited COVID-19 chest X-ray image set.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- Self-Supervised Curricular Deep Learning for Chest X-Ray Image Classification [1.6631602844999727]
Self-supervised learning (SSL) pretraining outperforms models trained from scratch or pretrained on ImageNet.
Top-performing SSL-pretrained models show a higher degree of attention in the lung regions.
arXiv Detail & Related papers (2023-01-25T16:45:13Z)
- Generative Negative Text Replay for Continual Vision-Language Pretraining [95.2784858069843]
Vision-language pre-training has attracted increasing attention recently.
Massive data are usually collected in a streaming fashion.
We propose a multi-modal knowledge distillation between images and texts to align the instance-wise prediction between old and new models.
arXiv Detail & Related papers (2022-10-31T13:42:21Z)
- RadTex: Learning Efficient Radiograph Representations from Text Reports [7.090896766922791]
We build a data-efficient learning framework that utilizes radiology reports to improve medical image classification performance with limited labeled data.
Our model achieves higher classification performance than ImageNet-supervised pretraining when labeled training data is limited.
arXiv Detail & Related papers (2022-08-05T15:06:26Z)
- Self-supervised Image-text Pre-training With Mixed Data In Chest X-rays [10.398175542736285]
We introduce an image-text pre-training framework that can learn from mixed data inputs.
We demonstrate the feasibility of pre-training across mixed data inputs.
We also illustrate the benefits of adopting such pre-trained models in 3 chest X-ray applications.
arXiv Detail & Related papers (2021-03-30T01:48:46Z)
- Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
- Self-Training with Improved Regularization for Sample-Efficient Chest X-Ray Classification [80.00316465793702]
We present a deep learning framework that enables robust modeling in challenging scenarios.
Our results show that using 85% less labeled data, we can build predictive models that match the performance of classifiers trained in a large-scale data setting.
arXiv Detail & Related papers (2020-05-03T02:36:00Z)
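Several entries above use text-to-image retrieval as an evaluation, and the CXR-CLIP abstract reports a marginal retrieval trade-off from the enlarged dataset. As a minimal sketch of how such retrieval is commonly scored with Recall@K (a hypothetical helper, not any listed paper's exact protocol):

```python
import numpy as np

def recall_at_k(txt_emb, img_emb, k=5):
    """Recall@K for text-to-image retrieval.

    txt_emb, img_emb: (N, D) arrays; the true match of text query i is image i.
    Returns the fraction of queries whose true image appears in the top-k.
    """
    # L2-normalize so the dot product is cosine similarity
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    sims = txt @ img.T                              # (N_text, N_image)
    # indices of the k most similar images per text query (descending)
    topk = np.argsort(-sims, axis=1)[:, :k]
    hits = (topk == np.arange(len(txt))[:, None]).any(axis=1)
    return hits.mean()
```

Higher K makes the metric more forgiving; papers typically report Recall@1/5/10 over a held-out set of report-image pairs.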
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.