Pre-training on High Definition X-ray Images: An Experimental Study
- URL: http://arxiv.org/abs/2404.17926v1
- Date: Sat, 27 Apr 2024 14:29:53 GMT
- Title: Pre-training on High Definition X-ray Images: An Experimental Study
- Authors: Xiao Wang, Yuehang Li, Wentao Wu, Jiandong Jin, Yao Rong, Bo Jiang, Chuanfu Li, Jin Tang
- Abstract summary: We propose the first high-definition (1280 $\times$ 1280) X-ray based pre-trained foundation vision model on a large-scale dataset.
Our model follows the masked auto-encoder framework, taking as input the tokens that remain after masking at a high rate.
We validate the effectiveness of our model on two downstream tasks, including X-ray report generation and disease recognition.
- Score: 19.46094537296955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing X-ray based pre-trained vision models are usually trained on relatively small-scale datasets (fewer than 500k samples) with limited resolution (e.g., 224 $\times$ 224). However, the key to the success of self-supervised pre-training of large models lies in massive training data, and maintaining high resolution is essential for effective diagnosis of subtle and difficult diseases in X-ray images. In this paper, we address these issues by proposing the first high-definition (1280 $\times$ 1280) X-ray based pre-trained foundation vision model, built on our newly collected large-scale dataset of more than 1 million X-ray images. Our model follows the masked auto-encoder framework: the tokens that remain after masking at a high rate are used as input, and the masked image patches are reconstructed by a Transformer encoder-decoder network. More importantly, we introduce a novel context-aware masking strategy that uses the chest contour as a boundary for adaptive masking operations. We validate the effectiveness of our model on two downstream tasks, X-ray report generation and disease recognition. Extensive experiments demonstrate that our pre-trained medical foundation vision model achieves comparable or even new state-of-the-art performance on downstream benchmark datasets. The source code and pre-trained models will be released at https://github.com/Event-AHU/Medical_Image_Analysis.
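The abstract describes the masking pipeline only at a high level; the following is a minimal PyTorch sketch of how a chest-contour-aware masking step could be realized on top of MAE-style random masking. The function name, the per-patch contour map, and the inside/outside masking rates are illustrative assumptions, not the authors' released implementation.

```python
import torch

def context_aware_mask(inside_contour: torch.Tensor,
                       rate_inside: float = 0.85,
                       rate_outside: float = 0.95) -> torch.Tensor:
    """Return a boolean mask over patches (True = masked).

    inside_contour marks patches that fall inside the chest contour.
    Diagnostic content concentrates inside the contour, so this sketch
    masks background patches more aggressively; the paper's actual
    ratios are not given in the abstract.
    """
    mask = torch.zeros(inside_contour.numel(), dtype=torch.bool)
    for region, rate in ((inside_contour, rate_inside),
                         (~inside_contour, rate_outside)):
        idx = region.nonzero(as_tuple=True)[0]
        n_mask = int(rate * idx.numel())
        chosen = idx[torch.randperm(idx.numel())[:n_mask]]
        mask[chosen] = True
    return mask

# A 1280x1280 image with 16x16 patches yields 80 * 80 = 6400 patch tokens.
inside = torch.rand(6400) < 0.5               # placeholder contour map
mask = context_aware_mask(inside)
visible = (~mask).nonzero(as_tuple=True)[0]   # tokens fed to the MAE encoder
```

Under this sketch, only the visible tokens enter the encoder, and the decoder reconstructs the masked patches, as in the standard masked auto-encoder setup.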
Related papers
- CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset [14.911363203907008]
X-ray image-based medical report generation can significantly reduce diagnostic burdens and patient wait times.
We conduct a comprehensive benchmarking of existing mainstream X-ray report generation models and large language models (LLMs) on the CheXpert Plus dataset.
We propose a large model for X-ray report generation using a multi-stage pre-training strategy, including self-supervised autoregressive generation and X-ray-report contrastive learning.
arXiv Detail & Related papers (2024-10-01T04:07:01Z)
- Text-guided Foundation Model Adaptation for Long-Tailed Medical Image Classification [4.6651139122498]
In medical contexts, the imbalanced data distribution in long-tailed datasets, due to scarce labels for rare diseases, greatly impairs the diagnostic accuracy of deep learning models.
Recent multimodal text-image supervised foundation models offer new solutions to data scarcity through effective representation learning.
We propose a novel method, Text-guided Foundation model Adaptation for Long-Tailed medical image classification (TFA-LT).
Our method achieves an accuracy improvement of up to 27.1%, highlighting the substantial potential of foundation model adaptation in this area.
arXiv Detail & Related papers (2024-08-27T04:18:18Z)
- A Comparative Study of CNN, ResNet, and Vision Transformers for Multi-Classification of Chest Diseases [0.0]
Vision Transformers (ViT) are powerful tools due to their scalability and ability to process large amounts of data.
We fine-tuned two variants of ViT models, one pre-trained on ImageNet and another trained from scratch, using the NIH Chest X-ray dataset.
Our study evaluates the performance of these models in the multi-label classification of 14 distinct diseases.
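As a hedged sketch of this multi-label setup (the exact training recipe is not given in the summary), 14 disease labels per image are usually modeled with one sigmoid/binary cross-entropy term per class; the timm model name below is an assumption:

```python
import timm
import torch
import torch.nn as nn

# ViT backbone pre-trained on ImageNet with a 14-way multi-label head.
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=14)
criterion = nn.BCEWithLogitsLoss()  # one sigmoid/BCE term per disease label

images = torch.randn(8, 3, 224, 224)            # dummy batch of chest X-rays
targets = torch.randint(0, 2, (8, 14)).float()  # multi-hot disease labels
loss = criterion(model(images), targets)
loss.backward()
```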
arXiv Detail & Related papers (2024-05-31T23:56:42Z)
- CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training [6.292642131180376]
In this paper, we tackle the lack of image-text data in chest X-ray by expanding image-label pairs into image-text pairs via general prompts.
We also design two contrastive losses, named ICL and TCL, for learning study-level characteristics of medical images and reports.
Our model outperforms the state-of-the-art models trained under the same conditions.
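The summary names ICL and TCL without giving formulas; as a point of reference only, a generic CLIP-style symmetric image-text contrastive loss, together with a prompt that expands a class label into text, might look like the following (the prompt wording and the loss form are assumptions, not the paper's exact objectives):

```python
import torch
import torch.nn.functional as F

def label_to_text(label: str) -> str:
    # Generic prompt turning an image label into a text description
    # (the template wording is an assumption).
    return f"a chest X-ray showing {label}"

def symmetric_contrastive(img_emb: torch.Tensor,
                          txt_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """CLIP-style InfoNCE over a batch of paired image/text embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature  # (B, B) similarities
    labels = torch.arange(img_emb.size(0))        # true pairs on the diagonal
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2
```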
arXiv Detail & Related papers (2023-10-20T05:44:55Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Performance of GAN-based augmentation for deep learning COVID-19 image classification [57.1795052451257]
The biggest challenge in the application of deep learning to the medical domain is the availability of training data.
Data augmentation is a typical methodology used in machine learning when confronted with a limited data set.
In this work, a StyleGAN2-ADA generative adversarial network is trained on the limited COVID-19 chest X-ray image set.
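A minimal sketch of the downstream augmentation step, assuming the StyleGAN2-ADA samples have already been generated and labeled (the tensor shapes and counts below are placeholders):

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Placeholders standing in for a small real COVID-19 X-ray set and a
# larger pool of StyleGAN2-ADA generated samples of the same class.
real = TensorDataset(torch.randn(500, 1, 256, 256), torch.ones(500).long())
synthetic = TensorDataset(torch.randn(2000, 1, 256, 256), torch.ones(2000).long())

# The classifier then trains on the union of real and generated images.
loader = DataLoader(ConcatDataset([real, synthetic]), batch_size=32, shuffle=True)
```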
arXiv Detail & Related papers (2023-04-18T15:39:58Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variably sized training datasets of paired chest X-rays and radiological reports.
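A minimal sketch of such a text-to-image retrieval evaluation, assuming paired report/image embeddings at matching indices (recall@k as the metric is an assumption; the paper may report other retrieval measures):

```python
import torch
import torch.nn.functional as F

def recall_at_k(txt_emb: torch.Tensor, img_emb: torch.Tensor, k: int = 5) -> float:
    """Fraction of report embeddings whose paired X-ray embedding
    (same index) appears among the top-k most similar images."""
    sims = F.normalize(txt_emb, dim=-1) @ F.normalize(img_emb, dim=-1).t()
    topk = sims.topk(k, dim=-1).indices                    # (N, k) ranked images
    targets = torch.arange(txt_emb.size(0)).unsqueeze(-1)  # paired index per query
    return (topk == targets).any(dim=-1).float().mean().item()
```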
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- Generative Residual Attention Network for Disease Detection [51.60842580044539]
We present a novel approach for disease generation in X-rays using conditional generative adversarial learning.
We generate a corresponding radiology image in a target domain while preserving the identity of the patient.
We then use the generated X-ray image in the target domain to augment our training to improve the detection performance.
arXiv Detail & Related papers (2021-10-25T14:15:57Z)
- Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
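The summary gives no formulas for KNNS; a generic form of K-nearest-neighbor smoothing of per-disease probabilities, with the blend weight alpha as an assumed hyper-parameter, could look like this:

```python
import torch

def knn_smooth(features: torch.Tensor, probs: torch.Tensor,
               k: int = 5, alpha: float = 0.5) -> torch.Tensor:
    """Blend each sample's predicted probabilities with the mean
    prediction of its k nearest neighbors in feature space."""
    dists = torch.cdist(features, features)     # (N, N) pairwise distances
    dists.fill_diagonal_(float("inf"))          # exclude the sample itself
    idx = dists.topk(k, largest=False).indices  # (N, k) neighbor indices
    neighbor_mean = probs[idx].mean(dim=1)      # (N, C) averaged predictions
    return alpha * probs + (1 - alpha) * neighbor_mean
```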
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
- Learning Invariant Feature Representation to Improve Generalization across Chest X-ray Datasets [55.06983249986729]
We show that a deep learning model that performs well when tested on data from the same source as its training data starts to perform poorly when tested on a dataset from a different source.
By employing an adversarial training strategy, we show that a network can be forced to learn a source-invariant representation.
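One standard way to realize such an adversarial strategy is a gradient-reversal layer in the style of domain-adversarial training; whether this paper uses exactly that mechanism is an assumption, so the sketch below is illustrative only:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in the
    backward pass so the encoder learns to confuse the source classifier."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

encoder = nn.Linear(512, 128)    # stand-in feature extractor
source_head = nn.Linear(128, 2)  # predicts which dataset an image came from

feats = encoder(torch.randn(8, 512))
source_logits = source_head(GradReverse.apply(feats, 1.0))
adv_loss = nn.CrossEntropyLoss()(source_logits, torch.randint(0, 2, (8,)))
adv_loss.backward()              # encoder gradients are sign-flipped here
```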
arXiv Detail & Related papers (2020-08-04T07:41:15Z)