ReXGradient-160K: A Large-Scale Publicly Available Dataset of Chest Radiographs with Free-text Reports
- URL: http://arxiv.org/abs/2505.00228v1
- Date: Thu, 01 May 2025 00:29:50 GMT
- Title: ReXGradient-160K: A Large-Scale Publicly Available Dataset of Chest Radiographs with Free-text Reports
- Authors: Xiaoman Zhang, Julián N. Acosta, Josh Miller, Ouwen Huang, Pranav Rajpurkar
- Abstract summary: This dataset contains 160,000 chest X-ray studies with paired radiological reports from 109,487 unique patients across 3 U.S. health systems. By providing this extensive dataset, we aim to accelerate research in medical imaging AI and advance the state-of-the-art in automated radiological analysis.
- Score: 4.247428746963443
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present ReXGradient-160K, representing the largest publicly available chest X-ray dataset to date in terms of the number of patients. This dataset contains 160,000 chest X-ray studies with paired radiological reports from 109,487 unique patients across 3 U.S. health systems (79 medical sites). This comprehensive dataset includes multiple images per study and detailed radiology reports, making it particularly valuable for the development and evaluation of AI systems for medical imaging and automated report generation models. The dataset is divided into training (140,000 studies), validation (10,000 studies), and public test (10,000 studies) sets, with an additional private test set (10,000 studies) reserved for model evaluation on the ReXrank benchmark. By providing this extensive dataset, we aim to accelerate research in medical imaging AI and advance the state-of-the-art in automated radiological analysis. Our dataset will be open-sourced at https://huggingface.co/datasets/rajpurkarlab/ReXGradient-160K.
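The split sizes quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and it assumes the private test set is additional to the 160,000 public studies, a reading suggested by the three public splits summing to exactly 160,000.

```python
# Figures quoted in the abstract; variable names are illustrative only.
public_splits = {"train": 140_000, "validation": 10_000, "public_test": 10_000}
private_test = 10_000      # reserved for ReXrank; assumed to sit outside the 160,000
unique_patients = 109_487  # patients behind the 160,000 public studies

# The three public splits should account for the whole released dataset.
public_total = sum(public_splits.values())

# Average number of studies per patient in the public release.
studies_per_patient = public_total / unique_patients

print(public_total)                    # 160000
print(round(studies_per_patient, 2))   # 1.46
```

On this reading, each patient contributes about 1.46 studies on average, so most patients appear with a single study.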
Related papers
- ReXrank: A Public Leaderboard for AI-Powered Radiology Report Generation [16.687723916901728]
We present ReXrank, a leaderboard and challenge for assessing AI-powered radiology report generation.
Our framework incorporates ReXGradient, the largest test dataset consisting of 10,000 studies.
By providing this standardized evaluation framework, ReXrank enables meaningful comparisons of model performance.
arXiv Detail & Related papers (2024-11-22T18:40:02Z)
- CheXpert Plus: Augmenting a Large Chest X-ray Dataset with Text Radiology Reports, Patient Demographics and Additional Image Formats [18.498344743909254]
CheXpert Plus is the largest text dataset publicly released in radiology.
It represents the largest text de-identification effort in radiology.
All reports are paired with high-quality images in DICOM format.
arXiv Detail & Related papers (2024-05-29T21:48:56Z)
- Revisiting Computer-Aided Tuberculosis Diagnosis [56.80999479735375]
Tuberculosis (TB) is a major global health threat, causing millions of deaths annually.
Computer-aided tuberculosis diagnosis (CTD) using deep learning has shown promise, but progress is hindered by limited training data.
We establish a large-scale dataset, namely the Tuberculosis X-ray (TBX11K) dataset, which contains 11,200 chest X-ray (CXR) images with corresponding bounding box annotations for TB areas.
This dataset enables the training of sophisticated detectors for high-quality CTD.
arXiv Detail & Related papers (2023-07-06T08:27:48Z)
- Computer-aided Tuberculosis Diagnosis with Attribute Reasoning Assistance [58.01014026139231]
We propose a new large-scale tuberculosis (TB) chest X-ray dataset (TBX-Att).
We establish an attribute-assisted weakly-supervised framework to classify and localize TB by leveraging the attribute information.
The proposed model is evaluated on the TBX-Att dataset and will serve as a solid baseline for future research.
arXiv Detail & Related papers (2022-07-01T07:50:35Z)
- PediCXR: An open, large-scale chest radiograph dataset for interpretation of common thoracic diseases in children [0.31317409221921133]
We release PediCXR, a new pediatric CXR dataset of 9,125 studies retrospectively collected from a major pediatric hospital in Vietnam between 2020 and 2021.
The dataset was labeled for the presence of 36 critical findings and 15 diseases.
arXiv Detail & Related papers (2022-03-20T18:03:11Z)
- Generative Residual Attention Network for Disease Detection [51.60842580044539]
We present a novel approach for disease generation in X-rays using conditional generative adversarial learning.
We generate a corresponding radiology image in a target domain while preserving the identity of the patient.
We then use the generated X-ray image in the target domain to augment our training to improve the detection performance.
arXiv Detail & Related papers (2021-10-25T14:15:57Z)
- The pitfalls of using open data to develop deep learning solutions for COVID-19 detection in chest X-rays [64.02097860085202]
Deep learning models have been developed to identify COVID-19 from chest X-rays.
Results have been exceptional when training and testing on open-source data.
Data analysis and model evaluations show that the popular open-source dataset COVIDx is not representative of the real clinical problem.
arXiv Detail & Related papers (2021-09-14T10:59:11Z)
- Is Medical Chest X-ray Data Anonymous? [8.29994774042507]
We show that a well-trained deep learning system is able to recover the patient identity from chest X-ray data.
We demonstrate this using the publicly available large-scale ChestX-ray14 dataset.
arXiv Detail & Related papers (2021-03-15T17:26:43Z)
- Creation and Validation of a Chest X-Ray Dataset with Eye-tracking and Report Dictation for AI Development [47.1152650685625]
We developed a rich dataset of Chest X-Ray (CXR) images to assist investigators in artificial intelligence.
The data were collected using an eye tracking system while a radiologist reviewed and reported on 1,083 CXR images.
arXiv Detail & Related papers (2020-09-15T23:12:49Z)
- Learning Invariant Feature Representation to Improve Generalization across Chest X-ray Datasets [55.06983249986729]
We show that a deep learning model that performs well when tested on the same dataset it was trained on starts to perform poorly when tested on a dataset from a different source.
By employing an adversarial training strategy, we show that a network can be forced to learn a source-invariant representation.
arXiv Detail & Related papers (2020-08-04T07:41:15Z)
- Automated Radiological Report Generation For Chest X-Rays With Weakly-Supervised End-to-End Deep Learning [17.315387269810426]
We built a database containing more than 12,000 CXR scans and radiological reports.
We developed a model based on a deep convolutional neural network and a recurrent network with an attention mechanism.
The model provides automated recognition of given scans and generation of reports.
arXiv Detail & Related papers (2020-06-18T08:12:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.