Deep Mining External Imperfect Data for Chest X-ray Disease Screening
- URL: http://arxiv.org/abs/2006.03796v1
- Date: Sat, 6 Jun 2020 06:48:40 GMT
- Title: Deep Mining External Imperfect Data for Chest X-ray Disease Screening
- Authors: Luyang Luo, Lequan Yu, Hao Chen, Quande Liu, Xi Wang, Jiaqi Xu, and
Pheng-Ann Heng
- Abstract summary: We argue that incorporating an external CXR dataset leads to imperfect training data, which raises new challenges.
We formulate the multi-label disease classification problem as weighted independent binary tasks according to the categories.
Our framework simultaneously models and tackles the domain and label discrepancies, enabling superior knowledge mining ability.
- Score: 57.40329813850719
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning approaches have demonstrated remarkable progress in
automatic Chest X-ray analysis. The data-driven nature of deep models requires
training data that covers a large distribution. Therefore, it is essential to
integrate knowledge from multiple datasets, especially for medical images.
However, learning a disease classification model with extra Chest X-ray (CXR)
data remains challenging. Recent research has demonstrated that a performance
bottleneck exists in joint training on different CXR datasets, yet few efforts
have been made to address this obstacle. In this paper, we argue that
incorporating an external CXR dataset leads to imperfect training data, which
raises new challenges. Specifically, the imperfection is twofold: domain
discrepancy, as image appearances vary across datasets; and label discrepancy,
as different datasets are only partially labeled. To this end, we formulate
the multi-label thoracic disease classification problem as weighted
independent binary tasks according to the categories. For common categories
shared across domains, we adopt task-specific adversarial training to
alleviate the feature differences. For categories existing in a single
dataset, we present uncertainty-aware temporal ensembling of model predictions
to further mine information from the missing labels. In this way, our
framework simultaneously models and tackles the domain and label
discrepancies, enabling superior knowledge mining ability. We conduct
extensive experiments on three datasets with more than 360,000 Chest X-ray
images. Our method outperforms other competing models and sets
state-of-the-art performance on the official NIH test set with 0.8349 AUC,
demonstrating the effectiveness of utilizing external datasets to improve
internal classification.
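The two ingredients of this formulation can be made concrete with a minimal
numpy sketch. This is a hypothetical illustration, not the authors' code: the
function names `masked_weighted_bce` and `ema_update` are assumptions. The
first treats each disease category as an independent binary task, weighted per
category, with a mask that zeroes out labels missing from a dataset; the
second is the exponential-moving-average update underlying temporal ensembling
of predictions, whose output can serve as a soft pseudo-target for unlabeled
categories.

```python
import numpy as np

def masked_weighted_bce(probs, labels, mask, weights):
    """Multi-label classification as weighted independent binary tasks.

    probs, labels, mask, weights: arrays of shape (batch, num_classes).
    mask is 1 where a label exists in the source dataset and 0 where it
    is missing, so missing labels contribute nothing to the loss.
    """
    eps = 1e-7
    p = np.clip(probs, eps, 1.0 - eps)
    bce = -(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))
    return float((weights * mask * bce).sum() / max(mask.sum(), 1.0))

def ema_update(ensemble, current, alpha=0.6):
    """Temporal ensembling: exponential moving average of per-epoch
    predictions; the averaged prediction can act as a soft target for
    categories without ground-truth labels."""
    return alpha * ensemble + (1.0 - alpha) * current
```

Note the design choice: because each category is an independent binary task,
masking a single category removes only that task's gradient while the shared
backbone still trains on the remaining ones.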
Related papers
- Semantically Redundant Training Data Removal and Deep Model
Classification Performance: A Study with Chest X-rays [5.454938535500864]
We propose an entropy-based sample scoring approach to identify and remove semantically redundant training data.
We demonstrate using the publicly available NIH chest X-ray dataset that the model trained on the resulting informative subset of training data significantly outperforms the model trained on the full training set.
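The scoring idea can be sketched roughly as follows. This is a hypothetical
illustration of entropy-based sample scoring in general, not that paper's
implementation; `multilabel_entropy` and `keep_informative` are invented
names. Samples whose predicted probabilities are near 0 or 1 have low entropy
and are treated as redundant; the lowest-scoring fraction is dropped.

```python
import math

def multilabel_entropy(probs):
    """Sum of per-label binary entropies for one sample's predicted
    probabilities; confident (near-0/1) predictions give low entropy."""
    eps = 1e-12
    return -sum(p * math.log(p + eps) + (1.0 - p) * math.log(1.0 - p + eps)
                for p in probs)

def keep_informative(samples, scores, drop_frac=0.2):
    """Drop the lowest-entropy fraction of samples as redundant,
    preserving the original order of the survivors."""
    k = int(len(samples) * drop_frac)
    order = sorted(range(len(samples)), key=lambda i: scores[i])
    kept = set(order[k:])
    return [s for i, s in enumerate(samples) if i in kept]
```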
arXiv Detail & Related papers (2023-09-18T13:56:34Z)
- How Can We Tame the Long-Tail of Chest X-ray Datasets? [0.0]
Chest X-rays (CXRs) are a medical imaging modality that is used to infer a large number of abnormalities.
A few of them are quite commonly observed and are abundantly represented in CXR datasets.
It is challenging for current models to learn independent discriminatory features for labels that are rare but may be of high significance.
arXiv Detail & Related papers (2023-09-08T12:28:40Z)
- When More is Less: Incorporating Additional Datasets Can Hurt Performance By Introducing Spurious Correlations [16.782625445546273]
We demonstrate that in 43% of settings, a model trained on data from two hospitals has poorer worst-group accuracy over both hospitals than a model trained on just a single hospital's data.
We explain that this phenomenon arises from the spurious correlation that emerges between the disease and hospital, due to hospital-specific image artifacts.
arXiv Detail & Related papers (2023-08-08T17:58:45Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- Learn to Ignore: Domain Adaptation for Multi-Site MRI Analysis [1.3079444139643956]
We present a novel method that learns to ignore the scanner-related features present in the images, while learning features relevant for the classification task.
Our method outperforms state-of-the-art domain adaptation methods on a classification task between Multiple Sclerosis patients and healthy subjects.
arXiv Detail & Related papers (2021-10-13T15:40:50Z)
- The pitfalls of using open data to develop deep learning solutions for COVID-19 detection in chest X-rays [64.02097860085202]
Deep learning models have been developed to identify COVID-19 from chest X-rays.
Results have been exceptional when training and testing on open-source data.
Data analysis and model evaluations show that the popular open-source dataset COVIDx is not representative of the real clinical problem.
arXiv Detail & Related papers (2021-09-14T10:59:11Z)
- Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
- Fader Networks for domain adaptation on fMRI: ABIDE-II study [68.5481471934606]
We use 3D convolutional autoencoders to build the domain irrelevant latent space image representation and demonstrate this method to outperform existing approaches on ABIDE data.
arXiv Detail & Related papers (2020-10-14T16:50:50Z)
- Learning Invariant Feature Representation to Improve Generalization across Chest X-ray Datasets [55.06983249986729]
We show that a deep learning model that performs well when tested on the same dataset as its training data starts to perform poorly when tested on a dataset from a different source.
By employing an adversarial training strategy, we show that a network can be forced to learn a source-invariant representation.
arXiv Detail & Related papers (2020-08-04T07:41:15Z)
- Ensemble Deep Learning on Large, Mixed-Site fMRI Datasets in Autism and Other Tasks [0.1160208922584163]
We train a convolutional neural network (CNN) with the largest multi-source, functional MRI (fMRI) connectomic dataset ever compiled.
Our study finds that deep learning models that distinguish ASD from TD controls focus broadly on temporal and cerebellar connections.
arXiv Detail & Related papers (2020-02-14T17:28:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.