Related papers: How Can We Tame the Long-Tail of Chest X-ray Datasets?

How Can We Tame the Long-Tail of Chest X-ray Datasets?

URL: http://arxiv.org/abs/2309.04293v1
Date: Fri, 8 Sep 2023 12:28:40 GMT
Title: How Can We Tame the Long-Tail of Chest X-ray Datasets?
Authors: Arsh Verma
Abstract summary: Chest X-rays (CXRs) are a medical imaging modality that is used to infer a large number of abnormalities. Few of them are quite commonly observed and are abundantly represented in CXR datasets. It is challenging for current models to learn independent discriminatory features for labels that are rare but may be of high significance.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Chest X-rays (CXRs) are a medical imaging modality that is used to infer a large number of abnormalities. While it is hard to define an exhaustive list of these abnormalities, which may co-occur on a chest X-ray, few of them are quite commonly observed and are abundantly represented in CXR datasets used to train deep learning models for automated inference. However, it is challenging for current models to learn independent discriminatory features for labels that are rare but may be of high significance. Prior works focus on the combination of multi-label and long tail problems by introducing novel loss functions or some mechanism of re-sampling or re-weighting the data. Instead, we propose that it is possible to achieve significant performance gains merely by choosing an initialization for a model that is closer to the domain of the target dataset. This method can complement the techniques proposed in existing literature, and can easily be scaled to new labels. Finally, we also examine the veracity of synthetically generated data to augment the tail labels and analyse its contribution to improving model performance.

Related papers

Confident Pseudo-labeled Diffusion Augmentation for Canine Cardiomegaly Detection [7.9471205712560264]
Canine cardiomegaly, marked by an enlarged heart, poses serious health risks if undetected. Current detection models often rely on small, poorly annotated datasets. We propose a Confident Pseudo-labeled Diffusion Augmentation model for identifying canine cardiomegaly.
arXiv Detail & Related papers (2025-01-13T18:10:19Z)
Synthetic Augmentation with Large-scale Unconditional Pre-training [4.162192894410251]
We propose a synthetic augmentation method called HistoDiffusion to reduce the dependency on annotated data. HistoDiffusion can be pre-trained on large-scale unlabeled datasets and later applied to a small-scale labeled dataset for augmented training. We evaluate our proposed method by pre-training on three histopathology datasets and testing on a histopathology dataset of colorectal cancer (CRC) excluded from the pre-training datasets.
arXiv Detail & Related papers (2023-08-08T03:34:04Z)
Automated Labeling of German Chest X-Ray Radiology Reports using Deep Learning [50.591267188664666]
We propose a deep learning-based CheXpert label prediction model, pre-trained on reports labeled by a rule-based German CheXpert model. Our results demonstrate the effectiveness of our approach, which significantly outperformed the rule-based model on all three tasks.
arXiv Detail & Related papers (2023-06-09T16:08:35Z)
X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning. To take the power of both worlds, we propose a novel X-model. X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z)
Covid-19 Detection from Chest X-ray and Patient Metadata using Graph Convolutional Neural Networks [6.420262246029286]
We propose a novel Graph Convolution Neural Network (GCN) that is capable of identifying bio-markers of Covid-19 pneumonia. The proposed method exploits important relational knowledge between data instances and their features using graph representation and applies convolution to learn the graph data.
arXiv Detail & Related papers (2021-05-20T13:13:29Z)
Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance. For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming. In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
Learning Invariant Feature Representation to Improve Generalization across Chest X-ray Datasets [55.06983249986729]
We show that a deep learning model performing well when tested on the same dataset as training data starts to perform poorly when it is tested on a dataset from a different source. By employing an adversarial training strategy, we show that a network can be forced to learn a source-invariant representation.
arXiv Detail & Related papers (2020-08-04T07:41:15Z)
Deep Mining External Imperfect Data for Chest X-ray Disease Screening [57.40329813850719]
We argue that incorporating an external CXR dataset leads to imperfect training data, which raises the challenges. We formulate the multi-label disease classification problem as weighted independent binary tasks according to the categories. Our framework simultaneously models and tackles the domain and label discrepancies, enabling superior knowledge mining ability.
arXiv Detail & Related papers (2020-06-06T06:48:40Z)
Self-Training with Improved Regularization for Sample-Efficient Chest X-Ray Classification [80.00316465793702]
We present a deep learning framework that enables robust modeling in challenging scenarios. Our results show that using 85% lesser labeled data, we can build predictive models that match the performance of classifiers trained in a large-scale data setting.
arXiv Detail & Related papers (2020-05-03T02:36:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.