Did You Get What You Paid For? Rethinking Annotation Cost of Deep Learning Based Computer Aided Detection in Chest Radiographs
- URL: http://arxiv.org/abs/2209.15314v1
- Date: Fri, 30 Sep 2022 08:42:22 GMT
- Title: Did You Get What You Paid For? Rethinking Annotation Cost of Deep Learning Based Computer Aided Detection in Chest Radiographs
- Authors: Tae Soo Kim, Geonwoon Jang, Sanghyup Lee, Thijs Kooi
- Abstract summary: We investigate how the cost of data annotation ultimately impacts Computer Aided Detection (CAD) model performance.
We find that cost-efficient annotations provide great value when collected in large amounts and lead to competitive performance when compared to models trained with only gold-standard annotations.
- Score: 8.079269139747131
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As deep networks require large amounts of accurately labeled training data, a
strategy to collect sufficiently large and accurate annotations is as important
as innovations in recognition methods. This is especially true for building
Computer Aided Detection (CAD) systems for chest X-rays where domain expertise
of radiologists is required to annotate the presence and location of
abnormalities on X-ray images. However, there is little concrete evidence to
guide how many resources to allocate to data annotation so that the resulting
CAD system reaches the desired performance. Without this knowledge,
practitioners often fall back on collecting as much detail as possible on as
much data as possible, which is cost-inefficient. In
this work, we investigate how the cost of data annotation ultimately impacts
the CAD model performance on classification and segmentation of chest
abnormalities in frontal-view X-ray images. We define the cost of annotation
with respect to the following three dimensions: quantity, quality and
granularity of labels. Throughout this study, we isolate the impact of each
dimension on the resulting CAD model performance on detecting 10 chest
abnormalities in X-rays. On a large-scale training set of over 120K X-ray
images with gold-standard annotations, we find that cost-efficient annotations
provide great value when collected in large amounts and lead to performance
competitive with models trained only on gold-standard annotations. We also
find that combining large amounts of cost-efficient annotations with only
small amounts of expensive labels yields competitive CAD models at a much
lower cost.
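The last finding lends itself to a simple training recipe. Below is a minimal sketch, assuming hypothetical PyTorch datasets that yield `(image, labels, is_gold)` tuples with multi-hot labels over the 10 abnormality classes; the per-sample down-weighting of cost-efficient labels is an illustrative choice, not the authors' exact method.

```python
import torch
import torch.nn as nn
from torch.utils.data import ConcatDataset, DataLoader

def train_mixed(model, cheap_ds, gold_ds, epochs=5, cheap_weight=0.5, device="cuda"):
    # Each dataset yields (image, labels, is_gold): labels are multi-hot over
    # the 10 abnormality classes, is_gold flags gold-standard annotations.
    loader = DataLoader(ConcatDataset([cheap_ds, gold_ds]),
                        batch_size=32, shuffle=True)
    criterion = nn.BCEWithLogitsLoss(reduction="none")
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    model.to(device).train()
    for _ in range(epochs):
        for images, labels, is_gold in loader:
            images, labels = images.to(device), labels.float().to(device)
            per_label = criterion(model(images), labels)   # (B, 10)
            per_sample = per_label.mean(dim=1)             # (B,)
            # Down-weight cost-efficient labels relative to gold-standard ones.
            weights = torch.where(is_gold.to(device).bool(),
                                  torch.ones_like(per_sample),
                                  torch.full_like(per_sample, cheap_weight))
            loss = (weights * per_sample).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```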
Related papers
- Augmenting Chest X-ray Datasets with Non-Expert Annotations [1.9991771189143435]
A popular and cost-effective approach is automated annotation extraction from free-text medical reports.
We enhance two publicly available chest X-ray datasets by incorporating non-expert annotations.
Using the non-expert annotations, we train a chest drain detector that generalizes well to expert labels.
arXiv Detail & Related papers (2023-09-05T13:52:43Z)
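A toy illustration of the report-mining idea above: keyword rules plus a crude negation check turn free-text reports into labels. Real labelers (e.g., the CheXpert labeler) are far more sophisticated; the keyword lists and negation rule here are assumptions for illustration.

```python
import re

# Toy keyword-based label extractor for free-text radiology reports.
FINDINGS = {
    "chest_drain": [r"chest (?:drain|tube)"],
    "pneumothorax": [r"pneumothorax"],
    "effusion": [r"pleural effusion"],
}
NEGATION = re.compile(r"\b(no|without|negative for|resolved)\b[^.]*$")

def extract_labels(report: str) -> dict:
    labels = {}
    for finding, patterns in FINDINGS.items():
        labels[finding] = 0
        for pat in patterns:
            for m in re.finditer(pat, report, flags=re.IGNORECASE):
                # Mark positive unless a negation cue precedes the mention
                # within the same sentence.
                prefix = report[:m.start()].split(".")[-1]
                if not NEGATION.search(prefix.lower()):
                    labels[finding] = 1
    return labels

print(extract_labels("No pneumothorax. A chest drain is in place."))
# -> {'chest_drain': 1, 'pneumothorax': 0, 'effusion': 0}
```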
- How Does Pruning Impact Long-Tailed Multi-Label Medical Image Classifiers? [49.35105290167996]
Pruning has emerged as a powerful technique for compressing deep neural networks, reducing memory usage and inference time without significantly affecting overall performance.
This work represents a first step toward understanding the impact of pruning on model behavior in deep long-tailed, multi-label medical image classification.
arXiv Detail & Related papers (2023-08-17T20:40:30Z)
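For reference, the kind of pruning studied above can be reproduced in a few lines with PyTorch's built-in utilities. This is a generic global magnitude-pruning sketch; the 30% sparsity level and ResNet-18 backbone are arbitrary illustrations, not the paper's setup.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune
from torchvision.models import resnet18

model = resnet18(num_classes=10)

# Prune 30% of all conv/linear weights globally by L1 magnitude.
to_prune = [(m, "weight") for m in model.modules()
            if isinstance(m, (nn.Conv2d, nn.Linear))]
prune.global_unstructured(to_prune,
                          pruning_method=prune.L1Unstructured,
                          amount=0.3)

# Make the pruning permanent by folding the masks into the weights.
for module, name in to_prune:
    prune.remove(module, name)
```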
- RadTex: Learning Efficient Radiograph Representations from Text Reports [7.090896766922791]
We build a data-efficient learning framework that utilizes radiology reports to improve medical image classification performance with limited labeled data.
Our model achieves higher classification performance than ImageNet-supervised pretraining when labeled training data is limited.
arXiv Detail & Related papers (2022-08-05T15:06:26Z)
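One common way to learn image representations from paired reports is a CLIP-style contrastive objective; the sketch below shows that generic formulation and is not necessarily RadTex's exact pretraining objective. `img_emb` and `txt_emb` are assumed to be L2-normalized batch embeddings from an image encoder and a text encoder.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    # (B, B) similarity matrix between images and reports in the batch.
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    # Each image should match its own report, and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```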
- Generative Residual Attention Network for Disease Detection [51.60842580044539]
We present a novel approach for disease generation in X-rays using conditional generative adversarial learning.
We generate a corresponding radiology image in a target domain while preserving the identity of the patient.
We then use the generated X-ray image in the target domain to augment our training to improve the detection performance.
arXiv Detail & Related papers (2021-10-25T14:15:57Z)
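A rough sketch of how such generated images could augment training, assuming a hypothetical pretrained conditional generator `generator(images, target_label)` that performs the identity-preserving translation; this is not the paper's architecture.

```python
import torch

def augment_batch(images, labels, generator, target_label):
    # Translate each X-ray into the target disease domain while keeping
    # patient identity, then append the synthetic images with their new label.
    with torch.no_grad():
        synthetic = generator(images, target_label)
    aug_labels = torch.full_like(labels, target_label)  # labels: class indices (B,)
    return torch.cat([images, synthetic]), torch.cat([labels, aug_labels])
```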
- Chest ImaGenome Dataset for Clinical Reasoning [5.906670720220545]
We provide the first Chest ImaGenome dataset with a scene graph data structure to describe 242,072 images.
Local annotations are automatically produced using a joint rule-based natural language processing (NLP) and atlas-based bounding box detection pipeline.
arXiv Detail & Related papers (2021-07-31T20:10:30Z)
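To make the scene-graph idea concrete, here is a minimal stand-in data structure; the field names are illustrative and do not follow the actual Chest ImaGenome schema.

```python
from dataclasses import dataclass, field

@dataclass
class RegionNode:
    name: str                       # e.g. "left lower lung zone"
    bbox: tuple                     # (x1, y1, x2, y2) in image coordinates
    attributes: list = field(default_factory=list)  # e.g. ["opacity"]

@dataclass
class SceneGraph:
    image_id: str
    regions: list = field(default_factory=list)
    relations: list = field(default_factory=list)   # (subject, predicate, object)

graph = SceneGraph("cxr_0001", regions=[
    RegionNode("left lower lung zone", (120, 300, 260, 450), ["opacity"]),
])
```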
- Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
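The K-nearest-neighbor smoothing idea can be sketched as averaging each sample's predicted probabilities with those of its nearest neighbors in feature space; this shows the general mechanism, and the paper's exact KNNS formulation may differ.

```python
import torch
import torch.nn.functional as F

def knn_smooth(features, probs, k=5, alpha=0.5):
    # features: (N, D) embeddings; probs: (N, C) predicted probabilities.
    feats = F.normalize(features, dim=1)
    sim = feats @ feats.t()                   # (N, N) cosine similarity
    sim.fill_diagonal_(float("-inf"))         # exclude each sample itself
    idx = sim.topk(k, dim=1).indices          # (N, k) neighbor indices
    neighbor_mean = probs[idx].mean(dim=1)    # (N, C) neighborhood average
    return alpha * probs + (1 - alpha) * neighbor_mean
```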
- Learning Invariant Feature Representation to Improve Generalization across Chest X-ray Datasets [55.06983249986729]
We show that a deep learning model that performs well when tested on the same dataset as its training data starts to perform poorly when tested on a dataset from a different source.
By employing an adversarial training strategy, we show that a network can be forced to learn a source-invariant representation.
arXiv Detail & Related papers (2020-08-04T07:41:15Z)
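The adversarial strategy is typically implemented with a gradient-reversal layer (as in DANN): a domain classifier learns to predict the source dataset, while reversed gradients push the feature extractor toward a source-invariant representation. A minimal sketch, with the domain head assumed to be any small classifier:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip and scale the gradient flowing back into the feature extractor.
        return -ctx.lam * grad_output, None

def domain_adversarial_loss(features, domain_labels, domain_head, lam=1.0):
    reversed_feats = GradReverse.apply(features, lam)
    logits = domain_head(reversed_feats)
    return nn.functional.cross_entropy(logits, domain_labels)
```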
- Deep Mining External Imperfect Data for Chest X-ray Disease Screening [57.40329813850719]
We argue that incorporating an external CXR dataset leads to imperfect training data, which raises several challenges.
We formulate the multi-label disease classification problem as weighted independent binary tasks according to the categories.
Our framework simultaneously models and tackles the domain and label discrepancies, enabling superior knowledge mining ability.
arXiv Detail & Related papers (2020-06-06T06:48:40Z)
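The "weighted independent binary tasks" formulation amounts to one sigmoid output per disease category with a per-category loss weight. A minimal sketch; how the weights are derived from the domain and label discrepancies is the paper's contribution and is simply assumed as given here.

```python
import torch
import torch.nn as nn

class WeightedBinaryTasks(nn.Module):
    def __init__(self, num_classes, class_weights):
        super().__init__()
        # One weight per category, e.g. down-weighting noisier external labels.
        self.register_buffer("w", torch.as_tensor(class_weights).float())
        self.bce = nn.BCEWithLogitsLoss(reduction="none")

    def forward(self, logits, targets):
        per_task = self.bce(logits, targets.float())  # (B, C) independent tasks
        return (per_task * self.w).mean()
```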
- Self-Training with Improved Regularization for Sample-Efficient Chest X-Ray Classification [80.00316465793702]
We present a deep learning framework that enables robust modeling in challenging scenarios.
Our results show that using 85% less labeled data, we can build predictive models that match the performance of classifiers trained in a large-scale data setting.
arXiv Detail & Related papers (2020-05-03T02:36:00Z)
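The core self-training loop can be sketched as confidence-thresholded pseudo-labeling of an unlabeled pool by a teacher model; the paper's improved regularization is omitted, and the 0.9 threshold is an arbitrary assumption.

```python
import torch

@torch.no_grad()
def make_pseudo_labels(teacher, unlabeled_loader, threshold=0.9, device="cuda"):
    teacher.to(device).eval()
    images_out, labels_out = [], []
    for images in unlabeled_loader:
        probs = torch.sigmoid(teacher(images.to(device)))
        # Keep samples where every class probability is confidently 0 or 1.
        confident = ((probs > threshold) | (probs < 1 - threshold)).all(dim=1)
        images_out.append(images[confident.cpu()])
        labels_out.append((probs[confident] > 0.5).float().cpu())
    return torch.cat(images_out), torch.cat(labels_out)
```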
- Localization of Critical Findings in Chest X-Ray without Local Annotations Using Multi-Instance Learning [0.0]
Deep learning models commonly suffer from a lack of explainability. They also require locally annotated training data in the form of pixel-level labels or bounding-box coordinates.
In this work, we address these shortcomings with an interpretable DL algorithm based on multi-instance learning.
arXiv Detail & Related papers (2020-01-23T21:29:14Z)
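A minimal multi-instance learning sketch for this setting: the image is treated as a bag of patch instances, per-patch scores are max-pooled into an image-level prediction trained with image-level labels only, and the patch scores provide localization for free. The patch tensor layout and encoder are assumptions.

```python
import torch
import torch.nn as nn

class MILClassifier(nn.Module):
    def __init__(self, patch_encoder, feat_dim, num_classes):
        super().__init__()
        self.encoder = patch_encoder       # maps a patch to a feature vector
        self.score = nn.Linear(feat_dim, num_classes)

    def forward(self, patches):            # patches: (B, P, C, H, W)
        b, p = patches.shape[:2]
        feats = self.encoder(patches.flatten(0, 1))        # (B*P, feat_dim)
        patch_logits = self.score(feats).view(b, p, -1)    # (B, P, num_classes)
        image_logits = patch_logits.max(dim=1).values      # bag-level prediction
        return image_logits, patch_logits  # patch scores localize the finding
```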
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.