What limits performance of weakly supervised deep learning for chest CT
classification?
- URL: http://arxiv.org/abs/2402.04419v1
- Date: Tue, 6 Feb 2024 21:38:29 GMT
- Authors: Fakrul Islam Tushar, Vincent M. D'Anniballe, Geoffrey D. Rubin, Joseph
Y. Lo
- Abstract summary: Weakly supervised learning with noisy data has drawn attention in the medical imaging community due to the sparsity of high-quality disease labels.
In this paper, we test the effects of such weak supervision by examining model tolerance for noisy data.
Results demonstrated that the model could endure up to 10% added label error before experiencing a decline in disease classification performance.
- Score: 0.44241702149260353
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Weakly supervised learning with noisy data has drawn attention in the medical
imaging community due to the sparsity of high-quality disease labels. However,
little is known about the limitations of such weakly supervised learning and
the effect of these constraints on disease classification performance. In this
paper, we test the effects of such weak supervision by examining model
tolerance for three conditions. First, we examined model tolerance for noisy
data by incrementally increasing error in the labels within the training data.
Second, we assessed the impact of dataset size by varying the amount of
training data. Third, we compared performance differences between binary and
multi-label classification. Results demonstrated that the model could endure up
to 10% added label error before experiencing a decline in disease
classification performance. Disease classification performance steadily rose as
the amount of training data was increased for all disease classes, before
experiencing a plateau in performance at 75% of training data. Last, the binary
model outperformed the multilabel model in every disease category. However,
such interpretations may be misleading, as the binary model was heavily
influenced by co-occurring diseases and may not have learned the specific
features of the disease in the image. In conclusion, this study may help the
medical imaging community understand the benefits and risks of weak supervision
with noisy labels. Such studies demonstrate the need to build diverse,
large-scale datasets and to develop explainable and responsible AI.
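
The first two conditions in the abstract (adding label error in increments, and varying the amount of training data) can be illustrated with a minimal sketch. The abstract does not say how the label error was introduced or how the subsets were drawn, so everything below is an assumption: labels are taken to be a binary case-by-disease matrix, noise is modeled as uniformly random flips, and the helper names `add_label_noise` and `subsample` are hypothetical. It is not the authors' protocol.

```python
import random
from typing import List, Sequence


def add_label_noise(labels: Sequence[Sequence[int]],
                    error_rate: float,
                    seed: int = 0) -> List[List[int]]:
    """Return a copy of `labels` with a fixed share of binary entries flipped.

    Assumption: noise is uniform over all (case, disease) cells. The paper's
    actual noise model is not described in the abstract.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be in [0, 1]")
    rng = random.Random(seed)
    noisy = [list(row) for row in labels]  # copy so the input stays untouched
    # Enumerate every (case, disease) cell, then flip a random subset of them.
    cells = [(i, j) for i, row in enumerate(noisy) for j in range(len(row))]
    n_flip = round(error_rate * len(cells))
    for i, j in rng.sample(cells, n_flip):
        noisy[i][j] = 1 - noisy[i][j]
    return noisy


def subsample(indices: Sequence[int],
              fraction: float,
              seed: int = 0) -> List[int]:
    """Keep a random `fraction` of the training indices (hypothetical helper)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    k = max(1, round(fraction * len(indices)))
    return sorted(rng.sample(list(indices), k))


if __name__ == "__main__":
    # Toy data: 20 cases, 3 disease labels each (values are made up).
    clean = [[0, 1, 0] for _ in range(20)]
    noisy = add_label_noise(clean, error_rate=0.10, seed=1)
    flipped = sum(a != b for r, s in zip(clean, noisy) for a, b in zip(r, s))
    print("flipped cells:", flipped)
    print("cases kept at 75%:", len(subsample(range(20), 0.75)))
```

Sweeping `error_rate` and `fraction` over a grid and retraining at each setting would reproduce the shape of the experiment; a fixed `seed` keeps each noisy or reduced dataset reproducible across runs.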
Related papers
- An analysis of data variation and bias in image-based dermatological datasets for machine learning classification [2.039829968340841]
In clinical dermatology, classification models can detect malignant lesions on patients' skin using only RGB images as input.
Most learning-based methods employ data acquired from dermoscopic datasets on training, which are large and validated by a gold standard.
This work aims to evaluate the gap between dermoscopic and clinical samples and understand how the dataset variations impact training.
arXiv Detail & Related papers (2025-01-15T17:18:46Z)
- Benchmarking In-the-wild Multimodal Disease Recognition and A Versatile Baseline [42.49727243388804]
We propose an in-the-wild multimodal plant disease recognition dataset.
It contains the largest number of disease classes but also text-based descriptions for each disease.
Our proposed dataset can be regarded as an ideal testbed for evaluating disease recognition methods in the real world.
arXiv Detail & Related papers (2024-08-06T11:49:13Z)
- How Does Pruning Impact Long-Tailed Multi-Label Medical Image Classifiers? [49.35105290167996]
Pruning has emerged as a powerful technique for compressing deep neural networks, reducing memory usage and inference time without significantly affecting overall performance.
This work represents a first step toward understanding the impact of pruning on model behavior in deep long-tailed, multi-label medical image classification.
arXiv Detail & Related papers (2023-08-17T20:40:30Z)
- AnoMalNet: Outlier Detection based Malaria Cell Image Classification Method Leveraging Deep Autoencoder [0.0]
We propose an outlier detection based binary medical image classification technique which can handle even the most extreme case of class imbalance.
An autoencoder model titled AnoMalNet is trained with only the uninfected cell images at the beginning.
We have achieved an accuracy, precision, recall, and F1 score of 98.49%, 97.07%, 100%, and 98.52% respectively.
arXiv Detail & Related papers (2023-03-10T08:49:31Z)
- Data Augmentation using Feature Generation for Volumetric Medical Images [0.08594140167290097]
Medical image classification is one of the most critical problems in the image recognition area.
One of the major challenges in this field is the scarcity of labelled training data.
Deep Learning models, in particular, show promising results on image segmentation and classification problems.
arXiv Detail & Related papers (2022-09-28T13:46:24Z)
- SuperCon: Supervised Contrastive Learning for Imbalanced Skin Lesion Classification [9.265557367859637]
SuperCon is a two-stage training strategy to overcome the class imbalance problem on skin lesion classification.
Our two-stage training strategy effectively addresses the class imbalance classification problem, and significantly improves existing works in terms of F1-score and AUC score.
arXiv Detail & Related papers (2022-02-11T15:19:36Z)
- Cross-Site Severity Assessment of COVID-19 from CT Images via Domain Adaptation [64.59521853145368]
Early and accurate severity assessment of Coronavirus disease 2019 (COVID-19) based on computed tomography (CT) images offers a great help to the estimation of intensive care unit event.
To augment the labeled data and improve the generalization ability of the classification model, it is necessary to aggregate data from multiple sites.
This task faces several challenges including class imbalance between mild and severe infections, domain distribution discrepancy between sites, and presence of heterogeneous features.
arXiv Detail & Related papers (2021-09-08T07:56:51Z)
- Relational Subsets Knowledge Distillation for Long-tailed Retinal Diseases Recognition [65.77962788209103]
We propose class subset learning by dividing the long-tailed data into multiple class subsets according to prior knowledge.
It enforces the model to focus on learning the subset-specific knowledge.
The proposed framework proved to be effective for the long-tailed retinal diseases recognition task.
arXiv Detail & Related papers (2021-04-22T13:39:33Z)
- Deep Mining External Imperfect Data for Chest X-ray Disease Screening [57.40329813850719]
We argue that incorporating an external CXR dataset leads to imperfect training data, which raises the challenges.
We formulate the multi-label disease classification problem as weighted independent binary tasks according to the categories.
Our framework simultaneously models and tackles the domain and label discrepancies, enabling superior knowledge mining ability.
arXiv Detail & Related papers (2020-06-06T06:48:40Z)
- Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)
- Dynamic Graph Correlation Learning for Disease Diagnosis with Incomplete Labels [66.57101219176275]
Disease diagnosis on chest X-ray images is a challenging multi-label classification task.
We propose a Disease Diagnosis Graph Convolutional Network (DD-GCN) that presents a novel view of investigating the inter-dependency among different diseases.
Our method is the first to build a graph over the feature maps with a dynamic adjacency matrix for correlation learning.
arXiv Detail & Related papers (2020-02-26T17:10:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.