NCT-CRC-HE: Not All Histopathological Datasets Are Equally Useful
- URL: http://arxiv.org/abs/2409.11546v1
- Date: Tue, 17 Sep 2024 20:36:03 GMT
- Title: NCT-CRC-HE: Not All Histopathological Datasets Are Equally Useful
- Authors: Andrey Ignatov, Grigory Malivenko
- Abstract summary: In this paper, we analyze the popular NCT-CRC-HE-100K colorectal cancer dataset used in numerous prior works.
We show that both this dataset and the obtained results may be affected by data-specific biases.
We show that even the simplest model using only 3 features per image can demonstrate over 50% accuracy on this 9-class dataset.
- Score: 15.10324445908774
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Numerous deep learning-based solutions have been proposed for histopathological image analysis over the past years. While they usually demonstrate exceptionally high accuracy, one key question is whether their performance might be affected by low-level image properties that are not related to histopathology but are instead caused by microscopy image handling and pre-processing. In this paper, we analyze the popular NCT-CRC-HE-100K colorectal cancer dataset used in numerous prior works and show that both this dataset and the results obtained on it may be affected by data-specific biases. The most prominent dataset issues revealed are inappropriate color normalization, severe JPEG artifacts that are inconsistent between different classes, and completely corrupted tissue samples resulting from incorrect handling of the image dynamic range. We show that even the simplest model using only 3 features per image (red, green and blue color intensities) can demonstrate over 50% accuracy on this 9-class dataset, while a color histogram that does not explicitly capture cell morphology yields over 82% accuracy. Moreover, we show that a basic ImageNet-pretrained EfficientNet-B0 model can achieve over 97.7% accuracy on this dataset, outperforming all previously proposed solutions developed for this task, including dedicated histopathological foundation models and large cell morphology-aware neural networks. The NCT-CRC-HE dataset is publicly available and can be freely used to replicate the presented results. The code and pre-trained models used in this paper are available at https://github.com/gmalivenko/NCT-CRC-HE-experiments
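The colour-only baselines mentioned in the abstract are simple enough to sketch. The snippet below is a rough illustration rather than the authors' released code (that lives in the linked GitHub repository): the on-disk folder layout, the .tif file extension, the logistic-regression classifier and the random hold-out split are all assumptions, and the paper itself evaluates on the separate CRC-VAL-HE-7K test set.
```python
# Minimal sketch (not the authors' code) of the colour-only baselines from the
# abstract: classify NCT-CRC-HE patches from per-image colour statistics alone.
# Assumptions: patches sit in one sub-folder per class (ADI, BACK, DEB, LYM,
# MUC, MUS, NORM, STR, TUM) as .tif files, a logistic-regression classifier is
# used, and a random hold-out split stands in for the CRC-VAL-HE-7K test set.
import glob
import os

import numpy as np
from PIL import Image
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

DATA_DIR = "NCT-CRC-HE-100K"  # hypothetical local path to the extracted dataset


def mean_rgb(img):
    """3 features per image: mean red, green and blue intensity."""
    return np.asarray(img, dtype=np.float32).reshape(-1, 3).mean(axis=0)


def rgb_histogram(img, bins=32):
    """Per-channel colour histogram; still no explicit cell-morphology features."""
    pixels = np.asarray(img, dtype=np.float32).reshape(-1, 3)
    return np.concatenate([
        np.histogram(pixels[:, c], bins=bins, range=(0, 255), density=True)[0]
        for c in range(3)
    ])


features, labels = [], []
for class_dir in sorted(glob.glob(os.path.join(DATA_DIR, "*"))):
    label = os.path.basename(class_dir)
    for path in glob.glob(os.path.join(class_dir, "*.tif")):
        img = Image.open(path).convert("RGB")
        features.append(mean_rgb(img))  # swap in rgb_histogram(img) for the histogram variant
        labels.append(label)

X, y = np.array(features), np.array(labels)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("hold-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```
If the actual folder layout, features or classifier differ from the paper's setup the numbers will differ too; the point of the sketch is only that colour statistics alone, with no morphological information, can separate the nine classes far better than chance.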
Related papers
- Histopathological Image Classification with Cell Morphology Aware Deep Neural Networks [11.749248917866915]
We propose a novel DeepCMorph model pre-trained to learn cell morphology and identify a large number of different cancer types.
We pretrained this module on the Pan-Cancer TCGA dataset consisting of over 270K tissue patches extracted from 8736 diagnostic slides from 7175 patients.
The proposed solution achieved a new state-of-the-art performance on the dataset under consideration, detecting 32 cancer types with over 82% accuracy and outperforming all previously proposed solutions by more than 4%.
arXiv Detail & Related papers (2024-07-11T16:03:59Z)
- Performance of GAN-based augmentation for deep learning COVID-19 image classification [57.1795052451257]
The biggest challenge in the application of deep learning to the medical domain is the availability of training data.
Data augmentation is a typical methodology used in machine learning when confronted with a limited data set.
In this work, a StyleGAN2-ADA generative adversarial network is trained on a limited COVID-19 chest X-ray image set.
arXiv Detail & Related papers (2023-04-18T15:39:58Z)
- Enhanced Sharp-GAN For Histopathology Image Synthesis [63.845552349914186]
Histopathology image synthesis aims to address the data shortage issue in training deep learning approaches for accurate cancer detection.
We propose a novel approach that enhances the quality of synthetic images by using nuclei topology and contour regularization.
The proposed approach outperforms Sharp-GAN in all four image quality metrics on two datasets.
arXiv Detail & Related papers (2023-01-24T17:54:01Z)
- DeepDC: Deep Distance Correlation as a Perceptual Image Quality Evaluator [53.57431705309919]
ImageNet pre-trained deep neural networks (DNNs) show notable transferability for building effective image quality assessment (IQA) models.
We develop a novel full-reference IQA (FR-IQA) model based exclusively on pre-trained DNN features.
We conduct comprehensive experiments to demonstrate the superiority of the proposed quality model on five standard IQA datasets.
arXiv Detail & Related papers (2022-11-09T14:57:27Z)
- Early Diagnosis of Retinal Blood Vessel Damage via Deep Learning-Powered Collective Intelligence Models [0.3670422696827525]
The power of swarm algorithms is used to search for various combinations of convolutional, pooling, and normalization layers to provide the best model for the task.
The best TDCN model achieves an accuracy of 90.3%, AUC ROC of 0.956, and a Cohen score of 0.967.
arXiv Detail & Related papers (2022-10-17T21:38:38Z)
- H&E-adversarial network: a convolutional neural network to learn stain-invariant features through Hematoxylin & Eosin regression [1.7371375427784381]
This paper presents a novel method to train convolutional neural networks (CNNs) that better generalize on data including several colour variations.
The method, called H&E-adversarial CNN, exploits H&E matrix information to learn stain-invariant features during training.
arXiv Detail & Related papers (2022-01-17T10:34:23Z)
- Stain Normalized Breast Histopathology Image Recognition using Convolutional Neural Networks for Cancer Detection [9.826027427965354]
Recent advances have shown that Convolutional Neural Network (CNN) architectures can be used to design a Computer-Aided Diagnostic (CAD) system for breast cancer detection.
We consider several contemporary CNN models for binary classification of breast histopathology images.
We validated the trained CNNs on the publicly available BreaKHis dataset for 200x and 400x magnified histopathology images.
arXiv Detail & Related papers (2022-01-04T03:09:40Z)
- Vision Transformers for femur fracture classification [59.99241204074268]
The Vision Transformer (ViT) was able to correctly predict 83% of the test images.
Good results were also obtained for sub-fracture classification on the largest and richest dataset available for this task.
arXiv Detail & Related papers (2021-08-07T10:12:42Z)
- Examining and Mitigating Kernel Saturation in Convolutional Neural Networks using Negative Images [0.8594140167290097]
We analyze the effect of convolutional kernel saturation in CNNs.
We propose a simple data augmentation technique that mitigates saturation and increases classification accuracy by supplementing the training dataset with negative images.
Our results show that CNNs are indeed susceptible to convolutional kernel saturation and that supplementing the training dataset with negative images can offer a statistically significant increase in classification accuracy.
arXiv Detail & Related papers (2021-05-10T06:06:49Z)
- Classification of COVID-19 in CT Scans using Multi-Source Transfer Learning [91.3755431537592]
We propose the use of Multi-Source Transfer Learning to improve upon traditional Transfer Learning for the classification of COVID-19 from CT scans.
With our multi-source fine-tuning approach, our models outperformed baseline models fine-tuned with ImageNet.
Our best performing model was able to achieve an accuracy of 0.893 and a Recall score of 0.897, outperforming its baseline Recall score by 9.3%.
arXiv Detail & Related papers (2020-09-22T11:53:06Z)
- Data Consistent CT Reconstruction from Insufficient Data with Learned Prior Images [70.13735569016752]
We investigate the robustness of deep learning in CT image reconstruction by showing false negative and false positive lesion cases.
We propose a data consistent reconstruction (DCR) method to improve their image quality, which combines the advantages of compressed sensing and deep learning.
The efficacy of the proposed method is demonstrated in cone-beam CT with truncated data, limited-angle data and sparse-view data, respectively.
arXiv Detail & Related papers (2020-05-20T13:30:49Z)