Natural vs Balanced Distribution in Deep Learning on Whole Slide Images
for Cancer Detection
- URL: http://arxiv.org/abs/2012.11684v1
- Date: Mon, 21 Dec 2020 21:18:49 GMT
- Title: Natural vs Balanced Distribution in Deep Learning on Whole Slide Images
for Cancer Detection
- Authors: Ismat Ara Reshma, Sylvain Cussat-Blanc, Radu Tudor Ionescu, Hervé
Luga, Josiane Mothe
- Abstract summary: We analyze the impact of natural and balanced distributions of the training set in deep learning (DL) models applied on histological images.
We show that keeping the WSI data in its usual distribution for DL training produces fewer false positives (FPs), with comparable false negatives (FNs), than the artificially balanced distribution.
- Score: 14.34071893713379
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The class distribution of data is one of the factors that regulates the
performance of machine learning models. However, the literature contains very
few investigations into the impact of different distributions, and sometimes
none for domain-specific tasks. In this paper, we analyze the impact of natural
and balanced training-set distributions in deep learning (DL) models applied to
histological images, also known as whole slide images (WSIs). WSIs are
considered the gold standard for cancer diagnosis. In recent years,
researchers have turned their attention to DL models to automate and accelerate
the diagnosis process. When training such DL models, it is common to filter
out non-regions-of-interest from the WSIs and to adopt an artificial (usually
balanced) class distribution. In our analysis, we show that keeping the WSI
data in its usual distribution (which we call the natural distribution) for DL
training produces fewer false positives (FPs), with a comparable number of
false negatives (FNs), than the artificially balanced distribution. We conduct
an empirical comparative study with 10 random folds
for each distribution, comparing the resulting average performance levels in
terms of five different evaluation metrics. Experimental results show the
effectiveness of the natural distribution over the balanced one across all the
evaluation metrics.
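The evaluation protocol above (counting FPs and FNs per fold and averaging metrics over the 10 random folds) can be sketched in plain Python. This is an illustrative sketch, not the authors' code: the label encoding (1 = tumor region, 0 = non-tumor), the function names, and the particular metrics shown are assumptions for the example.

```python
# Illustrative sketch (not the authors' implementation): per-fold FP/FN
# counts and metric averaging over random folds, as in a 10-fold protocol.
# Assumed encoding: 1 = tumor region, 0 = non-tumor region.

def confusion(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def metrics(y_true, y_pred):
    """Compute FP/FN counts plus standard derived metrics for one fold."""
    tp, fp, fn, tn = confusion(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": accuracy, "fp": fp, "fn": fn}

def average_over_folds(folds):
    """folds: list of (y_true, y_pred) pairs, one per random fold.
    Returns the per-metric mean across folds."""
    per_fold = [metrics(t, p) for t, p in folds]
    return {k: sum(m[k] for m in per_fold) / len(per_fold)
            for k in per_fold[0]}
```

With this scaffolding, comparing the natural and balanced training distributions amounts to running the same model twice (once per distribution), collecting `(y_true, y_pred)` for each of the 10 folds, and comparing the averaged dictionaries, in particular the `fp` and `fn` entries.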
Related papers
- A Distributional Evaluation of Generative Image Models [2.520143908749992]
We focus on evaluating image generative models, where studies often treat human evaluation as the gold standard.
We propose the Embedded Characteristic Score (ECS), a comprehensive metric for evaluating the distributional match between the learned and target sample distributions.
arXiv Detail & Related papers (2025-01-01T06:23:18Z)
- Diffusion Attribution Score: Evaluating Training Data Influence in Diffusion Model [22.39558434131574]
Existing data attribution methods for diffusion models typically quantify the contribution of a training sample.
We argue that the direct usage of diffusion loss cannot represent such a contribution accurately, due to how the diffusion loss is calculated.
We aim to measure the direct comparison between predicted distributions with an attribution score to analyse the training sample importance.
arXiv Detail & Related papers (2024-10-24T10:58:17Z)
- Training Class-Imbalanced Diffusion Model Via Overlap Optimization [55.96820607533968]
Diffusion models trained on real-world datasets often yield inferior fidelity for tail classes.
Deep generative models, including diffusion models, are biased towards classes with abundant training images.
We propose a method based on contrastive learning to minimize the overlap between distributions of synthetic images for different classes.
arXiv Detail & Related papers (2024-02-16T16:47:21Z)
- Class-Balancing Diffusion Models [57.38599989220613]
Class-Balancing Diffusion Models (CBDM) are trained with a distribution adjustment regularizer as a solution.
We benchmark generation results on the CIFAR100/CIFAR100LT datasets, and our method shows outstanding performance on the downstream recognition task.
arXiv Detail & Related papers (2023-04-30T20:00:14Z)
- Analyzing the Effects of Handling Data Imbalance on Learned Features from Medical Images by Looking Into the Models [50.537859423741644]
Training a model on an imbalanced dataset can introduce unique challenges to the learning problem.
We look deeper into the internal units of neural networks to observe how handling data imbalance affects the learned features.
arXiv Detail & Related papers (2022-04-04T09:38:38Z)
- Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization [89.73665256847858]
We show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts.
Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet.
We also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS.
arXiv Detail & Related papers (2021-07-09T19:48:23Z)
- Predicting with Confidence on Unseen Distributions [90.68414180153897]
We connect domain adaptation and predictive uncertainty literature to predict model accuracy on challenging unseen distributions.
We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts.
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that despite its simplicity DoC consistently outperforms other quantifications of distributional difference.
arXiv Detail & Related papers (2021-07-07T15:50:18Z)
- WILDS: A Benchmark of in-the-Wild Distribution Shifts [157.53410583509924]
Distribution shifts can substantially degrade the accuracy of machine learning systems deployed in the wild.
We present WILDS, a curated collection of 8 benchmark datasets that reflect a diverse range of distribution shifts.
We show that standard training yields substantially lower out-of-distribution performance than in-distribution performance.
arXiv Detail & Related papers (2020-12-14T11:14:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.