DiffMix: Diffusion Model-based Data Synthesis for Nuclei Segmentation
and Classification in Imbalanced Pathology Image Datasets
- URL: http://arxiv.org/abs/2306.14132v1
- Date: Sun, 25 Jun 2023 05:31:08 GMT
- Authors: Hyun-Jic Oh and Won-Ki Jeong
- Abstract summary: We propose a realistic data synthesis method using a diffusion model.
We generate two types of virtual patches to enlarge the training data distribution.
We use a semantic-label-conditioned diffusion model to generate realistic and high-quality image samples.
- Score: 8.590026259176806
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Nuclei segmentation and classification is an important step in pathology
image analysis. Deep learning-based approaches have greatly improved the
accuracy of this task. However, these approaches suffer from imbalanced
nuclei class composition, which leads to lower classification performance on
rare nuclei classes. In this paper, we propose a realistic data synthesis
method using a diffusion model. We generate two types of virtual patches to
enlarge the training data distribution, both to balance the nuclei class
variance and to increase the chance of seeing diverse nuclei. We then use a
semantic-label-conditioned diffusion model to generate realistic,
high-quality image samples. We demonstrate the efficacy of our method with
experimental results on two imbalanced nuclei datasets, improving on
state-of-the-art networks. The results suggest that the proposed method
improves classification performance on rare nuclei types while achieving
superior segmentation and classification performance on imbalanced pathology
nuclei datasets.
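The class-balancing idea in the abstract can be illustrated at the label-map level. The following is a minimal sketch, not the authors' implementation: it assumes an instance map (`inst_map`, 0 = background) and a per-pixel class map (`type_map`), and relabels randomly chosen nuclei to a rare class until that class reaches a target fraction. The function names, the relabeling strategy, and the `target_fraction` parameter are all hypothetical; the diffusion model that would synthesize an image from the resulting label map is not shown.

```python
import numpy as np


def nuclei_class_counts(inst_map, type_map):
    """Count nuclei per class, reading one class label per instance id."""
    counts = {}
    for inst_id in np.unique(inst_map):
        if inst_id == 0:  # 0 is assumed to mark background
            continue
        cls = int(type_map[inst_map == inst_id][0])
        counts[cls] = counts.get(cls, 0) + 1
    return counts


def relabel_toward_rare(inst_map, type_map, rare_class, target_fraction, seed=0):
    """Return a copy of type_map in which randomly chosen non-rare nuclei
    are relabeled as rare_class until it reaches target_fraction of nuclei.

    Hypothetical illustration only; the paper may balance classes differently.
    """
    rng = np.random.default_rng(seed)
    new_type_map = type_map.copy()

    ids = [int(i) for i in np.unique(inst_map) if i != 0]
    classes = {i: int(type_map[inst_map == i][0]) for i in ids}

    n_rare = sum(1 for c in classes.values() if c == rare_class)
    n_needed = max(0, int(np.ceil(target_fraction * len(ids))) - n_rare)

    candidates = [i for i, c in classes.items() if c != rare_class]
    n_needed = min(n_needed, len(candidates))
    if n_needed == 0:
        return new_type_map

    for inst_id in rng.choice(candidates, size=n_needed, replace=False):
        new_type_map[inst_map == inst_id] = rare_class
    return new_type_map


# Toy 4x4 patch: nuclei 1-3 belong to class 1, nucleus 4 to the rare class 2.
inst_map = np.array([
    [1, 1, 0, 2],
    [0, 0, 0, 2],
    [3, 0, 4, 4],
    [3, 0, 0, 0],
])
type_map = np.where(inst_map == 4, 2, np.where(inst_map > 0, 1, 0))

balanced = relabel_toward_rare(inst_map, type_map, rare_class=2, target_fraction=0.5)
print(nuclei_class_counts(inst_map, type_map))
print(nuclei_class_counts(inst_map, balanced))
```

Only the class labels change in this sketch; nuclei shapes and positions stay as they were, so any image synthesized from the new label map would still need a generator conditioned on it.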
Related papers
- NucleiMix: Realistic Data Augmentation for Nuclei Instance Segmentation [2.6954348706500766]
NucleiMix is designed to balance the distribution of nuclei types by increasing the number of rare-type nuclei within datasets.
In the first phase, it identifies candidate locations similar to the surroundings of rare-type nuclei and inserts rare-type nuclei into the candidate locations.
In the second phase, it employs a progressive inpainting strategy using a pre-trained diffusion model to seamlessly integrate rare-type nuclei into their new environments.
arXiv Detail & Related papers (2024-10-22T04:03:36Z)
- Anisotropic Diffusion Probabilistic Model for Imbalanced Image Classification [8.364943466191933]
We propose the Anisotropic Diffusion Probabilistic Model (ADPM) for imbalanced image classification problems.
We use the data distribution to control the diffusion speed of different class samples during the forward process, effectively improving the classification accuracy of the denoiser in the reverse process.
Our results confirm that the anisotropic diffusion model significantly improves the classification accuracy of rare classes while maintaining the accuracy of head classes.
arXiv Detail & Related papers (2024-09-22T04:42:52Z)
- Training Class-Imbalanced Diffusion Model Via Overlap Optimization [55.96820607533968]
Diffusion models trained on real-world datasets often yield inferior fidelity for tail classes.
Deep generative models, including diffusion models, are biased towards classes with abundant training images.
We propose a method based on contrastive learning to minimize the overlap between distributions of synthetic images for different classes.
arXiv Detail & Related papers (2024-02-16T16:47:21Z)
- Few-shot learning for COVID-19 Chest X-Ray Classification with Imbalanced Data: An Inter vs. Intra Domain Study [49.5374512525016]
Medical image datasets are essential for training models used in computer-aided diagnosis, treatment planning, and medical research.
Some challenges are associated with these datasets, including variability in data distribution, data scarcity, and transfer learning issues when using models pre-trained from generic images.
We propose a methodology based on Siamese neural networks in which a series of techniques are integrated to mitigate the effects of data scarcity and distribution imbalance.
arXiv Detail & Related papers (2024-01-18T16:59:27Z)
- Diffusion-based Data Augmentation for Nuclei Image Segmentation [68.28350341833526]
We introduce the first diffusion-based augmentation method for nuclei segmentation.
The idea is to synthesize a large number of labeled images to facilitate training the segmentation model.
The experimental results show that by augmenting 10% of the labeled real dataset with synthetic samples, one can achieve comparable segmentation results.
arXiv Detail & Related papers (2023-10-22T06:16:16Z)
- Class-Balancing Diffusion Models [57.38599989220613]
Class-Balancing Diffusion Models (CBDM) are trained with a distribution adjustment regularizer as a solution.
Our method is benchmarked on the CIFAR100/CIFAR100LT datasets for generation quality and shows outstanding performance on the downstream recognition task.
arXiv Detail & Related papers (2023-04-30T20:00:14Z)
- Diffusing Gaussian Mixtures for Generating Categorical Data [21.43283907118157]
We propose a generative model for categorical data based on diffusion models with a focus on high-quality sample generation.
Our method of evaluation highlights the capabilities and limitations of different generative models for generating categorical data.
arXiv Detail & Related papers (2023-03-08T14:55:32Z)
- Cryo-shift: Reducing domain shift in cryo-electron subtomograms with unsupervised domain adaptation and randomization [17.921052986098946]
Subtomogram classification and recognition constitute a primary step in the systematic recovery of macromolecular structures.
Supervised deep learning methods have been proven to be highly accurate and efficient for subtomogram classification.
We present Cryo-Shift, a fully unsupervised domain adaptation and randomization framework for deep learning-based cross-domain subtomogram classification.
arXiv Detail & Related papers (2021-11-17T13:43:36Z)
- Cross-Site Severity Assessment of COVID-19 from CT Images via Domain Adaptation [64.59521853145368]
Early and accurate severity assessment of Coronavirus disease 2019 (COVID-19) based on computed tomography (CT) images greatly helps in estimating intensive care unit events.
To augment the labeled data and improve the generalization ability of the classification model, it is necessary to aggregate data from multiple sites.
This task faces several challenges including class imbalance between mild and severe infections, domain distribution discrepancy between sites, and presence of heterogeneous features.
arXiv Detail & Related papers (2021-09-08T07:56:51Z)
- Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)