AI-Driven Cytomorphology Image Synthesis for Medical Diagnostics
- URL: http://arxiv.org/abs/2507.05063v1
- Date: Mon, 07 Jul 2025 14:49:05 GMT
- Title: AI-Driven Cytomorphology Image Synthesis for Medical Diagnostics
- Authors: Jan Carreras Boada, Rao Muhammad Umer, Carsten Marr
- Abstract summary: In this work, we focus on the classification of single white blood cells, a key component in the diagnosis of hematological diseases such as acute myeloid leukemia (AML). We demonstrate how synthetic images generated with a fine-tuned stable diffusion model can enhance classifier performance for limited data. Our results establish synthetic images as a tool in biomedical research, improving machine learning models, and facilitating medical diagnosis and research.
- Score: 3.462981934061808
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Biomedical datasets often contain a large sample imbalance and are subject to strict privacy constraints, which together hinder the development of accurate machine learning models. One potential solution is to generate synthetic images, as this can improve data availability while preserving patient privacy. However, it remains difficult to generate synthetic images of sufficient quality for training robust classifiers. In this work, we focus on the classification of single white blood cells, a key component in the diagnosis of hematological diseases such as acute myeloid leukemia (AML), a severe blood cancer. We demonstrate how synthetic images, generated with a stable diffusion model fine-tuned using LoRA weights and guided by real few-shot samples of the target white blood cell classes, can enhance classifier performance for limited data. When training a ResNet classifier, accuracy increased from 27.3% to 78.4% (+51.1 percentage points) by adding 5000 synthetic images per class to a small and highly imbalanced real dataset. For a CLIP-based classifier, accuracy improved from 61.8% to 76.8% (+15.0 percentage points). The synthetic images are highly similar to real images, and they can help overcome dataset limitations, enhancing model generalization. Our results establish synthetic images as a tool in biomedical research, improving machine learning models, and facilitating medical diagnosis and research.
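The augmentation step described in the abstract (topping up every class of a small, imbalanced real set with the same number of synthetic images) and the reported accuracy gains can be checked with a short Python sketch. The class names and the per-class counts in `REAL_COUNTS` are hypothetical toy values, not data from the paper; only the figure of 5000 synthetic images per class and the accuracy numbers come from the abstract. The helpers `augment_counts`, `imbalance_ratio` and `gain_in_points` are illustrative names, and the sketch models only dataset composition, not image generation.

```python
# Hypothetical real-image counts per white blood cell class (toy numbers,
# NOT from the paper), chosen to mimic a small, highly imbalanced dataset.
REAL_COUNTS = {"neutrophil": 120, "lymphocyte": 40, "monocyte": 8, "myeloblast": 3}

# Figure reported in the abstract: 5000 synthetic images added per class.
SYNTHETIC_PER_CLASS = 5000


def augment_counts(real: dict[str, int], synthetic_per_class: int) -> dict[str, int]:
    """Per-class totals after adding the same number of synthetic images to every class."""
    return {cls: n + synthetic_per_class for cls, n in real.items()}


def imbalance_ratio(counts: dict[str, int]) -> float:
    """Largest class size divided by smallest; 1.0 means perfectly balanced."""
    return max(counts.values()) / min(counts.values())


def gain_in_points(before_pct: float, after_pct: float) -> float:
    """Absolute accuracy change in percentage points, rounded to one decimal."""
    return round(after_pct - before_pct, 1)


if __name__ == "__main__":
    augmented = augment_counts(REAL_COUNTS, SYNTHETIC_PER_CLASS)
    print(f"imbalance before: {imbalance_ratio(REAL_COUNTS):.1f}")
    print(f"imbalance after:  {imbalance_ratio(augmented):.2f}")
    print("ResNet gain:", gain_in_points(27.3, 78.4))
    print("CLIP gain:", gain_in_points(61.8, 76.8))
```

With these toy counts the largest-to-smallest class ratio drops from 40 to about 1.02, which is the main effect a fixed per-class synthetic budget has on a skewed set; the gains of 51.1 and 15.0 are absolute differences in percentage points.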
Related papers
- SkinDualGen: Prompt-Driven Diffusion for Simultaneous Image-Mask Generation in Skin Lesions [0.0]
We propose a novel method that leverages the pretrained Stable Diffusion-2.0 model to generate high-quality synthetic skin lesion images.
A hybrid dataset combining real and synthetic data markedly enhances the performance of classification and segmentation models.
Date: 2025-07-26T15:00:37Z
- Doctor Approved: Generating Medically Accurate Skin Disease Images through AI-Expert Feedback [43.1078084014722]
We propose a novel framework, coined MAGIC, that synthesizes clinically accurate skin disease images for data augmentation.
Our method creatively translates expert-defined criteria into actionable feedback for image synthesis with diffusion models (DMs).
Date: 2025-06-14T03:15:09Z
- PixCell: A generative foundation model for digital histopathology images [49.00921097924924]
We introduce PixCell, the first diffusion-based generative foundation model for histopathology.
We train PixCell on PanCan-30M, a vast, diverse dataset derived from 69,184 H&E-stained whole slide images covering various cancer types.
Date: 2025-06-05T15:14:32Z
- Improving Heart Rejection Detection in XPCI Images Using Synthetic Data Augmentation [0.0]
StyleGAN was trained on available 3R biopsy patches and subsequently used to generate 10,000 realistic synthetic images.
These were combined with real 0R samples, that is, samples without rejection, in various configurations to train ResNet-18 classifiers for binary rejection classification.
Results demonstrate that synthetic data improves classification performance, particularly when used in combination with real samples.
Date: 2025-05-26T09:26:36Z
- Diverse Image Generation with Diffusion Models and Cross Class Label Learning for Polyp Classification [4.747649393635696]
We develop a novel model, PathoPolyp-Diff, that generates text-controlled synthetic images with diverse characteristics.
We introduce cross-class label learning to make the model learn features from other classes, reducing the burdensome task of data annotation.
Date: 2025-02-08T04:26:20Z
- Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis [55.959002385347645]
Latent Drifting enables diffusion models to be adapted to medical images and to the complex task of counterfactual image generation.
We evaluate our method on three public longitudinal benchmark datasets of brain MRI and chest X-rays for counterfactual image generation.
Date: 2024-12-30T01:59:34Z
- Merging synthetic and real embryo data for advanced AI predictions [69.07284335967019]
We train two generative models using two datasets (one we created and made publicly available, and one existing public dataset) to generate synthetic embryo images at various cell stages.
These were combined with real images to train classification models for embryo cell stage prediction.
Our results demonstrate that incorporating synthetic images alongside real data improved classification performance, with the model achieving 97% accuracy compared to 94.5% when trained solely on real data.
Date: 2024-12-02T08:24:49Z
- Evaluating the plausibility of synthetic images for improving automated endoscopic stone recognition [0.9480662172227129]
Currently, the Morpho-Constitutional Analysis (MCA) is the de facto approach for the etiological diagnosis of kidney stone formation.
More recently, research has focused on performing such tasks intra-operatively, an approach known as Endoscopic Stone Recognition (ESR).
Date: 2024-09-20T11:19:08Z
- Augmenting medical image classifiers with synthetic data from latent diffusion models [12.077733447347592]
We show that latent diffusion models can scalably generate images of skin disease.
We generate and analyze a new dataset of 458,920 synthetic images produced using several generation strategies.
Date: 2023-08-23T22:34:49Z
- Generative models improve fairness of medical classifiers under distribution shifts [49.10233060774818]
We show that learning realistic augmentations automatically from data is possible in a label-efficient manner using generative models.
We demonstrate that these learned augmentations can surpass heuristic ones by making models more robust and statistically fair in- and out-of-distribution.
Date: 2023-04-18T18:15:38Z
- Significantly improving zero-shot X-ray pathology classification via fine-tuning pre-trained image-text encoders [50.689585476660554]
We propose a new fine-tuning strategy that includes positive-pair loss relaxation and random sentence sampling.
Our approach consistently improves overall zero-shot pathology classification across four chest X-ray datasets and three pre-trained models.
Date: 2022-12-14T06:04:18Z
- Multi-label Thoracic Disease Image Classification with Cross-Attention Networks [65.37531731899837]
We propose a novel scheme of Cross-Attention Networks (CAN) for automated thoracic disease classification from chest X-ray images.
We also design a new loss function that goes beyond cross-entropy loss to aid the cross-attention process and to overcome both the imbalance between classes and the dominance of easy samples within each class.
Date: 2020-07-21T14:37:00Z
- An interpretable classifier for high-resolution breast cancer screening images utilizing weakly supervised localization [45.00998416720726]
We propose a framework to address the unique properties of medical images.
This model first uses a low-capacity, yet memory-efficient, network on the whole image to identify the most informative regions.
It then applies another higher-capacity network to collect details from chosen regions.
Finally, it employs a fusion module that aggregates global and local information to make a final prediction.
Date: 2020-02-13T15:28:42Z
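The Cross-Attention Networks entry describes a loss that goes beyond cross-entropy to counter both class imbalance and the dominance of easy samples, but the abstract does not give that loss. The sketch below therefore substitutes a different, well-known technique with the same two goals: a per-label binary focal loss (Lin et al., 2017) with extra weight on positives of rare labels, written in plain Python for one multi-label image. The function name `multilabel_focal_loss` and the `pos_weights` weighting scheme are assumptions for illustration; this is not the CAN loss.

```python
import math


def multilabel_focal_loss(probs, labels, pos_weights, gamma=2.0, eps=1e-12):
    """Mean binary focal loss over the labels of one image.

    Stand-in for illustration only, NOT the CAN paper's loss.
    probs: per-label sigmoid outputs in [0, 1].
    labels: per-label ground truth, 0 or 1.
    pos_weights: per-label weight on positive terms (e.g. larger for rare diseases).
    gamma: focusing parameter; gamma=0 gives weighted binary cross-entropy.
    """
    total = 0.0
    for p, y, w in zip(probs, labels, pos_weights):
        p_t = p if y == 1 else 1.0 - p      # probability assigned to the true outcome
        p_t = min(max(p_t, eps), 1.0)        # clamp to avoid log(0)
        weight = w if y == 1 else 1.0        # up-weight positives of (rare) classes
        # (1 - p_t)^gamma shrinks the loss of confidently correct, easy labels
        total += -weight * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)
```

Setting `gamma=0` recovers weighted binary cross-entropy; raising `gamma` shrinks the loss of confidently correct (easy) labels far more than that of poorly predicted (hard) ones, which is the easy-sample effect the entry targets.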
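For the weakly supervised breast cancer screening entry, the first stage (a low-capacity global network that flags the most informative regions before a higher-capacity network inspects them) can be illustrated as top-k selection over a coarse saliency grid. This plain-Python sketch covers only that selection step and rests on assumptions: the saliency map as a list of per-cell scores, non-overlapping square cells of `patch_size` pixels, and the helper name `select_regions` are hypothetical, and the paper's own rule for choosing and sizing crops may differ.

```python
import heapq


def select_regions(saliency, k, patch_size):
    """Pick the k highest-scoring cells of a coarse saliency grid as pixel boxes.

    Hypothetical helper for illustration, not the paper's selection rule.
    saliency: 2D list of scores from a cheap global network (one score per grid cell).
    k: number of regions to hand to the high-capacity local network.
    patch_size: side length in pixels of the image area covered by one grid cell.
    Returns a list of (score, (top, left, bottom, right)) tuples, best first.
    """
    cells = (
        (score, r, c)
        for r, row in enumerate(saliency)
        for c, score in enumerate(row)
    )
    best = heapq.nlargest(k, cells)  # ordered by score, highest first
    return [
        (score, (r * patch_size, c * patch_size, (r + 1) * patch_size, (c + 1) * patch_size))
        for score, r, c in best
    ]
```

For a 2 x 3 grid `[[0.1, 0.9, 0.2], [0.4, 0.05, 0.8]]` with `k=2` and 32-pixel cells, the two returned boxes cover the cells scoring 0.9 and 0.8; a later fusion stage would combine features from these crops with the global view.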