Embryo 2.0: Merging Synthetic and Real Data for Advanced AI Predictions
- URL: http://arxiv.org/abs/2412.01255v1
- Date: Mon, 02 Dec 2024 08:24:49 GMT
- Title: Embryo 2.0: Merging Synthetic and Real Data for Advanced AI Predictions
- Authors: Oriana Presacan, Alexandru Dorobantiu, Vajira Thambawita, Michael A. Riegler, Mette H. Stensen, Mario Iliceto, Alexandru C. Aldea, Akriti Sharma
- Abstract summary: We train two generative models on two datasets: one that we created and made publicly available, and one existing public dataset.
We generate synthetic embryo images at various cell stages, including 2-cell, 4-cell, 8-cell, morula, and blastocyst.
These were combined with real images to train classification models for embryo cell stage prediction.
- Score: 69.07284335967019
- Abstract: Accurate embryo morphology assessment is essential in assisted reproductive technology for selecting the most viable embryo. Artificial intelligence has the potential to enhance this process. However, the limited availability of embryo data presents challenges for training deep learning models. To address this, we trained two generative models using two datasets, one we created and made publicly available and one existing public dataset, to generate synthetic embryo images at various cell stages, including 2-cell, 4-cell, 8-cell, morula, and blastocyst. These were combined with real images to train classification models for embryo cell stage prediction. Our results demonstrate that incorporating synthetic images alongside real data improved classification performance: the model achieved 97% accuracy, compared to 95% when trained solely on real data. Notably, even when trained exclusively on synthetic data and tested on real data, the model achieved a high accuracy of 94%. Furthermore, combining synthetic data from both generative models yielded better classification results than using data from a single generative model. Four embryologists evaluated the fidelity of the synthetic images through a Turing test, during which they annotated inaccuracies and offered feedback. The analysis showed that the diffusion model outperformed the generative adversarial network model, deceiving the embryologists 66.6% of the time versus 25.3% and achieving lower Fréchet inception distance scores.
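The Fréchet inception distance reported in the abstract compares the feature statistics of real and synthetic images under a Gaussian assumption. A minimal sketch of the underlying Fréchet distance on pre-extracted feature vectors (the Inception-network feature extraction step is omitted here, and the matrix square root term is computed via the eigenvalues of the covariance product) might look like:

```python
import numpy as np

def frechet_distance(feat_real, feat_synth):
    """Fréchet distance between two sets of feature vectors (rows = samples),
    each modelled as a multivariate Gaussian:
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2))."""
    mu1, mu2 = feat_real.mean(axis=0), feat_synth.mean(axis=0)
    s1 = np.cov(feat_real, rowvar=False)
    s2 = np.cov(feat_synth, rowvar=False)
    diff = mu1 - mu2
    # Tr((S1 S2)^(1/2)) equals the sum of square roots of the eigenvalues
    # of S1 @ S2, which are real and non-negative for covariance matrices;
    # clip tiny negative numerical noise before taking the square root.
    eigvals = np.linalg.eigvals(s1 @ s2)
    covmean_trace = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return diff @ diff + np.trace(s1) + np.trace(s2) - 2.0 * covmean_trace
```

Lower values indicate that the synthetic feature distribution more closely matches the real one; standard FID implementations use the same formula with 2048-dimensional Inception-v3 features and a dense matrix square root.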
Related papers
- Can Medical Vision-Language Pre-training Succeed with Purely Synthetic Data? [8.775988650381397]
Training medical vision-language pre-training models requires datasets with paired, high-quality image-text data.
Recent advancements in Large Language Models have made it possible to generate large-scale synthetic image-text pairs.
We propose an automated pipeline to build a diverse, high-quality synthetic dataset.
arXiv Detail & Related papers (2024-10-17T13:11:07Z)
- Evaluating Utility of Memory Efficient Medical Image Generation: A Study on Lung Nodule Segmentation [0.0]
This work proposes a memory-efficient patch-wise denoising diffusion probabilistic model (DDPM) for generating synthetic medical images.
Our approach generates high-utility synthetic images with nodule segmentation while efficiently managing memory constraints.
We evaluate the method in two scenarios: training a segmentation model exclusively on synthetic data, and augmenting real-world training data with synthetic images.
arXiv Detail & Related papers (2024-10-16T13:20:57Z)
- Is Synthetic Data all We Need? Benchmarking the Robustness of Models Trained with Synthetic Images [11.70758559522617]
We provide the first benchmark for three classes of synthetic clone models, namely supervised, self-supervised, and multi-modal ones.
We find that synthetic clones are much more susceptible to adversarial and real-world noise than models trained with real data.
arXiv Detail & Related papers (2024-05-30T20:37:34Z)
- Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability.
We propose a dataset style inversion strategy to improve the stylistic alignment between synthetic and real images.
Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z)
- Derm-T2IM: Harnessing Synthetic Skin Lesion Data via Stable Diffusion Models for Enhanced Skin Disease Classification using ViT and CNN [1.0499611180329804]
We aim to incorporate enhanced data transformation techniques by extending the recent success of few-shot learning.
We investigate the impact of incorporating newly generated synthetic data into the training pipeline of state-of-the-art machine learning models.
arXiv Detail & Related papers (2024-01-10T13:46:03Z)
- On the Stability of Iterative Retraining of Generative Models on their own Data [56.153542044045224]
We study the impact of training generative models on mixed datasets.
We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough.
We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-09-30T16:41:04Z)
- Exploring the Effectiveness of Dataset Synthesis: An application of Apple Detection in Orchards [68.95806641664713]
We explore the usability of Stable Diffusion 2.1-base for generating synthetic datasets of apple trees for object detection.
We train a YOLOv5m object detection model to predict apples in a real-world apple detection dataset.
Results demonstrate that the model trained on generated data slightly underperforms a baseline model trained on real-world images.
arXiv Detail & Related papers (2023-06-20T09:46:01Z)
- Differentially Private Diffusion Models Generate Useful Synthetic Images [53.94025967603649]
Recent studies have found that, by default, the outputs of some diffusion models do not preserve training data privacy.
By privately fine-tuning ImageNet pre-trained diffusion models with more than 80M parameters, we obtain SOTA results on CIFAR-10 and Camelyon17.
Our results demonstrate that diffusion models fine-tuned with differential privacy can produce useful and provably private synthetic data.
arXiv Detail & Related papers (2023-02-27T15:02:04Z)
- Is synthetic data from generative models ready for image recognition? [69.42645602062024]
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks.
We showcase the strengths and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data to recognition tasks.
arXiv Detail & Related papers (2022-10-14T06:54:24Z)
- Synthetic Data for Model Selection [2.4499092754102874]
We show that synthetic data can be beneficial for model selection.
We introduce a novel method to calibrate the synthetic error estimation to fit that of the real domain.
arXiv Detail & Related papers (2021-05-03T09:52:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.