Embryo 2.0: Merging Synthetic and Real Data for Advanced AI Predictions
- URL: http://arxiv.org/abs/2412.01255v1
- Date: Mon, 02 Dec 2024 08:24:49 GMT
- Title: Embryo 2.0: Merging Synthetic and Real Data for Advanced AI Predictions
- Authors: Oriana Presacan, Alexandru Dorobantiu, Vajira Thambawita, Michael A. Riegler, Mette H. Stensen, Mario Iliceto, Alexandru C. Aldea, Akriti Sharma
- Abstract summary: We train two generative models on two datasets: one that we created and made publicly available, and one existing public dataset.
We generate synthetic embryo images at various cell stages, including 2-cell, 4-cell, 8-cell, morula, and blastocyst.
These were combined with real images to train classification models for embryo cell stage prediction.
- Score: 69.07284335967019
- Abstract: Accurate embryo morphology assessment is essential in assisted reproductive technology for selecting the most viable embryo. Artificial intelligence has the potential to enhance this process. However, the limited availability of embryo data presents challenges for training deep learning models. To address this, we trained two generative models using two datasets, one we created and made publicly available and one existing public dataset, to generate synthetic embryo images at various cell stages, including 2-cell, 4-cell, 8-cell, morula, and blastocyst. These were combined with real images to train classification models for embryo cell stage prediction. Our results demonstrate that incorporating synthetic images alongside real data improved classification performance: the model achieved 97% accuracy, compared to 95% when trained solely on real data. Notably, even when trained exclusively on synthetic data and tested on real data, the model achieved a high accuracy of 94%. Furthermore, combining synthetic data from both generative models yielded better classification results than using data from a single generative model. Four embryologists evaluated the fidelity of the synthetic images through a Turing test, during which they annotated inaccuracies and offered feedback. The analysis showed that the diffusion model outperformed the generative adversarial network model, deceiving the embryologists 66.6% of the time versus 25.3% and achieving lower Fréchet inception distance scores.
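The Fréchet inception distance reported in the abstract compares the feature statistics of real and synthetic images under a Gaussian assumption. A minimal sketch of the underlying Fréchet distance on pre-extracted feature vectors (the Inception-network feature extraction step is omitted here, and the matrix square root term is computed via the eigenvalues of the covariance product) might look like:

```python
import numpy as np

def frechet_distance(feat_real, feat_synth):
    """Fréchet distance between two sets of feature vectors (rows = samples),
    each modelled as a multivariate Gaussian:
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2))."""
    mu1, mu2 = feat_real.mean(axis=0), feat_synth.mean(axis=0)
    s1 = np.cov(feat_real, rowvar=False)
    s2 = np.cov(feat_synth, rowvar=False)
    diff = mu1 - mu2
    # Tr((S1 S2)^(1/2)) equals the sum of square roots of the eigenvalues
    # of S1 @ S2, which are real and non-negative for covariance matrices;
    # clip tiny negative numerical noise before taking the square root.
    eigvals = np.linalg.eigvals(s1 @ s2)
    covmean_trace = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return diff @ diff + np.trace(s1) + np.trace(s2) - 2.0 * covmean_trace
```

Lower values indicate that the synthetic feature distribution more closely matches the real one; standard FID implementations use the same formula with 2048-dimensional Inception-v3 features and a dense matrix square root.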
Related papers
- Can Medical Vision-Language Pre-training Succeed with Purely Synthetic Data? [8.775988650381397]
Training medical vision-language pre-training models requires datasets with paired, high-quality image-text data.
Recent advancements in Large Language Models have made it possible to generate large-scale synthetic image-text pairs.
We propose an automated pipeline to build a diverse, high-quality synthetic dataset.
arXiv Detail & Related papers (2024-10-17T13:11:07Z)
- Evaluating Utility of Memory Efficient Medical Image Generation: A Study on Lung Nodule Segmentation [0.0]
This work proposes a memory-efficient patch-wise denoising diffusion probabilistic model (DDPM) for generating synthetic medical images.
Our approach generates high-utility synthetic images with nodule segmentation while efficiently managing memory constraints.
We evaluate the method in two scenarios: training a segmentation model exclusively on synthetic data, and augmenting real-world training data with synthetic images.
arXiv Detail & Related papers (2024-10-16T13:20:57Z)
- Is Synthetic Data all We Need? Benchmarking the Robustness of Models Trained with Synthetic Images [11.70758559522617]
We provide the first benchmark for three classes of synthetic clone models, namely supervised, self-supervised, and multi-modal ones.
We find that synthetic clones are much more susceptible to adversarial and real-world noise than models trained with real data.
arXiv Detail & Related papers (2024-05-30T20:37:34Z)
- Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability.
We propose a dataset style inversion strategy to improve the stylistic alignment between synthetic and real images.
Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z)
- Derm-T2IM: Harnessing Synthetic Skin Lesion Data via Stable Diffusion Models for Enhanced Skin Disease Classification using ViT and CNN [1.0499611180329804]
We aim to incorporate enhanced data transformation techniques by extending the recent success of few-shot learning.
We investigate the impact of incorporating newly generated synthetic data into the training pipeline of state-of-the-art machine learning models.
arXiv Detail & Related papers (2024-01-10T13:46:03Z)
- On the Stability of Iterative Retraining of Generative Models on their own Data [56.153542044045224]
We study the impact of training generative models on mixed datasets.
We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough.
We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-09-30T16:41:04Z)
- Exploring the Effectiveness of Dataset Synthesis: An application of Apple Detection in Orchards [68.95806641664713]
We explore the usability of Stable Diffusion 2.1-base for generating synthetic datasets of apple trees for object detection.
We train a YOLOv5m object detection model to predict apples in a real-world apple detection dataset.
Results demonstrate that the model trained on generated data slightly underperforms a baseline model trained on real-world images.
arXiv Detail & Related papers (2023-06-20T09:46:01Z)
- Differentially Private Diffusion Models Generate Useful Synthetic Images [53.94025967603649]
Recent studies have found that, by default, the outputs of some diffusion models do not preserve training data privacy.
By privately fine-tuning ImageNet pre-trained diffusion models with more than 80M parameters, we obtain SOTA results on CIFAR-10 and Camelyon17.
Our results demonstrate that diffusion models fine-tuned with differential privacy can produce useful and provably private synthetic data.
arXiv Detail & Related papers (2023-02-27T15:02:04Z)
- Is synthetic data from generative models ready for image recognition? [69.42645602062024]
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks.
We showcase the strengths and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data to recognition tasks.
arXiv Detail & Related papers (2022-10-14T06:54:24Z)
- Synthetic Data for Model Selection [2.4499092754102874]
We show that synthetic data can be beneficial for model selection.
We introduce a novel method to calibrate the synthetic error estimation to fit that of the real domain.
arXiv Detail & Related papers (2021-05-03T09:52:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.