Assessment of Using Synthetic Data in Brain Tumor Segmentation
- URL: http://arxiv.org/abs/2508.11922v2
- Date: Tue, 19 Aug 2025 16:35:17 GMT
- Title: Assessment of Using Synthetic Data in Brain Tumor Segmentation
- Authors: Aditi Jahagirdar, Sameer Joshi,
- Abstract summary: This study investigates, as a proof of concept, the impact of incorporating synthetic MRI data, generated using a pre-trained GAN model, into training a U-Net segmentation network.<n>Experiments were conducted using real data from the BraTS 2020 dataset, synthetic data generated with the medigan library, and hybrid datasets combining real and synthetic samples in varying proportions.
- Score: 0.3222802562733786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Manual brain tumor segmentation from MRI scans is challenging due to tumor heterogeneity, scarcity of annotated data, and class imbalance in medical imaging datasets. Synthetic data generated by generative models has the potential to mitigate these issues by improving dataset diversity. This study investigates, as a proof of concept, the impact of incorporating synthetic MRI data, generated using a pre-trained GAN model, into training a U-Net segmentation network. Experiments were conducted using real data from the BraTS 2020 dataset, synthetic data generated with the medigan library, and hybrid datasets combining real and synthetic samples in varying proportions. While overall quantitative performance (Dice coefficient, IoU, precision, recall, accuracy) was comparable between real-only and hybrid-trained models, qualitative inspection suggested that hybrid datasets, particularly with 40% real and 60% synthetic data, improved whole tumor boundary delineation. However, region-wise accuracy for the tumor core and the enhancing tumor remained lower, indicating a persistent class imbalance. The findings support the feasibility of synthetic data as an augmentation strategy for brain tumor segmentation, while highlighting the need for larger-scale experiments, volumetric data consistency, and mitigating class imbalance in future work.
Related papers
- From Healthy Scans to Annotated Tumors: A Tumor Fabrication Framework for 3D Brain MRI Synthesis [3.295857224165814]
Tumor Fabrication (TF) is a novel two-stage framework for unpaired 3D brain tumor synthesis.<n>TF is fully automated and leverages only healthy image scans along with a limited amount of real annotated data.<n>We demonstrate that our synthetic image-label pairs used as data enrichment can significantly improve performance on downstream tumor segmentation tasks in low-data regimes.
arXiv Detail & Related papers (2025-11-23T23:28:49Z) - Improving the Generation and Evaluation of Synthetic Data for Downstream Medical Causal Inference [89.5628648718851]
Causal inference is essential for developing and evaluating medical interventions.<n>Real-world medical datasets are often difficult to access due to regulatory barriers.<n>We present STEAM: a novel method for generating Synthetic data for Treatment Effect Analysis in Medicine.
arXiv Detail & Related papers (2025-10-21T16:16:00Z) - Adapting HFMCA to Graph Data: Self-Supervised Learning for Generalizable fMRI Representations [57.054499278843856]
Functional magnetic resonance imaging (fMRI) analysis faces significant challenges due to limited dataset sizes and domain variability between studies.<n>Traditional self-supervised learning methods inspired by computer vision often rely on positive and negative sample pairs.<n>We propose adapting a recently developed Hierarchical Functional Maximal Correlation Algorithm (HFMCA) to graph-structured fMRI data.
arXiv Detail & Related papers (2025-10-05T12:35:01Z) - impuTMAE: Multi-modal Transformer with Masked Pre-training for Missing Modalities Imputation in Cancer Survival Prediction [75.43342771863837]
We introduce impuTMAE, a novel transformer-based end-to-end approach with an efficient multimodal pre-training strategy.<n>It learns inter- and intra-modal interactions while simultaneously imputing missing modalities by reconstructing masked patches.<n>Our model is pre-trained on heterogeneous, incomplete data and fine-tuned for glioma survival prediction using TCGA-GBM/LGG and BraTS datasets.
arXiv Detail & Related papers (2025-08-08T10:01:16Z) - GANet-Seg: Adversarial Learning for Brain Tumor Segmentation with Hybrid Generative Models [1.0456203870202954]
This work introduces a novel framework for brain tumor segmentation leveraging pre-trained GANs and Unet architectures.<n>By combining a global anomaly detection module with a refined mask generation network, the proposed model accurately identifies tumor-sensitive regions.<n>Multi-modal MRI data and synthetic image augmentation are employed to improve robustness and address the challenge of limited annotated datasets.
arXiv Detail & Related papers (2025-06-26T13:28:09Z) - Enhancing Privacy: The Utility of Stand-Alone Synthetic CT and MRI for Tumor and Bone Segmentation [2.4345008922715756]
We employ head and neck cancer CT scans and brain glioma MRI scans from two large datasets.<n>Synthetic data were generated using generative adversarial networks and diffusion models.<n>We evaluate the quality of the synthetic data using MAE, MS-SSIM, Radiomics and a Visual Turing Test (VTT) performed by 5 radiologists.
arXiv Detail & Related papers (2025-06-13T08:17:48Z) - Synthetic Poisoning Attacks: The Impact of Poisoned MRI Image on U-Net Brain Tumor Segmentation [8.955776982854985]
We investigate the impact of synthetic MRI data on the robustness and segmentation accuracy of U-Net models for brain tumor segmentation.<n>To quantify the effect of synthetic data contamination, we train U-Net models on progressively "poisoned" datasets.
arXiv Detail & Related papers (2025-02-06T07:21:19Z) - ScaleMAI: Accelerating the Development of Trusted Datasets and AI Models [46.80682547774335]
We propose ScaleMAI, an agent of AI-integrated data curation and annotation.<n>First, ScaleMAI creates a dataset of 25,362 CT scans, including per-voxel annotations for benign/malignant tumors and 24 anatomical structures.<n>Second, through progressive human-in-the-loop iterations, ScaleMAI provides Flagship AI Model that can approach the proficiency of expert annotators in detecting pancreatic tumors.
arXiv Detail & Related papers (2025-01-06T22:12:00Z) - Merging synthetic and real embryo data for advanced AI predictions [69.07284335967019]
We train two generative models using two datasets-one we created and made publicly available, and one existing public dataset-to generate synthetic embryo images at various cell stages.<n>These were combined with real images to train classification models for embryo cell stage prediction.<n>Our results demonstrate that incorporating synthetic images alongside real data improved classification performance, with the model achieving 97% accuracy compared to 94.5% when trained solely on real data.
arXiv Detail & Related papers (2024-12-02T08:24:49Z) - A Demographic-Conditioned Variational Autoencoder for fMRI Distribution Sampling and Removal of Confounds [49.34500499203579]
We create a variational autoencoder (VAE)-based model, DemoVAE, to decorrelate fMRI features from demographics.
We generate high-quality synthetic fMRI data based on user-supplied demographics.
arXiv Detail & Related papers (2024-05-13T17:49:20Z) - Synthetically Enhanced: Unveiling Synthetic Data's Potential in Medical Imaging Research [4.475998415951477]
Generative AI offers a promising approach to generating synthetic images, enhancing dataset diversity.
This study investigates the impact of synthetic data supplementation on the performance and generalizability of medical imaging research.
arXiv Detail & Related papers (2023-11-15T21:58:01Z) - Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation.
GradABM can quickly simulate million-size populations in few seconds on commodity hardware, integrate with deep neural networks and ingest heterogeneous data sources.
arXiv Detail & Related papers (2022-07-20T07:32:02Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.