Derm-T2IM: Harnessing Synthetic Skin Lesion Data via Stable Diffusion Models for Enhanced Skin Disease Classification using ViT and CNN
- URL: http://arxiv.org/abs/2401.05159v1
- Date: Wed, 10 Jan 2024 13:46:03 GMT
- Title: Derm-T2IM: Harnessing Synthetic Skin Lesion Data via Stable Diffusion Models for Enhanced Skin Disease Classification using ViT and CNN
- Authors: Muhammad Ali Farooq, Wang Yao, Michael Schukat, Mark A Little and Peter Corcoran
- Abstract summary: We aim to incorporate enhanced data transformation techniques by extending the recent success of few-shot learning.
We investigate the impact of incorporating newly generated synthetic data into the training pipeline of state-of-the-art machine learning models.
- Score: 1.0499611180329804
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This study explores the use of dermatoscopic synthetic data generated through stable diffusion models as a strategy for enhancing the robustness of machine learning model training. Synthetic data generation plays a pivotal role in mitigating the challenges associated with limited labeled datasets, thereby facilitating more effective model training. In this context, we incorporate enhanced data transformation techniques by extending the recent success of few-shot learning and small-data representation in text-to-image latent diffusion models. The optimally tuned model is then used to render high-quality synthetic skin lesion data with diverse and realistic characteristics, supplementing and diversifying the existing training data. We investigate the impact of incorporating this newly generated synthetic data into the training pipeline of state-of-the-art machine learning models, assessing its effectiveness in enhancing model performance and generalization to unseen real-world data. Our experimental results demonstrate that synthetic data generated through stable diffusion models improves the robustness and adaptability of end-to-end CNN and vision transformer models on two different real-world skin lesion datasets.
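The entry ships no code, but the pipeline it describes reduces to two stages that can be sketched briefly: render class-conditioned synthetic lesions from a fine-tuned text-to-image checkpoint, then mix them with real images when training a classifier. Everything specific below (the checkpoint path, prompts, and sample counts) is a hypothetical placeholder, not the authors' configuration.

```python
# A minimal sketch of a Derm-T2IM-style pipeline, assuming a Stable
# Diffusion checkpoint already few-shot fine-tuned on dermatoscopic images.
# All paths, prompts, and counts are hypothetical placeholders.
import os
import torch
from diffusers import StableDiffusionPipeline
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, models, transforms

# Stage 1: render synthetic lesions from the tuned text-to-image model.
pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/derm-t2im-checkpoint",  # hypothetical fine-tuned weights
    torch_dtype=torch.float16,
).to("cuda")
prompts = {
    "melanoma": "dermatoscopic image of a melanoma skin lesion",
    "nevus": "dermatoscopic image of a benign nevus",
}
for label, prompt in prompts.items():
    os.makedirs(f"synthetic/{label}", exist_ok=True)
    for i in range(500):  # synthetic samples per class (arbitrary)
        image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
        image.save(f"synthetic/{label}/{i:04d}.png")

# Stage 2: supplement the real training set with the synthetic images
# and fine-tune a vision transformer on the combined data.
tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
real = datasets.ImageFolder("real_lesions/train", transform=tf)
synthetic = datasets.ImageFolder("synthetic", transform=tf)
loader = DataLoader(ConcatDataset([real, synthetic]), batch_size=32, shuffle=True)

model = models.vit_b_16(weights="IMAGENET1K_V1")
model.heads.head = torch.nn.Linear(model.heads.head.in_features, len(real.classes))
```

From here an ordinary supervised loop over `loader` applies; evaluation should stay on held-out real images so the synthetic data only ever supplements training.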
Related papers
- Synthetic Image Learning: Preserving Performance and Preventing Membership Inference Attacks [5.0243930429558885]
This paper introduces Knowledge Recycling (KR), a pipeline designed to optimise the generation and use of synthetic data for training downstream classifiers.
At the heart of this pipeline is Generative Knowledge Distillation (GKD), a proposed technique that significantly improves the quality and usefulness of the generated information.
The results show a significant reduction in the performance gap between models trained on real and synthetic data, with models based on synthetic data outperforming those trained on real data in some cases (see the sketch after this entry).
arXiv Detail & Related papers (2024-07-22T10:31:07Z)
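The summary above does not specify how GKD is computed; the sketch below shows one generic reading (soft-label distillation from a frozen teacher onto generator output). `generator`, `teacher`, and the `latent_dim` interface are hypothetical stand-ins, not the paper's API.

```python
# A generic distillation step over synthetic images, one plausible reading
# of "Generative Knowledge Distillation"; the paper's exact formulation
# may differ. `generator` and `teacher` are hypothetical stand-ins.
import torch
import torch.nn.functional as F

def gkd_step(student, teacher, generator, optimizer, batch_size=32, T=2.0):
    # Draw a synthetic batch and label it with the teacher's soft outputs.
    z = torch.randn(batch_size, generator.latent_dim)  # assumed interface
    with torch.no_grad():
        images = generator(z)
        soft_targets = F.softmax(teacher(images) / T, dim=-1)
    # Match the student's temperature-scaled predictions to the teacher's.
    log_probs = F.log_softmax(student(images) / T, dim=-1)
    loss = F.kl_div(log_probs, soft_targets, reduction="batchmean") * T * T
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```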
- Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability.
We propose a dataset style inversion strategy to improve the stylistic alignment between synthetic and real images.
Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements (see the sketch after this entry).
arXiv Detail & Related papers (2024-03-28T22:25:05Z)
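As a rough illustration of the bridged-transfer idea, the sketch below fine-tunes on synthetic images first and on the real target set second. Dataset paths, epoch counts, and learning rates are placeholders, and the paper's dataset style inversion step is omitted.

```python
# A condensed sketch of bridged transfer: synthetic images bridge the gap
# between generic pre-training and the real target task. Paths and
# hyperparameters are placeholders, not the paper's settings.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

def fine_tune(model, dataset, epochs, lr):
    loader = DataLoader(dataset, batch_size=64, shuffle=True)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
model = models.resnet50(weights="IMAGENET1K_V2")
model.fc = torch.nn.Linear(model.fc.in_features, 10)  # 10 target classes
fine_tune(model, datasets.ImageFolder("synthetic/", tf), epochs=3, lr=1e-4)  # bridge
fine_tune(model, datasets.ImageFolder("real/", tf), epochs=10, lr=1e-5)      # target
```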
- DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets.
We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability.
Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z)
- Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation (see the sketch after this entry).
arXiv Detail & Related papers (2024-02-19T02:08:09Z)
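The kernel density estimation setting lends itself to a compact illustration of the self-consuming loop: each generation refits a density estimate on a mix of real data and samples drawn from the previous model. The 50/50 mixing ratio and the Gaussian ground truth below are arbitrary choices for illustration.

```python
# An illustrative self-consuming loop with kernel density estimation:
# each round refits the KDE on a mix of real samples and samples drawn
# from the previous round's model. Ratios and sizes are arbitrary.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
real = rng.normal(loc=0.0, scale=1.0, size=1000)  # ground-truth data
kde = gaussian_kde(real)

for generation in range(5):
    synthetic = kde.resample(500, seed=generation)[0]
    mixed = np.concatenate([rng.choice(real, 500, replace=False), synthetic])
    kde = gaussian_kde(mixed)
    # Watch how the spread of the learned distribution drifts per round.
    print(generation, float(mixed.std()))
```

Keeping a fixed share of real data in every round is what the stability results hinge on; dropping the real half and resampling purely from the previous model makes the drift far more pronounced.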
- Synthetic Data Generation with Large Language Models for Text Classification: Potential and Limitations [21.583825474908334]
We study how the performance of models trained on synthetic data may vary with the subjectivity of classification.
Our results indicate that subjectivity, at both the task level and instance level, is negatively associated with the performance of the model trained on synthetic data.
arXiv Detail & Related papers (2023-10-11T19:51:13Z)
- Does Synthetic Data Make Large Language Models More Efficient? [0.0]
This paper explores the nuances of synthetic data generation in NLP.
We highlight its advantages, including data augmentation potential and the introduction of structured variety.
We demonstrate the impact of template-based synthetic data on the performance of modern transformer models (see the sketch after this entry).
arXiv Detail & Related papers (2023-10-11T19:16:09Z)
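"Template-based synthetic data" in this context can be as simple as slot-filled sentence patterns; the templates and slot fillers below are invented for illustration and are not taken from the paper.

```python
# A toy sketch of template-based synthetic text for classification; the
# templates and slot fillers are invented for illustration only.
import random

templates = {
    "positive": ["The {item} was {pos_adj}.", "I really enjoyed this {item}."],
    "negative": ["The {item} was {neg_adj}.", "I regret buying this {item}."],
}
slots = {
    "item": ["camera", "laptop", "headset"],
    "pos_adj": ["excellent", "reliable", "well built"],
    "neg_adj": ["disappointing", "flimsy", "overpriced"],
}

def synthesize(label, n, rng):
    samples = []
    for _ in range(n):
        template = rng.choice(templates[label])
        text = template.format(**{k: rng.choice(v) for k, v in slots.items()})
        samples.append((text, label))
    return samples

rng = random.Random(0)
train_data = synthesize("positive", 100, rng) + synthesize("negative", 100, rng)
```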
- How Good Are Synthetic Medical Images? An Empirical Study with Lung Ultrasound [0.3312417881789094]
Adding synthetic training data using generative models offers a low-cost method to deal with the data scarcity challenge.
We show that training with both synthetic and real data outperforms training with real data alone.
arXiv Detail & Related papers (2023-10-05T15:42:53Z)
- On the Stability of Iterative Retraining of Generative Models on their own Data [56.153542044045224]
We study the impact of training generative models on mixed datasets.
We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough.
We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-09-30T16:41:04Z)
- Analyzing Effects of Fake Training Data on the Performance of Deep Learning Systems [0.0]
Deep learning models frequently suffer from various problems such as class imbalance and lack of robustness to distribution shift.
With the advent of Generative Adversarial Networks (GANs) it is now possible to generate high-quality synthetic data.
We analyze the effect that various quantities of synthetic data, when mixed with original data, can have on a model's robustness to out-of-distribution data and the general quality of predictions (see the sketch after this entry).
arXiv Detail & Related papers (2023-03-02T20:53:22Z)
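That mixing experiment can be sketched as a sweep: append varying amounts of generated data to the real training set and score each model on an out-of-distribution split. The paths, ratios, and the single-pass training loop below are placeholders, not the paper's protocol.

```python
# A sketch of the mixing experiment: append varying amounts of GAN-generated
# data to the real training set and measure accuracy on an OOD split.
# Paths, ratios, and hyperparameters are placeholders.
import torch
from torch.utils.data import ConcatDataset, DataLoader, Subset
from torchvision import datasets, models, transforms

tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
real = datasets.ImageFolder("real_train/", tf)
fake = datasets.ImageFolder("gan_generated/", tf)
ood = DataLoader(datasets.ImageFolder("ood_test/", tf), batch_size=64)

def accuracy(model, loader):
    model.eval()
    hits = total = 0
    with torch.no_grad():
        for x, y in loader:
            hits += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return hits / total

for fraction in (0.0, 0.25, 0.5, 1.0):  # synthetic-to-real ratio
    n_fake = int(len(real) * fraction)
    train = DataLoader(ConcatDataset([real, Subset(fake, range(n_fake))]),
                       batch_size=64, shuffle=True)
    model = models.resnet18(weights=None, num_classes=len(real.classes))
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
    model.train()
    for x, y in train:  # single pass shown; real runs train to convergence
        opt.zero_grad()
        torch.nn.functional.cross_entropy(model(x), y).backward()
        opt.step()
    print(f"synthetic fraction {fraction}: OOD accuracy {accuracy(model, ood):.3f}")
```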
- An Adversarial Active Sampling-based Data Augmentation Framework for Manufacturable Chip Design [55.62660894625669]
Lithography modeling is a crucial problem in chip design to ensure a chip design mask is manufacturable.
Recent developments in machine learning have provided alternative solutions in replacing the time-consuming lithography simulations with deep neural networks.
We propose a litho-aware data augmentation framework to resolve the dilemma of limited data and improve the machine learning model performance.
arXiv Detail & Related papers (2022-10-27T20:53:39Z)
- Style-Hallucinated Dual Consistency Learning for Domain Generalized Semantic Segmentation [117.3856882511919]
We propose the Style-HAllucinated Dual consistEncy learning (SHADE) framework to handle domain shift.
Our SHADE yields significant improvement and outperforms state-of-the-art methods by 5.07% and 8.35% on the average mIoU of three real-world datasets.
arXiv Detail & Related papers (2022-04-06T02:49:06Z)