The Big Data Myth: Using Diffusion Models for Dataset Generation to
Train Deep Detection Models
- URL: http://arxiv.org/abs/2306.09762v1
- Date: Fri, 16 Jun 2023 10:48:52 GMT
- Title: The Big Data Myth: Using Diffusion Models for Dataset Generation to
Train Deep Detection Models
- Authors: Roy Voetman and Maya Aghaei and Klaas Dijkstra
- Abstract summary: This study presents a framework for the generation of synthetic datasets by fine-tuning stable diffusion models.
The results of this study reveal that the object detection models trained on synthetic data perform similarly to the baseline model.
- Score: 0.15469452301122172
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the notable accomplishments of deep object detection models, a major
challenge that persists is the requirement for extensive amounts of training
data. The process of procuring such real-world data is a laborious undertaking,
which has prompted researchers to explore new avenues of research, such as
synthetic data generation techniques. This study presents a framework for the
generation of synthetic datasets by fine-tuning pretrained stable diffusion
models. The synthetic datasets are then manually annotated and employed for
training various object detection models. These detectors are evaluated on a
real-world test set of 331 images and compared against a baseline model that
was trained on real-world images. The results of this study reveal that the
object detection models trained on synthetic data perform similarly to the
baseline model. In the context of apple detection in orchards, the average
precision deviation with the baseline ranges from 0.09 to 0.12. This study
illustrates the potential of synthetic data generation techniques as a viable
alternative to the collection of extensive training data for the training of
deep models.
Related papers
- Improving Object Detector Training on Synthetic Data by Starting With a Strong Baseline Methodology [0.14980193397844666]
We propose a methodology for improving the performance of a pre-trained object detector when training on synthetic data.
Our approach focuses on extracting the salient information from synthetic data without forgetting useful features learned from pre-training on real images.
arXiv Detail & Related papers (2024-05-30T08:31:01Z) - Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z) - Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z) - Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A
Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models.
ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z) - On the Stability of Iterative Retraining of Generative Models on their own Data [56.153542044045224]
We study the impact of training generative models on mixed datasets.
We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough.
We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-09-30T16:41:04Z) - Exploring the Effectiveness of Dataset Synthesis: An application of
Apple Detection in Orchards [68.95806641664713]
We explore the usability of Stable Diffusion 2.1-base for generating synthetic datasets of apple trees for object detection.
We train a YOLOv5m object detection model to predict apples in a real-world apple detection dataset.
Results demonstrate that the model trained on generated data is slightly underperforming compared to a baseline model trained on real-world images.
arXiv Detail & Related papers (2023-06-20T09:46:01Z) - Is synthetic data from generative models ready for image recognition? [69.42645602062024]
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks.
We showcase the powerfulness and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data for recognition tasks.
arXiv Detail & Related papers (2022-10-14T06:54:24Z) - Synthetic Data and Hierarchical Object Detection in Overhead Imagery [0.0]
We develop novel synthetic data generation and augmentation techniques for enhancing low/zero-sample learning in satellite imagery.
To test the effectiveness of synthetic imagery, we employ it in the training of detection models and our two stage model, and evaluate the resulting models on real satellite images.
arXiv Detail & Related papers (2021-01-29T22:52:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.