Analysis of Training Object Detection Models with Synthetic Data
- URL: http://arxiv.org/abs/2211.16066v1
- Date: Tue, 29 Nov 2022 10:21:16 GMT
- Title: Analysis of Training Object Detection Models with Synthetic Data
- Authors: Bram Vanherle, Steven Moonen, Frank Van Reeth, Nick Michiels
- Abstract summary: This paper attempts to provide a holistic overview of how to use synthetic data for object detection.
We analyse aspects of generating the data as well as techniques used to train the models.
Experiments are validated on real data and benchmarked against models trained on real data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, the use of synthetic training data has been on the rise as it
offers correctly labelled datasets at a lower cost. The downside of this
technique is that the so-called domain gap between the real target images and
synthetic training data leads to a decrease in performance. In this paper, we
attempt to provide a holistic overview of how to use synthetic data for object
detection. We analyse aspects of generating the data as well as techniques used
to train the models. We do so by devising a number of experiments, training
models on the Dataset of Industrial Metal Objects (DIMO). This dataset contains
both real and synthetic images. The synthetic part has different subsets that
are either exact synthetic copies of the real data or are copies with certain
aspects randomised. This allows us to analyse what types of variation are good
for synthetic training data and which aspects should be modelled to closely
match the target data. Furthermore, we investigate what types of training
techniques are beneficial towards generalisation to real data, and how to use
on synthetic images. All these experiments are validated on real data and
benchmarked to models trained on real data. The results offer a number of
interesting takeaways that can serve as basic guidelines for using synthetic
data for object detection. Code to reproduce results is available at
https://github.com/EDM-Research/DIMO_ObjectDetection.
Related papers
- Exploring the Potential of Synthetic Data to Replace Real Data [16.89582896061033]
We find that the potential of synthetic data to replace real data varies depending on the number of cross-domain real images and the test set on which the trained model is evaluated.
We introduce two new metrics, the train2test distance and $\text{AP}_\text{t2t}$, to evaluate the ability of a cross-domain training set using synthetic data.
arXiv Detail & Related papers (2024-08-26T18:20:18Z)
- Improving Object Detector Training on Synthetic Data by Starting With a Strong Baseline Methodology [0.14980193397844666]
We propose a methodology for improving the performance of a pre-trained object detector when training on synthetic data.
Our approach focuses on extracting the salient information from synthetic data without forgetting useful features learned from pre-training on real images.
arXiv Detail & Related papers (2024-05-30T08:31:01Z)
- Object Detector Differences when using Synthetic and Real Training Data [0.0]
We train the YOLOv3 object detector on real and synthetic images from city environments.
We perform a similarity analysis using Centered Kernel Alignment (CKA) to explore the effects of training on synthetic data on a layer-wise basis.
The results show that the largest similarity between a detector trained on real data and a detector trained on synthetic data was in the early layers, and the largest difference was in the head part.
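The layer-wise similarity analysis above uses Centered Kernel Alignment. A minimal sketch of linear CKA, assuming two activation matrices recorded on the same inputs (the function name and shapes here are illustrative, not from the paper's code), is:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two activation matrices.

    X, Y: (n_examples, n_features) activations of two layers/models on the
    same inputs. Returns a similarity in [0, 1]; 1 means the representations
    are identical up to an orthogonal transform and isotropic scaling.
    """
    # Center each feature dimension.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Feature-space form of linear CKA:
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den
```

Comparing `linear_cka` of corresponding layers of a real-trained and a synthetic-trained detector would produce the kind of layer-wise similarity profile the summary describes.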
arXiv Detail & Related papers (2023-12-01T16:27:48Z)
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- Exploring the Effectiveness of Dataset Synthesis: An application of Apple Detection in Orchards [68.95806641664713]
We explore the usability of Stable Diffusion 2.1-base for generating synthetic datasets of apple trees for object detection.
We train a YOLOv5m object detection model to predict apples in a real-world apple detection dataset.
Results demonstrate that the model trained on generated data slightly underperforms a baseline model trained on real-world images.
arXiv Detail & Related papers (2023-06-20T09:46:01Z)
- The Big Data Myth: Using Diffusion Models for Dataset Generation to Train Deep Detection Models [0.15469452301122172]
This study presents a framework for the generation of synthetic datasets by fine-tuning stable diffusion models.
The results of this study reveal that the object detection models trained on synthetic data perform similarly to the baseline model.
arXiv Detail & Related papers (2023-06-16T10:48:52Z)
- Synthetic Data for Object Classification in Industrial Applications [53.180678723280145]
In object classification, capturing a large number of images per object and in different conditions is not always possible.
This work explores the creation of artificial images using a game engine to cope with limited data in the training dataset.
arXiv Detail & Related papers (2022-12-09T11:43:04Z)
- Is synthetic data from generative models ready for image recognition? [69.42645602062024]
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks.
We showcase the strengths and shortcomings of synthetic data from existing generative models, and propose strategies for applying synthetic data more effectively to recognition tasks.
arXiv Detail & Related papers (2022-10-14T06:54:24Z)
- Dataset Distillation by Matching Training Trajectories [75.9031209877651]
We propose a new formulation that optimizes our distilled data to guide networks to a state similar to that of networks trained on real data.
Given a network, we train it for several iterations on our distilled data and optimize the distilled data with respect to the distance between the synthetically trained parameters and the parameters trained on real data.
Our method handily outperforms existing methods and also allows us to distill higher-resolution visual data.
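The objective described above, the distance between parameters reached by training on distilled data and parameters from a real-data training trajectory, can be pictured with a toy sketch. This is a simplified illustration using linear regression and plain gradient descent; the function names and the model are assumptions for the example, not the paper's implementation:

```python
import numpy as np

def train_steps(theta, X, y, lr=0.1, steps=5):
    """A few gradient-descent steps of linear regression on (X, y)."""
    for _ in range(steps):
        grad = X.T @ (X @ theta - y) / len(y)
        theta = theta - lr * grad
    return theta

def matching_loss(theta_syn, theta_real_target, theta_real_start):
    """Trajectory-matching objective: distance between the parameters
    reached by training on distilled data (theta_syn) and a later
    checkpoint of the real-data trajectory (theta_real_target),
    normalized by how far the real trajectory itself moved."""
    return (np.sum((theta_syn - theta_real_target) ** 2)
            / np.sum((theta_real_start - theta_real_target) ** 2))
```

In the full method, the distilled data itself is optimized (by backpropagating through `train_steps`) so that `matching_loss` decreases; here the loss is only evaluated, not minimized over the data.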
arXiv Detail & Related papers (2022-03-22T17:58:59Z)
- Towards 3D Scene Understanding by Referring Synthetic Models [65.74211112607315]
Existing methods typically aim to alleviate the need for extensive annotations of real scene scans.
We explore how synthetic models can assist real scene understanding by aligning the features of real scene categories and synthetic models in a unified feature space.
Experiments show that the method achieves an average mAP of 46.08% on ScanNet and 55.49% on S3DIS.
arXiv Detail & Related papers (2022-03-20T13:06:15Z)
- Synthetic Data for Model Selection [2.4499092754102874]
We show that synthetic data can be beneficial for model selection.
We introduce a novel method to calibrate the synthetic error estimation to fit that of the real domain.
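One generic way to picture such a calibration (an assumed affine least-squares fit, offered only as an illustration and not necessarily the paper's method) is to map synthetic-domain error estimates onto real-domain errors using a few models for which both are known, then rank the remaining candidates by their calibrated error:

```python
import numpy as np

def fit_calibration(syn_errors, real_errors):
    """Fit an affine map real ≈ a * syn + b by least squares, using a few
    models whose error was measured on both synthetic and real data."""
    A = np.stack([syn_errors, np.ones_like(syn_errors)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, real_errors, rcond=None)
    return a, b

def calibrated_error(syn_error, a, b):
    """Estimate a model's real-domain error from its synthetic-domain error."""
    return a * syn_error + b
```

Under this sketch, model selection proceeds on synthetic data alone once `a` and `b` are fitted, which is the benefit the summary points to.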
arXiv Detail & Related papers (2021-05-03T09:52:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.