Object Detector Differences when using Synthetic and Real Training Data
- URL: http://arxiv.org/abs/2312.00694v1
- Date: Fri, 1 Dec 2023 16:27:48 GMT
- Title: Object Detector Differences when using Synthetic and Real Training Data
- Authors: Martin Georg Ljungqvist, Otto Nordander, Markus Skans, Arvid Mildner,
Tony Liu, Pierre Nugues
- Abstract summary: We train the YOLOv3 object detector on real and synthetic images from city environments.
We perform a similarity analysis using Centered Kernel Alignment (CKA) to explore the effects of training on synthetic data on a layer-wise basis.
The results show that the largest similarity between a detector trained on real data and a detector trained on synthetic data was in the early layers, and the largest difference was in the head part.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: To train well-performing, generalizing neural networks, sufficiently
large and diverse datasets are needed. Collecting data while adhering to privacy
legislation becomes increasingly difficult and annotating these large datasets
is both a resource-heavy and time-consuming task. An approach to overcome these
difficulties is to use synthetic data since it is inherently scalable and can
be automatically annotated. However, how training on synthetic data affects the
layers of a neural network is still unclear. In this paper, we train the YOLOv3
object detector on real and synthetic images from city environments. We perform
a similarity analysis using Centered Kernel Alignment (CKA) to explore the
effects of training on synthetic data on a layer-wise basis. The analysis
captures the architecture of the detector while showing both different and
similar patterns between different models. With this similarity analysis, we aim
to give insight into how training on synthetic data affects each layer and to
provide a better understanding of the inner workings of complex neural networks.
The results show that the largest similarity between a detector trained on real
data and a detector trained on synthetic data was in the early layers, and the
largest difference was in the head part. The results also show that no major
difference in performance or similarity could be seen between frozen and
unfrozen backbone.
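The layer-wise similarity measure named in the abstract, Centered Kernel Alignment, has a simple linear form that can be computed directly from two layers' activation matrices. The sketch below is a minimal NumPy implementation of linear CKA following its standard definition; it is not the authors' exact evaluation pipeline, and the activation arrays are random stand-ins for real layer outputs.

```python
import numpy as np

def linear_cka(x, y):
    """Linear Centered Kernel Alignment between two activation matrices.

    x, y: arrays of shape (n_examples, n_features); the feature counts
    may differ between the two layers being compared.
    Returns a similarity in [0, 1]; 1 means the representations are
    identical up to rotation and isotropic scaling.
    """
    # Center each feature (column); CKA is defined on centered activations.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # ||Y^T X||_F^2 normalized by ||X^T X||_F * ||Y^T Y||_F
    # (np.linalg.norm on a 2-D array defaults to the Frobenius norm).
    numerator = np.linalg.norm(y.T @ x) ** 2
    norm_x = np.linalg.norm(x.T @ x)
    norm_y = np.linalg.norm(y.T @ y)
    return numerator / (norm_x * norm_y)

# Stand-in activations: a layer compared with a lightly perturbed copy.
rng = np.random.default_rng(0)
acts_real = rng.standard_normal((256, 64))
acts_synth = acts_real + 0.1 * rng.standard_normal((256, 64))
print(linear_cka(acts_real, acts_real))   # ~1.0 for identical activations
print(linear_cka(acts_real, acts_synth))  # high, but below 1
```

In a layer-wise study such as this one, `linear_cka` would be evaluated on the activations of each corresponding layer pair (real-trained vs. synthetic-trained detector) over a shared probe set of images, yielding one similarity score per layer.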
Related papers
- Improving Object Detector Training on Synthetic Data by Starting With a Strong Baseline Methodology
We propose a methodology for improving the performance of a pre-trained object detector when training on synthetic data.
Our approach focuses on extracting the salient information from synthetic data without forgetting useful features learned from pre-training on real images.
arXiv Detail & Related papers (2024-05-30T08:31:01Z)
- Mind the Gap Between Synthetic and Real: Utilizing Transfer Learning to Probe the Boundaries of Stable Diffusion Generated Data
Student models show a significant drop in accuracy compared to models trained on real data.
By training these layers using either real or synthetic data, we reveal that the drop mainly stems from the model's final layers.
Our results suggest an improved trade-off between the amount of real training data used and the model's accuracy.
arXiv Detail & Related papers (2024-05-06T07:51:13Z)
- Massively Annotated Datasets for Assessment of Synthetic and Real Data in Face Recognition
We study the drift between the performance of models trained on real and synthetic datasets.
We conduct studies on the differences between real and synthetic datasets on the attribute set.
Interestingly, we verified that while real samples suffice to explain the synthetic distribution, the reverse does not hold: synthetic samples do not explain the real distribution.
arXiv Detail & Related papers (2024-04-23T17:10:49Z)
- A Discrepancy Aware Framework for Robust Anomaly Detection
We present a Discrepancy Aware Framework (DAF), which demonstrates robust performance consistently with simple and cheap strategies.
Our method leverages an appearance-agnostic cue to guide the decoder in identifying defects, thereby alleviating its reliance on synthetic appearance.
Under simple synthesis strategies, it outperforms existing methods by a large margin and also achieves state-of-the-art localization performance.
arXiv Detail & Related papers (2023-10-11T15:21:40Z)
- A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation
Deep learning in computer vision has achieved great success with the price of large-scale labeled training data.
The uncontrollable data collection process produces non-IID training and test data, where undesired duplication may exist.
To circumvent them, an alternative is to generate synthetic data via 3D rendering with domain randomization.
arXiv Detail & Related papers (2023-03-16T09:03:52Z)
- Personalized Decentralized Multi-Task Learning Over Dynamic Communication Graphs
We propose a decentralized and federated learning algorithm for tasks that are positively and negatively correlated.
Our algorithm uses gradients to calculate the correlations among tasks automatically, and dynamically adjusts the communication graph to connect mutually beneficial tasks and isolate those that may negatively impact each other.
We conduct experiments on a synthetic Gaussian dataset and a large-scale celebrity attributes (CelebA) dataset.
arXiv Detail & Related papers (2022-12-21T18:58:24Z)
- Synthetic Data for Object Classification in Industrial Applications
In object classification, capturing a large number of images per object and in different conditions is not always possible.
This work explores the creation of artificial images using a game engine to cope with limited data in the training dataset.
arXiv Detail & Related papers (2022-12-09T11:43:04Z)
- Analysis of Training Object Detection Models with Synthetic Data
This paper attempts to provide a holistic overview of how to use synthetic data for object detection.
We analyse aspects of generating the data as well as techniques used to train the models.
Experiments are validated on real data and benchmarked to models trained on real data.
arXiv Detail & Related papers (2022-11-29T10:21:16Z)
- Graph Neural Networks with Trainable Adjacency Matrices for Fault Diagnosis on Multivariate Sensor Data
It is necessary to consider the behavior of the signals from each sensor separately, taking into account their correlations and hidden relationships with each other.
The graph nodes can represent the data from the different sensors, and the edges can represent the influence of these data on each other.
The paper proposes constructing the graph during training of the graph neural network, which allows models to be trained on data where the dependencies between the sensors are not known in advance.
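To illustrate the idea that edges can encode how sensor signals relate to one another, here is a minimal NumPy sketch that derives an adjacency matrix from pairwise Pearson correlations. This is only a heuristic stand-in: the paper learns a trainable adjacency during training rather than fixing it from correlations, and the function name and threshold here are invented for the example.

```python
import numpy as np

def correlation_adjacency(signals, threshold=0.3):
    """Build a graph adjacency matrix from multivariate sensor data.

    signals: array of shape (n_timesteps, n_sensors).
    Two sensors are connected when the absolute Pearson correlation of
    their signals exceeds `threshold`. A fixed heuristic, unlike the
    trainable adjacency described in the paper.
    """
    # Columns are variables, so each sensor is one variable.
    corr = np.corrcoef(signals, rowvar=False)       # (n_sensors, n_sensors)
    adj = (np.abs(corr) > threshold).astype(float)
    np.fill_diagonal(adj, 0.0)                      # no self-loops
    return adj

# Toy data: two correlated sensors sharing a source, one independent sensor.
rng = np.random.default_rng(1)
base = rng.standard_normal((500, 1))
sensors = np.hstack([
    base + 0.1 * rng.standard_normal((500, 1)),
    base + 0.1 * rng.standard_normal((500, 1)),
    rng.standard_normal((500, 1)),
])
print(correlation_adjacency(sensors))
```

The resulting matrix is symmetric with a zero diagonal; the first two sensors end up connected while the independent one does not, which is the kind of structure a trainable adjacency would instead discover from gradients.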
arXiv Detail & Related papers (2022-10-20T11:03:21Z)
- BeCAPTCHA-Type: Biometric Keystroke Data Generation for Improved Bot Detection
This work proposes a data driven learning model for the synthesis of keystroke biometric data.
The proposed method is compared with two statistical approaches based on Universal and User-dependent models.
Our experimental framework considers a dataset with 136 million keystroke events from 168 thousand subjects.
arXiv Detail & Related papers (2022-07-27T09:26:15Z)
- Category-Learning with Context-Augmented Autoencoder
Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning.
We propose a novel method of using data augmentations when training autoencoders.
We train a Variational Autoencoder such that the outcome of a transformation is predictable by an auxiliary network.
arXiv Detail & Related papers (2020-10-10T14:04:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.