Synthetic Data for Object Classification in Industrial Applications
- URL: http://arxiv.org/abs/2212.04790v1
- Date: Fri, 9 Dec 2022 11:43:04 GMT
- Title: Synthetic Data for Object Classification in Industrial Applications
- Authors: August Baaz, Yonan Yonan, Kevin Hernandez-Diaz, Fernando
  Alonso-Fernandez, Felix Nilsson
- Abstract summary: In object classification, capturing a large number of images per object and in different conditions is not always possible.
This work explores the creation of artificial images using a game engine to cope with limited data in the training dataset.
- Score: 53.180678723280145
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   One of the biggest challenges in machine learning is data collection.
Training data is an important part since it determines how the model will
behave. In object classification, capturing a large number of images per object
and in different conditions is not always possible and can be very
time-consuming and tedious. Accordingly, this work explores the creation of
artificial images using a game engine to cope with limited data in the training
dataset. We combine real and synthetic data to train the object classification
engine, a strategy that has been shown to increase confidence in
the decisions made by the classifier, which is often critical in industrial
setups. To combine real and synthetic data, we first train the classifier on a
massive amount of synthetic data, and then we fine-tune it on real images.
Another important result is that the number of real images needed for
fine-tuning is not very high, reaching top accuracy with just 12 or 24 images
per class. This substantially reduces the need to capture a large
amount of real data.
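
The two-stage schedule described in the abstract (train on plentiful synthetic data first, then fine-tune on a few real samples per class) can be illustrated with a toy sketch. This is not the authors' implementation: it assumes 2-D Gaussian points as a stand-in for image features, a logistic-regression classifier instead of a deep network, and an artificial shift between the "synthetic" and "real" distributions. All function names, class centers, and hyperparameters below are hypothetical choices made for illustration only.

```python
import math
import random


def make_features(rng, n_per_class, centers, noise):
    """Sample 2-D points around one center per class (a stand-in for image features)."""
    samples = []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            samples.append(((rng.gauss(cx, noise), rng.gauss(cy, noise)), label))
    return samples


def predict(weights, bias, x):
    """Probability of class 1 under a logistic model."""
    z = weights[0] * x[0] + weights[1] * x[1] + bias
    z = max(-30.0, min(30.0, z))  # clamp so exp() cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def train(model, data, epochs, lr, rng):
    """Run plain SGD on the log loss, starting from the given (weights, bias)."""
    weights, bias = list(model[0]), model[1]
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = predict(weights, bias, x) - y
            weights[0] -= lr * err * x[0]
            weights[1] -= lr * err * x[1]
            bias -= lr * err
    return weights, bias


def accuracy(model, data):
    """Fraction of samples whose thresholded prediction matches the label."""
    weights, bias = model
    hits = sum((predict(weights, bias, x) >= 0.5) == (y == 1) for x, y in data)
    return hits / len(data)


def run_demo(seed=0, real_per_class=12):
    """Return real-test accuracy before and after fine-tuning on a few real samples."""
    rng = random.Random(seed)
    synthetic_centers = [(-1.0, 0.0), (1.0, 0.0)]  # assumed toy "rendered" domain
    real_centers = [(0.0, 0.5), (2.0, 0.5)]        # shifted: an assumed domain gap
    synthetic = make_features(rng, 2000, synthetic_centers, 0.5)
    real_train = make_features(rng, real_per_class, real_centers, 0.5)
    real_test = make_features(rng, 1000, real_centers, 0.5)

    # Stage 1: train from scratch on the large synthetic set.
    pretrained = train(([0.0, 0.0], 0.0), synthetic, epochs=5, lr=0.1, rng=rng)
    # Stage 2: continue training from those weights on the small real set.
    finetuned = train(pretrained, real_train, epochs=100, lr=0.05, rng=rng)
    return accuracy(pretrained, real_test), accuracy(finetuned, real_test)


if __name__ == "__main__":
    acc_pre, acc_ft = run_demo()
    print(f"synthetic only: {acc_pre:.3f}, after fine-tuning: {acc_ft:.3f}")
```

The point of the sketch is only the ordering of the two training calls: the second stage starts from the weights learned on synthetic data rather than from scratch, so a dozen real samples per class only have to correct the shift between the two domains. The numbers it prints say nothing about the accuracies reported in the paper.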
 
      
Related papers
- Drive-1-to-3: Enriching Diffusion Priors for Novel View Synthesis of Real Vehicles [81.29018359825872]
 This paper consolidates a set of good practices to finetune large pretrained models for a real-world task.
Specifically, we develop several strategies to account for discrepancies between the synthetic data and real driving data.
Our insights lead to effective finetuning that results in a 68.8% reduction in FID for novel view synthesis over prior art.
 arXiv  Detail & Related papers  (2024-12-19T03:39:13Z)
- Data-Efficient Generation for Dataset Distillation [12.106527496044473]
 We train a conditional latent diffusion model capable of generating realistic synthetic images with labels.
We demonstrate that models can be effectively trained using only a small set of synthetic images and evaluated on a large real test set.
 arXiv  Detail & Related papers  (2024-09-05T22:31:53Z)
- DataDream: Few-shot Guided Dataset Generation [90.09164461462365]
 We propose a framework for synthesizing classification datasets that more faithfully represents the real data distribution.
DataDream fine-tunes LoRA weights for the image generation model on the few real images before generating the training data using the adapted model.
We then fine-tune LoRA weights for CLIP using the synthetic data to improve downstream image classification over previous approaches on a large variety of datasets.
 arXiv  Detail & Related papers  (2024-07-15T17:10:31Z)
- Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
 We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability.
We propose a dataset style inversion strategy to improve the stylistic alignment between synthetic and real images.
Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
 arXiv  Detail & Related papers  (2024-03-28T22:25:05Z)
- Scaling Laws of Synthetic Images for Model Training ... for Now [54.43596959598466]
 We study the scaling laws of synthetic images generated by state-of-the-art text-to-image models.
We observe that synthetic images demonstrate a scaling trend similar to, but slightly less effective than, real images in CLIP training.
 arXiv  Detail & Related papers  (2023-12-07T18:59:59Z)
- Image Captions are Natural Prompts for Text-to-Image Models [70.30915140413383]
 We analyze the relationship between the training effect of synthetic data and the synthetic data distribution induced by prompts.
We propose a simple yet effective method that prompts text-to-image generative models to synthesize more informative and diverse training data.
Our method significantly improves the performance of models trained on synthetic training data.
 arXiv  Detail & Related papers  (2023-07-17T14:38:11Z)
- Synthetic Image Data for Deep Learning [0.294944680995069]
 Realistic synthetic image data rendered from 3D models can be used to augment image sets and train image classification and semantic segmentation models.
We show how high quality physically-based rendering and domain randomization can efficiently create a large synthetic dataset based on production 3D CAD models of a real vehicle.
 arXiv  Detail & Related papers  (2022-12-12T20:28:13Z)
- Analysis of Training Object Detection Models with Synthetic Data [0.0]
 This paper attempts to provide a holistic overview of how to use synthetic data for object detection.
We analyse aspects of generating the data as well as techniques used to train the models.
Experiments are validated on real data and benchmarked to models trained on real data.
 arXiv  Detail & Related papers  (2022-11-29T10:21:16Z)
- Is synthetic data from generative models ready for image recognition? [69.42645602062024]
 We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks.
We showcase the strengths and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data to recognition tasks.
 arXiv  Detail & Related papers  (2022-10-14T06:54:24Z)
- PennSyn2Real: Training Object Recognition Models without Human Labeling [12.923677573437699]
 We propose PennSyn2Real, a synthetic dataset consisting of more than 100,000 4K images of more than 20 types of micro aerial vehicles (MAVs).
The dataset can be used to generate arbitrary numbers of training images for high-level computer vision tasks such as MAV detection and classification.
We show that synthetic data generated using this framework can be directly used to train CNN models for common object recognition tasks such as detection and segmentation.
 arXiv  Detail & Related papers  (2020-09-22T02:53:40Z)
- Can Synthetic Data Improve Object Detection Results for Remote Sensing Images? [15.466412729455874]
 We propose the use of realistic synthetic data with a wide distribution to improve the performance of remote sensing image aircraft detection.
We randomly set the parameters during rendering, such as the size of the instance and the class of background images.
In order to make the synthetic images more realistic, we refine the synthetic images at the pixel level using CycleGAN with real unlabeled images.
 arXiv  Detail & Related papers  (2020-06-09T02:23:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     