Explore the Power of Synthetic Data on Few-shot Object Detection
- URL: http://arxiv.org/abs/2303.13221v2
- Date: Fri, 12 May 2023 05:45:29 GMT
- Title: Explore the Power of Synthetic Data on Few-shot Object Detection
- Authors: Shaobo Lin, Kun Wang, Xingyu Zeng, Rui Zhao
- Abstract summary: Few-shot object detection (FSOD) aims to expand an object detector for novel categories given only a few instances for training.
Recent text-to-image generation models have shown promising results in generating high-quality images.
This work extensively studies how synthetic images generated from state-of-the-art text-to-image generators benefit FSOD tasks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Few-shot object detection (FSOD) aims to expand an object detector for novel
categories given only a few instances for training. The few training samples
restrict the performance of the FSOD model. Recent text-to-image generation models
have shown promising results in generating high-quality images. How applicable
these synthetic images are for FSOD tasks remains under-explored. This work
extensively studies how synthetic images generated from state-of-the-art
text-to-image generators benefit FSOD tasks. We focus on two perspectives: (1)
How to use synthetic data for FSOD? (2) How to find representative samples from
the large-scale synthetic dataset? We design a copy-paste-based pipeline for
using synthetic data. Specifically, salient object detection is applied to the
original generated image, and the minimum enclosing box derived from the
saliency map is used to crop the main object. The cropped object is then
randomly pasted onto an image from the base dataset. We also study the
influence of the input text to the text-to-image generator and of the number of
synthetic images used. To construct a representative synthetic training
dataset, we maximize the diversity of the selected images via a sample-based
and a cluster-based method. However, synthetic data alone cannot solve the
severe problem of a high false-positive (FP) ratio for novel categories in
FSOD. We therefore propose integrating CLIP, a zero-shot recognition model,
into the FSOD pipeline; it filters out 90% of FPs by thresholding the
similarity score between the detected object and the text of the predicted category.
Extensive experiments on PASCAL VOC and MS COCO validate the effectiveness of
our method, in which performance gain is up to 21.9% compared to the few-shot
baseline.
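The copy-paste pipeline described in the abstract (salient-object detection, minimum enclosing box, crop, random paste) can be sketched as follows. This is a minimal NumPy-only illustration: the paper applies an actual salient-object detector to each generated image, whereas here the saliency map is assumed precomputed, and all function names are illustrative.

```python
import numpy as np

def min_enclosing_box(saliency, thresh=0.5):
    """Return (y0, y1, x0, x1), the minimum box enclosing all salient pixels."""
    ys, xs = np.where(saliency >= thresh)
    return ys.min(), ys.max() + 1, xs.min(), xs.max() + 1

def copy_paste(generated, saliency, base_image, rng):
    """Crop the main object from a generated image via its saliency map,
    then paste it at a random location on a base-dataset image.
    Returns the augmented image and the pasted object's box label."""
    y0, y1, x0, x1 = min_enclosing_box(saliency)
    obj = generated[y0:y1, x0:x1]
    h, w = obj.shape[:2]
    H, W = base_image.shape[:2]
    top = rng.integers(0, H - h + 1)    # random paste position
    left = rng.integers(0, W - w + 1)
    out = base_image.copy()
    out[top:top + h, left:left + w] = obj
    box = (left, top, left + w, top + h)  # bounding-box label for training
    return out, box

# Toy example: a 4x4 "salient" square inside a 10x10 generated image.
rng = np.random.default_rng(0)
generated = np.zeros((10, 10, 3), dtype=np.uint8)
generated[3:7, 2:6] = 255
saliency = (generated[..., 0] > 0).astype(float)
base = np.full((32, 32, 3), 128, dtype=np.uint8)
pasted, box = copy_paste(generated, saliency, base, rng)
```

In a real pipeline the pasted crop would also be blended (e.g. with its saliency mask as an alpha channel) rather than hard-copied, but the box label is produced the same way.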
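The CLIP-based false-positive filter can likewise be sketched. The embedding functions below are stand-in stubs for CLIP's image and text encoders, and the 0.25 threshold is an illustrative placeholder, not the paper's tuned value.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_false_positives(detections, image_embed_fn, text_embed_fn,
                           threshold=0.25):
    """Keep only detections whose similarity to the predicted category's
    text prompt exceeds the threshold; low-similarity detections are
    treated as false positives and discarded."""
    kept = []
    for det in detections:
        img_emb = image_embed_fn(det["crop"])
        txt_emb = text_embed_fn(f"a photo of a {det['category']}")
        if cosine_similarity(img_emb, txt_emb) >= threshold:
            kept.append(det)
    return kept

# Stub embeddings standing in for CLIP's encoders: the "dog" crop aligns
# with the "dog" prompt; the spurious "cat" detection does not.
embeds = {
    "dog_crop": np.array([1.0, 0.0]),
    "a photo of a dog": np.array([0.9, 0.1]),
    "cat_crop": np.array([0.0, 1.0]),
    "a photo of a cat": np.array([1.0, 0.0]),
}
detections = [
    {"crop": "dog_crop", "category": "dog"},
    {"crop": "cat_crop", "category": "cat"},
]
kept = filter_false_positives(
    detections, lambda c: embeds[c], lambda t: embeds[t], threshold=0.25
)
```

With a real CLIP model, `image_embed_fn` would encode the cropped detection and `text_embed_fn` the category prompt; the filter itself is just this similarity threshold.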
Related papers
- The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better [39.57368843211441]
Every synthetic image ultimately originates from the upstream data used to train the generator.
We compare finetuning on task-relevant, targeted synthetic data generated by Stable Diffusion against finetuning on targeted real images retrieved directly from LAION-2B.
Our analysis suggests that this underperformance is partially due to generator artifacts and inaccurate task-relevant visual details in the synthetic images.
arXiv Detail & Related papers (2024-06-07T18:04:21Z)
- Diversified in-domain synthesis with efficient fine-tuning for few-shot classification [64.86872227580866]
Few-shot image classification aims to learn an image classifier using only a small set of labeled examples per class.
We propose DISEF, a novel approach which addresses the generalization challenge in few-shot learning using synthetic data.
We validate our method in ten different benchmarks, consistently outperforming baselines and establishing a new state-of-the-art for few-shot classification.
arXiv Detail & Related papers (2023-12-05T17:18:09Z)
- Improving the Effectiveness of Deep Generative Data [5.856292656853396]
Training a model on purely synthetic images for downstream image processing tasks results in an undesired performance drop compared to training on real data.
We propose a new taxonomy to describe factors contributing to this commonly observed phenomenon and investigate it on the popular CIFAR-10 dataset.
Our method outperforms baselines on downstream classification tasks both in case of training on synthetic only (Synthetic-to-Real) and training on a mix of real and synthetic data.
arXiv Detail & Related papers (2023-11-07T12:57:58Z)
- Semantic Generative Augmentations for Few-Shot Counting [0.0]
We investigate how synthetic data can benefit few-shot class-agnostic counting.
We propose to rely on a double conditioning of Stable Diffusion with both a prompt and a density map.
Our experiments show that our diversified generation strategy significantly improves the counting accuracy of two recent and performing few-shot counting models.
arXiv Detail & Related papers (2023-10-26T11:42:48Z)
- Randomize to Generalize: Domain Randomization for Runway FOD Detection [1.4249472316161877]
Tiny Object Detection is challenging due to small size, low resolution, occlusion, background clutter, lighting conditions and small object-to-image ratio.
We propose a novel two-stage methodology, Synthetic Randomized Image Augmentation (SRIA), to enhance the generalization capabilities of models trained on 2D datasets.
We report that detection accuracy improved from an initial 41% to 92% on the OOD test set.
arXiv Detail & Related papers (2023-09-23T05:02:31Z)
- Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
- Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language.
We pioneer a systematic study on deepfake detection generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z)
- Is synthetic data from generative models ready for image recognition? [69.42645602062024]
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks.
We showcase the power and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data to recognition tasks.
arXiv Detail & Related papers (2022-10-14T06:54:24Z)
- Label-Free Synthetic Pretraining of Object Detectors [67.17371526567325]
We propose a new approach, Synthetic optimized layout with Instance Detection (SOLID), to pretrain object detectors with synthetic images.
Our "SOLID" approach consists of two main components: (1) generating synthetic images using a collection of unlabelled 3D models with optimized scene arrangement; (2) pretraining an object detector on "instance detection" task.
Our approach does not need any semantic labels for pretraining and allows the use of arbitrary, diverse 3D models.
arXiv Detail & Related papers (2022-08-08T16:55:17Z)
- A Deep Learning Generative Model Approach for Image Synthesis of Plant Leaves [62.997667081978825]
We use advanced Deep Learning (DL) techniques to generate artificial leaf images in an automated way.
We aim to provide a source of training samples for AI applications in modern crop management.
arXiv Detail & Related papers (2021-11-05T10:53:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.