ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models
- URL: http://arxiv.org/abs/2405.15199v2
- Date: Tue, 05 Nov 2024 16:40:01 GMT
- Title: ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models
- Authors: Jingyuan Zhu, Shiyu Li, Yuxuan Liu, Ping Huang, Jiulong Shan, Huimin Ma, Jian Yuan,
- Abstract summary: This paper presents ODGEN, a novel method to generate high-quality images conditioned on bounding boxes.
We first fine-tune a pre-trained diffusion model on both cropped foreground objects and entire images to fit target distributions.
We then propose to control the diffusion model using synthesized visual robustness prompts with spatial constraints and object-wise textual descriptions.
- Score: 21.158266387658905
- License:
- Abstract: Modern diffusion-based image generative models have made significant progress and become promising to enrich training data for the object detection task. However, the generation quality and the controllability for complex scenes containing multi-class objects and dense objects with occlusions remain limited. This paper presents ODGEN, a novel method to generate high-quality images conditioned on bounding boxes, thereby facilitating data synthesis for object detection. Given a domain-specific object detection dataset, we first fine-tune a pre-trained diffusion model on both cropped foreground objects and entire images to fit target distributions. Then we propose to control the diffusion model using synthesized visual prompts with spatial constraints and object-wise textual descriptions. ODGEN exhibits robustness in handling complex scenes and specific domains. Further, we design a dataset synthesis pipeline to evaluate ODGEN on 7 domain-specific benchmarks to demonstrate its effectiveness. Adding training data generated by ODGEN improves up to 25.3% mAP@.50:.95 with object detectors like YOLOv5 and YOLOv7, outperforming prior controllable generative methods. In addition, we design an evaluation protocol based on COCO-2014 to validate ODGEN in general domains and observe an advantage up to 5.6% in mAP@.50:.95 against existing methods.
Related papers
- Grounding Text-to-Image Diffusion Models for Controlled High-Quality Image Generation [0.0]
Text-to-image (T2I) generative diffusion models have demonstrated outstanding performance in synthesizing diverse, high-quality visuals from text captions.
We propose ObjectDiffusion, a model that conditions T2I diffusion models on semantic and spatial grounding information.
arXiv Detail & Related papers (2025-01-15T22:55:26Z) - A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap [50.079224604394]
We present a novel model-agnostic framework called textbfContext-textbfEnhanced textbfFeature textbfAment (CEFA)
CEFA consists of a feature alignment module and a context enhancement module.
Our method can serve as a plug-and-play module to improve the detection performance of HOI models on rare categories.
arXiv Detail & Related papers (2024-07-31T08:42:48Z) - DODA: Diffusion for Object-detection Domain Adaptation in Agriculture [4.549305421261851]
We propose DODA, a data synthesizer that can generate high-quality object detection data for new domains in agriculture.
Specifically, we improve the controllability of layout-to-image through encoding layout as an image, thereby improving the quality of labels.
arXiv Detail & Related papers (2024-03-27T08:16:33Z) - InstaGen: Enhancing Object Detection by Training on Synthetic Dataset [59.445498550159755]
We present a novel paradigm to enhance the ability of object detector, e.g., expanding categories or improving detection performance.
We integrate an instance-level grounding head into a pre-trained, generative diffusion model, to augment it with the ability of localising instances in the generated images.
We conduct thorough experiments to show that, this enhanced version of diffusion model, termed as InstaGen, can serve as a data synthesizer.
arXiv Detail & Related papers (2024-02-08T18:59:53Z) - Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for
Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head.
The proposed model achieves a mean average precision (MAP) of approximately 45.7%, which is a significant improvement.
This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z) - Randomize to Generalize: Domain Randomization for Runway FOD Detection [1.4249472316161877]
Tiny Object Detection is challenging due to small size, low resolution, occlusion, background clutter, lighting conditions and small object-to-image ratio.
We propose a novel two-stage methodology Synthetic Image Augmentation (SRIA) to enhance generalization capabilities of models encountering 2D datasets.
We report that detection accuracy improved from an initial 41% to 92% for OOD test set.
arXiv Detail & Related papers (2023-09-23T05:02:31Z) - Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z) - SF-FSDA: Source-Free Few-Shot Domain Adaptive Object Detection with
Efficient Labeled Data Factory [94.11898696478683]
Domain adaptive object detection aims to leverage the knowledge learned from a labeled source domain to improve the performance on an unlabeled target domain.
We propose and investigate a more practical and challenging domain adaptive object detection problem under both source-free and few-shot conditions, named as SF-FSDA.
arXiv Detail & Related papers (2023-06-07T12:34:55Z) - Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.