Related papers: InstaGen: Enhancing Object Detection by Training on Synthetic Dataset

InstaGen: Enhancing Object Detection by Training on Synthetic Dataset

URL: http://arxiv.org/abs/2402.05937v3
Date: Mon, 8 Apr 2024 11:46:07 GMT
Title: InstaGen: Enhancing Object Detection by Training on Synthetic Dataset
Authors: Chengjian Feng, Yujie Zhong, Zequn Jie, Weidi Xie, Lin Ma,
Abstract summary: We present a novel paradigm to enhance the ability of object detector, e.g., expanding categories or improving detection performance. We integrate an instance-level grounding head into a pre-trained, generative diffusion model, to augment it with the ability of localising instances in the generated images. We conduct thorough experiments to show that, this enhanced version of diffusion model, termed as InstaGen, can serve as a data synthesizer.
Score: 59.445498550159755
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we present a novel paradigm to enhance the ability of object detector, e.g., expanding categories or improving detection performance, by training on synthetic dataset generated from diffusion models. Specifically, we integrate an instance-level grounding head into a pre-trained, generative diffusion model, to augment it with the ability of localising instances in the generated images. The grounding head is trained to align the text embedding of category names with the regional visual feature of the diffusion model, using supervision from an off-the-shelf object detector, and a novel self-training scheme on (novel) categories not covered by the detector. We conduct thorough experiments to show that, this enhanced version of diffusion model, termed as InstaGen, can serve as a data synthesizer, to enhance object detectors by training on its generated samples, demonstrating superior performance over existing state-of-the-art methods in open-vocabulary (+4.5 AP) and data-sparse (+1.2 to 5.2 AP) scenarios. Project page with code: https://fcjian.github.io/InstaGen.

Related papers

DreamMask: Boosting Open-vocabulary Panoptic Segmentation with Synthetic Data [61.62554324594797]
We propose DreamMask, which explores how to generate training data in the open-vocabulary setting, and how to train the model with both real and synthetic data. In general, DreamMask significantly simplifies the collection of large-scale training data, serving as a plug-and-play enhancement for existing methods. For instance, when trained on COCO and tested on ADE20K, the model equipped with DreamMask outperforms the previous state-of-the-art by a substantial margin of 2.1% mIoU.
arXiv Detail & Related papers (2025-01-03T19:00:00Z)
Exploiting Unlabeled Data with Multiple Expert Teachers for Open Vocabulary Aerial Object Detection and Its Orientation Adaptation [58.37525311718006]
We put forth a novel formulation of the aerial object detection problem, namely open-vocabulary aerial object detection (OVAD) We propose CastDet, a CLIP-activated student-teacher detection framework that serves as the first OVAD detector specifically designed for the challenging aerial scenario. Our framework integrates a robust localization teacher along with several box selection strategies to generate high-quality proposals for novel objects.
arXiv Detail & Related papers (2024-11-04T12:59:13Z)
CycleHOI: Improving Human-Object Interaction Detection with Cycle Consistency of Detection and Generation [37.45945633515955]
We propose a new learning framework, coined as CycleHOI, to boost the performance of human-object interaction (HOI) detection. Our key design is to introduce a novel cycle consistency loss for the training of HOI detector. We perform extensive experiments to verify the effectiveness and generalization power of our CycleHOI.
arXiv Detail & Related papers (2024-07-16T06:55:43Z)
DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets. We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability. Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z)
Data-Independent Operator: A Training-Free Artifact Representation Extractor for Generalizable Deepfake Detection [105.9932053078449]
In this work, we show that, on the contrary, the small and training-free filter is sufficient to capture more general artifact representations. Due to its unbias towards both the training and test sources, we define it as Data-Independent Operator (DIO) to achieve appealing improvements on unseen sources. Our detector achieves a remarkable improvement of $13.3%$, establishing a new state-of-the-art performance.
arXiv Detail & Related papers (2024-03-11T15:22:28Z)
DiffusionEngine: Diffusion Model is Scalable Data Engine for Object Detection [41.436817746749384]
Diffusion Model is a scalable data engine for object detection. DiffusionEngine (DE) provides high-quality detection-oriented training pairs in a single stage.
arXiv Detail & Related papers (2023-09-07T17:55:01Z)
Object-Centric Slot Diffusion [30.722428924152382]
We introduce Latent Slot Diffusion (LSD), a novel model that serves dual purposes. We demonstrate that LSD significantly outperforms state-of-the-art transformer-based decoders. We also conduct a preliminary investigation into the integration of pre-trained diffusion models in LSD.
arXiv Detail & Related papers (2023-03-20T02:40:16Z)
DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery [20.787180028571694]
DiffusionSeg is a synthesis-exploitation framework containing two-stage strategies. We synthesize abundant images, and propose a novel training-free AttentionCut to obtain masks in the first stage. In the second exploitation stage, to bridge the structural gap, we use the inversion technique, to map the given image back to diffusion features.
arXiv Detail & Related papers (2023-03-17T07:47:55Z)
Robust and Accurate Object Detection via Adversarial Learning [111.36192453882195]
This work augments the fine-tuning stage for object detectors by exploring adversarial examples. Our approach boosts the performance of state-of-the-art EfficientDets by +1.1 mAP on the object detection benchmark.
arXiv Detail & Related papers (2021-03-23T19:45:26Z)
UniT: Unified Knowledge Transfer for Any-shot Object Detection and Segmentation [52.487469544343305]
Methods for object detection and segmentation rely on large scale instance-level annotations for training. We propose an intuitive and unified semi-supervised model that is applicable to a range of supervision.
arXiv Detail & Related papers (2020-06-12T22:45:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.