Related papers: Semantic Generative Augmentations for Few-Shot Counting

Semantic Generative Augmentations for Few-Shot Counting

URL: http://arxiv.org/abs/2311.16122v1
Date: Thu, 26 Oct 2023 11:42:48 GMT
Title: Semantic Generative Augmentations for Few-Shot Counting
Authors: Perla Doubinsky (CEDRIC - VERTIGO, CNAM), Nicolas Audebert (CEDRIC - VERTIGO, CNAM), Michel Crucianu (CEDRIC - VERTIGO), Hervé Le Borgne (CEA)
Abstract summary: We investigate how synthetic data can benefit few-shot class-agnostic counting. We propose to rely on a double conditioning of Stable Diffusion with both a prompt and a density map. Our experiments show that our diversified generation strategy significantly improves the counting accuracy of two recent, well-performing few-shot counting models.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the availability of powerful text-to-image diffusion models, recent works have explored the use of synthetic data to improve image classification performance. These works show that it can effectively augment or even replace real data. In this work, we investigate how synthetic data can benefit few-shot class-agnostic counting. This requires generating images that correspond to a given input number of objects. However, text-to-image models struggle to grasp the notion of count. We propose to rely on a double conditioning of Stable Diffusion with both a prompt and a density map in order to augment a training dataset for few-shot counting. Due to the small dataset size, the fine-tuned model tends to generate images close to the training images. We propose to enhance the diversity of synthesized images by exchanging captions between images, thus creating unseen configurations of object types and spatial layouts. Our experiments show that our diversified generation strategy significantly improves the counting accuracy of two recent, well-performing few-shot counting models on FSC147 and CARPK.
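
Illustration: the caption-exchange idea in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Sample` record, the `exchange_captions` function, and the random cyclic pairing are all hypothetical, and the abstract does not say how caption donors are chosen. The sketch only shows how pairing each density map with a caption from a different image yields new (prompt, layout) conditioning pairs for a generator.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical training record: an image with its caption and density map."""
    image_id: str
    caption: str
    density_map: str  # placeholder, e.g. a path to the stored density map


def exchange_captions(samples, seed=0):
    """Pair each density map with a caption taken from a different image.

    Assumption: donors are picked by a random cyclic shift, which guarantees
    that no sample receives its own caption when there are at least two samples.
    Returns a list of (caption, density_map) conditioning pairs.
    """
    if len(samples) < 2:
        return []  # nothing to exchange with
    rng = random.Random(seed)
    order = list(range(len(samples)))
    rng.shuffle(order)
    pairs = []
    for pos, idx in enumerate(order):
        target = samples[idx]
        donor = samples[order[(pos + 1) % len(order)]]
        pairs.append((donor.caption, target.density_map))
    return pairs


samples = [
    Sample("img_a", "a photo of apples on a table", "a.npy"),
    Sample("img_b", "a photo of birds in the sky", "b.npy"),
    Sample("img_c", "a photo of cars in a parking lot", "c.npy"),
]
for caption, density_map in exchange_captions(samples, seed=1):
    print(f"{density_map} <- {caption}")
```

Each resulting pair keeps the spatial layout (and hence the object count) of one image while borrowing the object type described by another, which is the kind of unseen configuration the abstract refers to.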

Related papers

Bridging the Gap: Aligning Text-to-Image Diffusion Models with Specific Feedback [5.415802995586328]
Learning from feedback has been shown to enhance the alignment between text prompts and images in text-to-image diffusion models. We propose an efficient fine-tuning method with specific reward objectives, comprising three stages. Experimental results on this benchmark show that our model outperforms other SOTA methods in both alignment and fidelity.
arXiv Detail & Related papers (2024-11-28T09:56:28Z)
Iterative Object Count Optimization for Text-to-image Diffusion Models [59.03672816121209]
Current models, which learn from image-text pairs, inherently struggle with counting. We propose optimizing the generated image based on a counting loss derived from a counting model that aggregates an object's potential. We evaluate the generation of various objects and show significant improvements in accuracy.
arXiv Detail & Related papers (2024-08-21T15:51:46Z)
DataDream: Few-shot Guided Dataset Generation [90.09164461462365]
We propose a framework for synthesizing classification datasets that more faithfully represents the real data distribution. DataDream fine-tunes LoRA weights for the image generation model on the few real images before generating the training data using the adapted model. We then fine-tune LoRA weights for CLIP using the synthetic data to improve downstream image classification over previous approaches on a large variety of datasets.
arXiv Detail & Related papers (2024-07-15T17:10:31Z)
Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis that generates highly realistic images. Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv Detail & Related papers (2023-12-20T09:39:19Z)
Diversified in-domain synthesis with efficient fine-tuning for few-shot classification [64.86872227580866]
Few-shot image classification aims to learn an image classifier using only a small set of labeled examples per class. We propose DISEF, a novel approach which addresses the generalization challenge in few-shot learning using synthetic data. We validate our method in ten different benchmarks, consistently outperforming baselines and establishing a new state-of-the-art for few-shot classification.
arXiv Detail & Related papers (2023-12-05T17:18:09Z)
Explore the Power of Synthetic Data on Few-shot Object Detection [27.26215175101865]
Few-shot object detection (FSOD) aims to expand an object detector for novel categories given only a few instances for training. Recent text-to-image generation models have shown promising results in generating high-quality images. This work extensively studies how synthetic images generated from state-of-the-art text-to-image generators benefit FSOD tasks.
arXiv Detail & Related papers (2023-03-23T12:34:52Z)
Effective Data Augmentation With Diffusion Models [65.09758931804478]
We address the lack of diversity in data augmentation with image-to-image transformations parameterized by pre-trained text-to-image diffusion models. Our method edits images to change their semantics using an off-the-shelf diffusion model, and generalizes to novel visual concepts from a few labelled examples. We evaluate our approach on few-shot image classification tasks, and on a real-world weed recognition task, and observe an improvement in accuracy in tested domains.
arXiv Detail & Related papers (2023-02-07T20:42:28Z)
Cap2Aug: Caption guided Image to Image data Augmentation [41.53127698828463]
Cap2Aug is an image-to-image diffusion model-based data augmentation strategy using image captions as text prompts. We generate captions from the limited training images and, using these captions, edit the training images with an image-to-image Stable Diffusion model. This strategy generates augmented versions of images similar to the training images yet provides semantic diversity across the samples.
arXiv Detail & Related papers (2022-12-11T04:37:43Z)
Lafite2: Few-shot Text-to-Image Generation [132.14211027057766]
We propose a novel method for pre-training text-to-image generation models on image-only datasets. It considers a retrieval-then-optimization procedure to synthesize pseudo text features. It can be beneficial in a wide range of settings, including few-shot, semi-supervised, and fully-supervised learning.
arXiv Detail & Related papers (2022-10-25T16:22:23Z)
Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic [72.60554897161948]
Recent text-to-image matching models apply contrastive learning to large corpora of uncurated pairs of images and sentences. In this work, we repurpose such models to generate a descriptive text given an image at inference time. The resulting captions are much less restrictive than those obtained by supervised captioning methods.
arXiv Detail & Related papers (2021-11-29T11:01:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.