Semantic Generative Augmentations for Few-Shot Counting
- URL: http://arxiv.org/abs/2311.16122v1
- Date: Thu, 26 Oct 2023 11:42:48 GMT
- Title: Semantic Generative Augmentations for Few-Shot Counting
- Authors: Perla Doubinsky (CEDRIC - VERTIGO, CNAM), Nicolas Audebert (CEDRIC -
VERTIGO, CNAM), Michel Crucianu (CEDRIC - VERTIGO), Herv\'e Le Borgne (CEA)
- Abstract summary: We investigate how synthetic data can benefit few-shot class-agnostic counting.
We propose to rely on a double conditioning of Stable Diffusion with both a prompt and a density map.
Our experiments show that our diversified generation strategy significantly improves the counting accuracy of two recent and performing few-shot counting models.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the availability of powerful text-to-image diffusion models, recent
works have explored the use of synthetic data to improve image classification
performances. These works show that it can effectively augment or even replace
real data. In this work, we investigate how synthetic data can benefit few-shot
class-agnostic counting. This requires to generate images that correspond to a
given input number of objects. However, text-to-image models struggle to grasp
the notion of count. We propose to rely on a double conditioning of Stable
Diffusion with both a prompt and a density map in order to augment a training
dataset for few-shot counting. Due to the small dataset size, the fine-tuned
model tends to generate images close to the training images. We propose to
enhance the diversity of synthesized images by exchanging captions between
images thus creating unseen configurations of object types and spatial layout.
Our experiments show that our diversified generation strategy significantly
improves the counting accuracy of two recent and performing few-shot counting
models on FSC147 and CARPK.
Related papers
- Bridging the Gap: Aligning Text-to-Image Diffusion Models with Specific Feedback [5.415802995586328]
Learning from feedback has been shown to enhance the alignment between text prompts and images in text-to-image diffusion models.
We propose an efficient fine-turning method with specific reward objectives, including three stages.
Experimental results on this benchmark show that our model outperforms other SOTA methods in both alignment and fidelity.
arXiv Detail & Related papers (2024-11-28T09:56:28Z) - Iterative Object Count Optimization for Text-to-image Diffusion Models [59.03672816121209]
Current models, which learn from image-text pairs, inherently struggle with counting.
We propose optimizing the generated image based on a counting loss derived from a counting model that aggregates an object's potential.
We evaluate the generation of various objects and show significant improvements in accuracy.
arXiv Detail & Related papers (2024-08-21T15:51:46Z) - DataDream: Few-shot Guided Dataset Generation [90.09164461462365]
We propose a framework for synthesizing classification datasets that more faithfully represents the real data distribution.
DataDream fine-tunes LoRA weights for the image generation model on the few real images before generating the training data using the adapted model.
We then fine-tune LoRA weights for CLIP using the synthetic data to improve downstream image classification over previous approaches on a large variety of datasets.
arXiv Detail & Related papers (2024-07-15T17:10:31Z) - Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis that generates highly realistic images.
Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv Detail & Related papers (2023-12-20T09:39:19Z) - Diversified in-domain synthesis with efficient fine-tuning for few-shot
classification [64.86872227580866]
Few-shot image classification aims to learn an image classifier using only a small set of labeled examples per class.
We propose DISEF, a novel approach which addresses the generalization challenge in few-shot learning using synthetic data.
We validate our method in ten different benchmarks, consistently outperforming baselines and establishing a new state-of-the-art for few-shot classification.
arXiv Detail & Related papers (2023-12-05T17:18:09Z) - Counting Guidance for High Fidelity Text-to-Image Synthesis [16.76098645308941]
Text-to-image diffusion models sometimes struggle to create high-fidelity content for the given input prompt.
We present a method to improve diffusion models so that they accurately produce the correct object count based on the input prompt.
arXiv Detail & Related papers (2023-06-30T11:40:35Z) - Effective Data Augmentation With Diffusion Models [65.09758931804478]
We address the lack of diversity in data augmentation with image-to-image transformations parameterized by pre-trained text-to-image diffusion models.
Our method edits images to change their semantics using an off-the-shelf diffusion model, and generalizes to novel visual concepts from a few labelled examples.
We evaluate our approach on few-shot image classification tasks, and on a real-world weed recognition task, and observe an improvement in accuracy in tested domains.
arXiv Detail & Related papers (2023-02-07T20:42:28Z) - Lafite2: Few-shot Text-to-Image Generation [132.14211027057766]
We propose a novel method for pre-training text-to-image generation model on image-only datasets.
It considers a retrieval-then-optimization procedure to synthesize pseudo text features.
It can be beneficial to a wide range of settings, including the few-shot, semi-supervised and fully-supervised learning.
arXiv Detail & Related papers (2022-10-25T16:22:23Z) - Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic [72.60554897161948]
Recent text-to-image matching models apply contrastive learning to large corpora of uncurated pairs of images and sentences.
In this work, we repurpose such models to generate a descriptive text given an image at inference time.
The resulting captions are much less restrictive than those obtained by supervised captioning methods.
arXiv Detail & Related papers (2021-11-29T11:01:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.