Diversified in-domain synthesis with efficient fine-tuning for few-shot
classification
- URL: http://arxiv.org/abs/2312.03046v2
- Date: Thu, 7 Dec 2023 02:04:49 GMT
- Title: Diversified in-domain synthesis with efficient fine-tuning for few-shot
classification
- Authors: Victor G. Turrisi da Costa, Nicola Dall'Asen, Yiming Wang, Nicu Sebe,
Elisa Ricci
- Abstract summary: Few-shot image classification aims to learn an image classifier using only a small set of labeled examples per class.
We propose DISEF, a novel approach which addresses the generalization challenge in few-shot learning using synthetic data.
We validate our method on ten different benchmarks, consistently outperforming baselines and establishing a new state-of-the-art for few-shot classification.
- Score: 64.86872227580866
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Few-shot image classification aims to learn an image classifier using only a
small set of labeled examples per class. A recent research direction for
improving few-shot classifiers involves augmenting the labeled samples with
synthetic images created by state-of-the-art text-to-image generation models.
Following this trend, we propose Diversified In-domain Synthesis with Efficient
Fine-tuning (DISEF), a novel approach which addresses the generalization
challenge in few-shot learning using synthetic data. DISEF consists of two main
components. First, we propose a novel text-to-image augmentation pipeline that,
by leveraging the real samples and their rich semantics coming from an advanced
captioning model, promotes in-domain sample diversity for better
generalization. Second, we emphasize the importance of effective model
fine-tuning in few-shot recognition, proposing to use Low-Rank Adaptation
(LoRA) for joint adaptation of the text and image encoders in a Vision Language
Model. We validate our method on ten different benchmarks, consistently
outperforming baselines and establishing a new state-of-the-art for few-shot
classification. Code is available at https://github.com/vturrisi/disef.
Related papers
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is to augment the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z)
- Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis that enables the generation of highly realistic images.
Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv Detail & Related papers (2023-12-20T09:39:19Z)
- Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images [37.29348016920314]
We present a new framework leveraging off-the-shelf generative models to generate synthetic training images.
We address class name ambiguity, lack of diversity in naive prompts, and domain shifts.
Our framework consistently enhances recognition model performance with more synthetic data.
arXiv Detail & Related papers (2023-12-04T18:35:27Z)
- Semantic Generative Augmentations for Few-Shot Counting [0.0]
We investigate how synthetic data can benefit few-shot class-agnostic counting.
We propose to rely on a double conditioning of Stable Diffusion with both a prompt and a density map.
Our experiments show that our diversified generation strategy significantly improves the counting accuracy of two recent, high-performing few-shot counting models.
arXiv Detail & Related papers (2023-10-26T11:42:48Z)
- Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations [61.132408427908175]
Zero-shot GAN adaptation aims to reuse well-trained generators to synthesize images of an unseen target domain.
With only a single representative text feature instead of real images, the synthesized images gradually lose diversity.
We propose a novel method to find semantic variations of the target text in the CLIP space.
arXiv Detail & Related papers (2023-08-21T08:12:28Z)
- Learning Disentangled Prompts for Compositional Image Synthesis [27.99470176603746]
We study the problem of teaching pretrained image generative models a new style or concept from as few as one image to synthesize novel images.
We propose a novel source-class-distilled visual prompt that learns disentangled semantic (e.g., class) and domain (e.g., style) prompts from a few images.
arXiv Detail & Related papers (2023-06-01T14:56:37Z)
- Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language.
We pioneer a systematic study on the detection of deepfakes generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z)
- Lafite2: Few-shot Text-to-Image Generation [132.14211027057766]
We propose a novel method for pre-training a text-to-image generation model on image-only datasets.
It uses a retrieval-then-optimization procedure to synthesize pseudo text features.
It can be beneficial to a wide range of settings, including few-shot, semi-supervised, and fully-supervised learning.
arXiv Detail & Related papers (2022-10-25T16:22:23Z)
- Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto standard of Generative Adversarial Nets (GANs).
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
- GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models [16.786221846896108]
We explore diffusion models for the problem of text-conditional image synthesis and compare two different guidance strategies: CLIP guidance and classifier-free guidance.
We find that the latter is preferred by human evaluators for both photorealism and caption similarity, and often produces photorealistic samples.
Our models can be fine-tuned to perform image inpainting, enabling powerful text-driven image editing.
arXiv Detail & Related papers (2021-12-20T18:42:55Z)