Learning Disentangled Prompts for Compositional Image Synthesis
- URL: http://arxiv.org/abs/2306.00763v1
- Date: Thu, 1 Jun 2023 14:56:37 GMT
- Title: Learning Disentangled Prompts for Compositional Image Synthesis
- Authors: Kihyuk Sohn, Albert Shaw, Yuan Hao, Han Zhang, Luisa Polania, Huiwen
Chang, Lu Jiang, Irfan Essa
- Abstract summary: We study the problem of teaching pretrained image generative models a new style or concept from as few as one image to synthesize novel images.
We propose a novel source class distilled visual prompt that learns disentangled prompts for semantics (e.g., class) and domain (e.g., style) from a few images.
- Score: 27.99470176603746
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study domain-adaptive image synthesis, the problem of teaching a
pretrained image generative model a new style or concept from as few as one
image so that it can synthesize novel images, in order to better understand
compositional image synthesis. We present a framework that leverages a
pretrained class-conditional generation model and visual prompt tuning.
Specifically, we propose a novel source class distilled visual prompt that
learns disentangled prompts for semantics (e.g., class) and domain (e.g., style)
from a few images. The learned domain prompt is then used to synthesize images
of any class in the style of the target domain. We conduct studies on various
target domains with the number of images ranging from one to a few to many, and
present qualitative results that demonstrate the compositional generalization
of our method. Moreover, we show that our method can help improve zero-shot
domain adaptation classification accuracy.
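The abstract describes the method only at a high level. Below is a minimal, hypothetical PyTorch sketch of the core idea as summarized here: a frozen class-conditional generator is prompted with a learnable semantic (class) prompt plus a learnable domain (style) prompt, both fit by reconstructing the few target-domain images, after which the domain prompt can be paired with any class. The `FrozenClassConditionalGenerator` stub, tensor shapes, loss weights, and the form of the "distillation" term are illustrative assumptions, not the authors' implementation.

```python
# Sketch of disentangled (class + domain) visual prompt tuning.
# The generator below is a stand-in; the paper assumes a pretrained
# class-conditional model that stays frozen throughout.
import torch
import torch.nn as nn

class FrozenClassConditionalGenerator(nn.Module):
    """Placeholder for a pretrained class-conditional generator (frozen)."""
    def __init__(self, prompt_dim=64, num_classes=1000, img_dim=3 * 32 * 32):
        super().__init__()
        self.class_embed = nn.Embedding(num_classes, prompt_dim)
        self.net = nn.Sequential(nn.Linear(2 * prompt_dim, 256), nn.ReLU(),
                                 nn.Linear(256, img_dim))
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, class_prompt, domain_prompt):
        return self.net(torch.cat([class_prompt, domain_prompt], dim=-1))

prompt_dim = 64
gen = FrozenClassConditionalGenerator(prompt_dim)

# Learnable prompts: a semantic prompt tied to the (known) source class of the
# few reference images, and a domain prompt meant to absorb only the style.
semantic_prompt = nn.Parameter(torch.randn(1, prompt_dim) * 0.02)
domain_prompt = nn.Parameter(torch.randn(1, prompt_dim) * 0.02)
optimizer = torch.optim.Adam([semantic_prompt, domain_prompt], lr=1e-3)

# A handful of flattened target-domain images (random stand-ins here).
few_shot_images = torch.randn(4, 3 * 32 * 32)

for step in range(200):
    recon = gen(semantic_prompt.expand(4, -1), domain_prompt.expand(4, -1))
    # "Source class distillation", simplified: keep the semantic prompt close
    # to the generator's own embedding of the source class, so class content
    # stays out of the domain prompt.
    source_class = torch.tensor([207])  # hypothetical source class id
    distill = (semantic_prompt - gen.class_embed(source_class)).pow(2).mean()
    loss = (recon - few_shot_images).pow(2).mean() + 0.1 * distill
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Compositional synthesis: pair the learned domain prompt with any class.
with torch.no_grad():
    new_class = gen.class_embed(torch.tensor([3]))  # a class unseen during tuning
    stylized = gen(new_class, domain_prompt)
```

Keeping the generator frozen is what preserves its class coverage; the regularizer in the sketch is one plausible way to prevent class content from leaking into the domain prompt, so that the domain prompt composes cleanly with other classes.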
Related papers
- FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior [50.0535198082903]
We offer a novel approach to image composition, which integrates multiple input images into a single, coherent image.
We showcase the potential of utilizing the powerful generative prior inherent in large-scale pre-trained diffusion models to accomplish generic image composition.
arXiv Detail & Related papers (2024-07-06T03:35:43Z)
- Diversified in-domain synthesis with efficient fine-tuning for few-shot classification [64.86872227580866]
Few-shot image classification aims to learn an image classifier using only a small set of labeled examples per class.
We propose DISEF, a novel approach which addresses the generalization challenge in few-shot learning using synthetic data.
We validate our method in ten different benchmarks, consistently outperforming baselines and establishing a new state-of-the-art for few-shot classification.
arXiv Detail & Related papers (2023-12-05T17:18:09Z)
- Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images [37.29348016920314]
We present a new framework leveraging off-the-shelf generative models to generate synthetic training images.
We address class name ambiguity, lack of diversity in naive prompts, and domain shifts.
Our framework consistently enhances recognition model performance with more synthetic data.
arXiv Detail & Related papers (2023-12-04T18:35:27Z)
- Improving Generalization of Image Captioning with Unsupervised Prompt Learning [63.26197177542422]
Generalization of Image Captioning (GeneIC) learns a domain-specific prompt vector for the target domain without requiring annotated data.
GeneIC aligns visual and language modalities with a pre-trained Contrastive Language-Image Pre-Training (CLIP) model.
arXiv Detail & Related papers (2023-08-05T12:27:01Z)
- Adapt and Align to Improve Zero-Shot Sketch-Based Image Retrieval [85.39613457282107]
The cross-domain nature of sketch-based image retrieval makes the task challenging.
We present an effective "Adapt and Align" approach to address the key challenges.
Inspired by recent advances in image-text foundation models (e.g., CLIP) on zero-shot scenarios, we explicitly align the learned image embedding with a more semantic text embedding to achieve the desired knowledge transfer from seen to unseen classes (a minimal sketch of this alignment idea appears after this list).
arXiv Detail & Related papers (2023-05-09T03:10:15Z)
- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation [26.748667878221568]
We present a new approach for "personalization" of text-to-image models.
We fine-tune a pretrained text-to-image model to bind a unique identifier with that specific subject.
The unique identifier can then be used to synthesize novel, fully photorealistic images of the subject contextualized in different scenes.
arXiv Detail & Related papers (2022-08-25T17:45:49Z)
- Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models [12.676356746752894]
We present an alternative approach based on retrieval-augmented diffusion models (RDMs).
We replace the retrieval database with a more specialized database that contains only images of a particular visual style.
This provides a novel way to prompt a generally trained model after training and thereby specify a particular visual style.
arXiv Detail & Related papers (2022-07-26T16:56:51Z)
- More Control for Free! Image Synthesis with Semantic Diffusion Guidance [79.88929906247695]
Controllable image synthesis models allow creation of diverse images based on text instructions or guidance from an example image.
We introduce a novel unified framework for semantic diffusion guidance, which allows either language or image guidance, or both.
We conduct experiments on FFHQ and LSUN datasets, and show results on fine-grained text-guided image synthesis.
arXiv Detail & Related papers (2021-12-10T18:55:50Z)
- Learned Spatial Representations for Few-shot Talking-Head Synthesis [68.3787368024951]
We propose a novel approach for few-shot talking-head synthesis.
We show that this disentangled representation leads to a significant improvement over previous methods.
arXiv Detail & Related papers (2021-04-29T17:59:42Z)
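Several entries above (e.g., GeneIC and "Adapt and Align") align image embeddings with CLIP text embeddings. As a rough, generic illustration only, and not any of those papers' actual objectives, a cosine-similarity alignment loss with a frozen text embedding and a learnable image-side projection might look like the following; the feature tensors and projection layer are placeholders rather than real CLIP calls.

```python
# Generic sketch of image-to-text embedding alignment (CLIP-style), with
# placeholder features; only the image-side projection is trained.
import torch
import torch.nn as nn
import torch.nn.functional as F

image_features = torch.randn(8, 512)                        # stand-in backbone features
text_embedding = F.normalize(torch.randn(1, 512), dim=-1)   # frozen text embedding of a class/domain description

projection = nn.Linear(512, 512)                             # learnable image-side projection
optimizer = torch.optim.Adam(projection.parameters(), lr=1e-4)

for step in range(100):
    img_emb = F.normalize(projection(image_features), dim=-1)
    # Maximize cosine similarity between projected image embeddings and the
    # frozen text embedding, pulling the visual and language modalities together.
    loss = 1.0 - (img_emb * text_embedding).sum(dim=-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```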
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.