StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
- URL: http://arxiv.org/abs/2108.00946v1
- Date: Mon, 2 Aug 2021 14:46:46 GMT
- Title: StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
- Authors: Rinon Gal, Or Patashnik, Haggai Maron, Gal Chechik, Daniel Cohen-Or
- Abstract summary: We present a text-driven method that allows shifting a generative model to new domains.
We show that through natural language prompts and a few minutes of training, our method can adapt a generator across a multitude of domains.
- Score: 63.85888518950824
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Can a generative model be trained to produce images from a specific domain,
guided by a text prompt only, without seeing any image? In other words: can an
image generator be trained blindly? Leveraging the semantic power of large
scale Contrastive-Language-Image-Pre-training (CLIP) models, we present a
text-driven method that allows shifting a generative model to new domains,
without having to collect even a single image from those domains. We show that
through natural language prompts and a few minutes of training, our method can
adapt a generator across a multitude of domains characterized by diverse styles
and shapes. Notably, many of these modifications would be difficult or outright
impossible to reach with existing methods. We conduct an extensive set of
experiments and comparisons across a wide range of domains. These demonstrate
the effectiveness of our approach and show that our shifted models maintain the
latent-space properties that make generative models appealing for downstream
tasks.
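The abstract does not spell out the training objective, so the following is only a minimal sketch of how a CLIP-guided domain shift can be set up: a frozen copy of the generator anchors the source domain while a trainable clone is pushed along the CLIP-space direction between a source prompt and a target prompt. `frozen_G`, `train_G`, and the simplified image preprocessing are assumptions for illustration, not the paper's released code.

```python
# Illustrative sketch (not the authors' code): a CLIP-space "directional"
# objective that moves a trainable generator copy away from a frozen copy
# along the direction between two text prompts.
import torch
import torch.nn.functional as F
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float().eval()

def embed_text(prompt: str) -> torch.Tensor:
    tokens = clip.tokenize([prompt]).to(device)
    return F.normalize(clip_model.encode_text(tokens), dim=-1)

def embed_image(img: torch.Tensor) -> torch.Tensor:
    # img: (N, 3, H, W); CLIP's exact crop and channel normalization are
    # omitted here for brevity.
    img = F.interpolate(img, size=224, mode="bilinear", align_corners=False)
    return F.normalize(clip_model.encode_image(img), dim=-1)

def directional_loss(frozen_G, train_G, w, src_prompt: str, tgt_prompt: str):
    """1 - cos(delta_image, delta_text): the shift between the two
    generators' outputs should follow the shift between the source and
    target prompts in CLIP space."""
    delta_t = embed_text(tgt_prompt) - embed_text(src_prompt)
    with torch.no_grad():
        src_emb = embed_image(frozen_G(w))
    delta_i = embed_image(train_G(w)) - src_emb
    return (1.0 - F.cosine_similarity(delta_i, delta_t, dim=-1)).mean()
```

In a training loop one would sample latents `w`, evaluate this loss for a prompt pair such as "photo" and "sketch", and update only the trainable generator's weights, matching the text-only, few-minutes workflow the abstract describes.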
Related papers
- Diffusion Self-Guidance for Controllable Image Generation [106.59989386924136]
Self-guidance provides greater control over generated images by guiding the internal representations of diffusion models.
We show how a simple set of properties can be composed to perform challenging image manipulations.
We also show that self-guidance can be used to edit real images.
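The summary only says that guidance acts on the model's internal representations, so the toy snippet below merely illustrates that general mechanism: one sampling update is steered by the gradient of an energy computed on intermediate activations. The tiny denoiser, the placeholder energy, and the step sizes are all assumptions, not the paper's model or schedule.

```python
# Toy illustration of guiding a denoising step via internal activations.
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Stand-in for a diffusion U-Net that exposes an intermediate feature map."""
    def __init__(self, ch: int = 3, hidden: int = 16):
        super().__init__()
        self.enc = nn.Conv2d(ch, hidden, 3, padding=1)
        self.dec = nn.Conv2d(hidden, ch, 3, padding=1)

    def forward(self, x_t: torch.Tensor):
        feats = torch.relu(self.enc(x_t))  # "internal representation"
        eps = self.dec(feats)              # predicted noise
        return eps, feats

def guided_step(model: nn.Module, x_t: torch.Tensor, scale: float = 1.0):
    """One update that follows the denoiser plus the gradient of an
    activation-based energy (here: raise the mean feature, purely as a
    placeholder property)."""
    x_t = x_t.detach().requires_grad_(True)
    eps, feats = model(x_t)
    energy = -feats.mean()                        # placeholder guidance energy
    grad = torch.autograd.grad(energy, x_t)[0]
    return (x_t - 0.1 * eps - scale * grad).detach()

x = guided_step(TinyDenoiser(), torch.randn(1, 3, 32, 32))
```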
arXiv Detail & Related papers (2023-06-01T17:59:56Z)
- Generating Images with Multimodal Language Models [78.6660334861137]
We propose a method to fuse frozen text-only large language models with pre-trained image encoder and decoder models.
Our model demonstrates a wide suite of multimodal capabilities: image retrieval, novel image generation, and multimodal dialogue.
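The summary says frozen text-only LLMs are fused with pretrained image encoders and decoders. A common way to express that is to train only a small projection from the LLM's hidden states into the image model's embedding space while every pretrained network stays frozen; the stub modules and the MSE alignment loss below are placeholders, not the paper's actual components.

```python
# Placeholder sketch: train only a small mapper between two frozen models.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrozenStub(nn.Module):
    """Stand-in for a frozen pretrained model producing fixed-size embeddings."""
    def __init__(self, dim_in: int, dim_out: int):
        super().__init__()
        self.proj = nn.Linear(dim_in, dim_out)
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)

frozen_lm = FrozenStub(32, 768)        # frozen text-only LLM (stub)
frozen_img_enc = FrozenStub(192, 512)  # frozen image encoder (stub)
mapper = nn.Linear(768, 512)           # the only trainable piece
opt = torch.optim.Adam(mapper.parameters(), lr=1e-4)

text_states = frozen_lm(torch.randn(16, 32))          # (B, 768) LLM hidden states
image_embeds = frozen_img_enc(torch.randn(16, 192))   # (B, 512) paired image embeddings

# Align projected text states with the paired image embeddings.
loss = F.mse_loss(F.normalize(mapper(text_states), dim=-1),
                  F.normalize(image_embeds, dim=-1))
loss.backward()
opt.step()
```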
arXiv Detail & Related papers (2023-05-26T19:22:03Z)
- Diffusion Guided Domain Adaptation of Image Generators [22.444668833151677]
We show that classifier-free guidance can be leveraged as a critic, enabling generators to distill knowledge from large-scale text-to-image diffusion models.
Generators can be efficiently shifted into new domains indicated by text prompts, without access to ground-truth samples from the target domains.
Although not trained to minimize CLIP loss, our model achieves equally high CLIP scores and significantly lower FID than prior work on short prompts.
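The entry states that classifier-free guidance serves as a critic for distilling a text-to-image diffusion model into a generator. The sketch below shows one common shape such a distillation signal can take (a score-distillation-style gradient on the generator's output); the stub diffusion model, the fixed noise-schedule coefficient, and the conditioning tensors are assumptions for illustration.

```python
# Sketch of using a (stub) diffusion critic to update a generator's output.
import torch
import torch.nn as nn

class StubDiffusion(nn.Module):
    """Stand-in for a frozen text-to-image noise predictor."""
    def __init__(self, ch: int = 3):
        super().__init__()
        self.net = nn.Conv2d(ch, ch, 3, padding=1)
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, x_t, t, cond):
        return self.net(x_t) + 0.0 * cond.mean()  # placeholder epsilon

def distill_grad(diffusion, img, cond, uncond, guidance: float = 7.5):
    """Classifier-free-guided noise prediction acts as the critic; the gap
    between predicted and injected noise is the update direction."""
    noise = torch.randn_like(img)
    alpha = 0.5  # placeholder noise-schedule coefficient
    x_t = alpha ** 0.5 * img + (1 - alpha) ** 0.5 * noise
    t = torch.randint(1, 1000, (1,))
    with torch.no_grad():
        eps_c = diffusion(x_t, t, cond)
        eps_u = diffusion(x_t, t, uncond)
        eps = eps_u + guidance * (eps_c - eps_u)
    return eps - noise  # note: no CLIP term anywhere in this signal

img = torch.randn(1, 3, 32, 32, requires_grad=True)  # stands in for G(w)
grad = distill_grad(StubDiffusion(), img,
                    cond=torch.ones(1, 8), uncond=torch.zeros(1, 8))
img.backward(gradient=grad)  # backpropagate the critic's gradient into the generator
```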
arXiv Detail & Related papers (2022-12-08T18:46:19Z)
- Using Language to Extend to Unseen Domains [81.37175826824625]
It is expensive to collect training data for every possible domain that a vision model may encounter when deployed.
We consider how simply verbalizing the training domain, as well as domains we want to extend to but have no data for, can improve robustness.
Using a multimodal model with a joint image and language embedding space, our method LADS learns a transformation of the image embeddings from the training domain to each unseen test domain.
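The blurb says LADS learns a transformation of image embeddings from the training domain toward each verbalized unseen domain inside a joint image-text space. The schematic below uses random tensors in place of real CLIP embeddings, and the two loss terms (follow the text direction, stay close to the original embedding) are a paraphrase of the idea rather than the exact objective.

```python
# Schematic: learn an embedding-space map that follows a text-defined shift.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 512
aug = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
opt = torch.optim.Adam(aug.parameters(), lr=1e-3)

# Stand-ins for CLIP embeddings: source prompt ("a photo"), target prompt
# ("a sketch"), and a batch of training-image embeddings.
text_src = F.normalize(torch.randn(dim), dim=0)
text_tgt = F.normalize(torch.randn(dim), dim=0)
img_emb = F.normalize(torch.randn(64, dim), dim=-1)
domain_dir = F.normalize(text_tgt - text_src, dim=0)

for _ in range(100):
    out = aug(img_emb)
    shift = F.normalize(out - img_emb, dim=-1)
    loss_dir = (1 - shift @ domain_dir).mean()                          # follow the text direction
    loss_keep = (1 - F.cosine_similarity(out, img_emb, dim=-1)).mean()  # preserve content
    loss = loss_dir + loss_keep
    opt.zero_grad()
    loss.backward()
    opt.step()
```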
arXiv Detail & Related papers (2022-10-18T01:14:02Z)
- LDEdit: Towards Generalized Text Guided Image Manipulation via Latent Diffusion Models [12.06277444740134]
Generic image manipulation using a single model with flexible text inputs is highly desirable.
Recent work addresses this task by guiding generative models trained on generic images with pretrained vision-language encoders.
We propose an optimization-free method for the task of generic image manipulation from text prompts.
arXiv Detail & Related papers (2022-10-05T13:26:15Z)
- Towards Diverse and Faithful One-shot Adaption of Generative Adversarial Networks [54.80435295622583]
One-shot generative domain adaptation aims to transfer a pre-trained generator from one domain to a new domain using only a single reference image.
We present DiFa, a novel one-shot generative domain adaptation method for diverse generation and faithful adaptation.
arXiv Detail & Related papers (2022-07-18T16:29:41Z)
- Polymorphic-GAN: Generating Aligned Samples across Multiple Domains with Learned Morph Maps [94.10535575563092]
We introduce a generative adversarial network that can simultaneously generate aligned image samples from multiple related domains.
We propose Polymorphic-GAN, which learns features shared across all domains together with a per-domain morph layer that adapts the shared features to each domain.
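As a structural illustration of the shared-trunk-plus-morph-layer idea described above, the module below feeds one latent through shared layers and a small per-domain head; the layer sizes and the morph parameterization are placeholders, not the paper's architecture.

```python
# Minimal structural sketch: shared trunk, one light "morph" head per domain.
import torch
import torch.nn as nn

class PolymorphicSketch(nn.Module):
    def __init__(self, z_dim: int = 64, feat_ch: int = 32, num_domains: int = 3):
        super().__init__()
        self.feat_ch = feat_ch
        self.shared = nn.Sequential(nn.Linear(z_dim, feat_ch * 8 * 8), nn.ReLU())
        self.morphs = nn.ModuleList([
            nn.Sequential(nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(feat_ch, 3, 3, padding=1))
            for _ in range(num_domains)
        ])

    def forward(self, z: torch.Tensor, domain: int) -> torch.Tensor:
        feats = self.shared(z).view(-1, self.feat_ch, 8, 8)
        return self.morphs[domain](feats)

g = PolymorphicSketch()
z = torch.randn(4, 64)
aligned = [g(z, d) for d in range(3)]  # same latent, one aligned sample per domain
```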
arXiv Detail & Related papers (2022-06-06T21:03:02Z)
- Network-to-Network Translation with Conditional Invertible Neural Networks [19.398202091883366]
Recent work suggests that the power of massive machine learning models is captured by the representations they learn.
We seek a model that can relate between different existing representations and propose to solve this task with a conditionally invertible network.
Our domain transfer network can translate between fixed representations without having to learn or finetune them.
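The building block usually behind such conditionally invertible networks is an affine coupling layer whose scale and shift depend on a conditioning representation; the snippet below sketches one such layer with the source representation `c` as the condition. Dimensions and the surrounding flow stack are illustrative only.

```python
# One conditional affine coupling layer: exactly invertible in x given c.
import torch
import torch.nn as nn

class CondAffineCoupling(nn.Module):
    def __init__(self, dim: int, cond_dim: int, hidden: int = 128):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(torch.cat([x1, c], dim=-1)).chunk(2, dim=-1)
        return torch.cat([x1, x2 * torch.exp(torch.tanh(s)) + t], dim=-1)

    def inverse(self, y: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
        y1, y2 = y[:, :self.half], y[:, self.half:]
        s, t = self.net(torch.cat([y1, c], dim=-1)).chunk(2, dim=-1)
        return torch.cat([y1, (y2 - t) * torch.exp(-torch.tanh(s))], dim=-1)

layer = CondAffineCoupling(dim=64, cond_dim=16)
x, c = torch.randn(8, 64), torch.randn(8, 16)
assert torch.allclose(layer.inverse(layer(x, c), c), x, atol=1e-5)
```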
arXiv Detail & Related papers (2020-05-27T18:14:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.