Related papers: One-Shot Adaptation of GAN in Just One CLIP

URL: http://arxiv.org/abs/2203.09301v1
Date: Thu, 17 Mar 2022 13:03:06 GMT
Title: One-Shot Adaptation of GAN in Just One CLIP
Authors: Gihyun Kwon, Jong Chul Ye
Abstract summary: We present a novel single-shot GAN adaptation method through unified CLIP space manipulations. Specifically, our model employs a two-step training strategy: reference image search in the source generator using CLIP-guided latent optimization, followed by generator fine-tuning with a loss function that imposes CLIP space consistency between the source and adapted generators. We show that our model generates diverse outputs with the target texture and outperforms the baseline models both qualitatively and quantitatively.
Score: 51.188396199083336
License: http://creativecommons.org/licenses/by/4.0/
Abstract: There are many recent research efforts to fine-tune a pre-trained generator with a few target images to generate images of a novel domain. Unfortunately, these methods often suffer from overfitting or under-fitting when fine-tuned with a single target image. To address this, here we present a novel single-shot GAN adaptation method through unified CLIP space manipulations. Specifically, our model employs a two-step training strategy: reference image search in the source generator using a CLIP-guided latent optimization, followed by generator fine-tuning with a novel loss function that imposes CLIP space consistency between the source and adapted generators. To further improve the adapted model to produce spatially consistent samples with respect to the source generator, we also propose contrastive regularization for patchwise relationships in the CLIP space. Experimental results show that our model generates diverse outputs with the target texture and outperforms the baseline models both qualitatively and quantitatively. Furthermore, we show that our CLIP space manipulation strategy allows more effective attribute editing.
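
The abstract names two training signals: a CLIP-space consistency loss between the source and adapted generators, and a patchwise contrastive regularizer. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' implementation: random vectors stand in for CLIP image embeddings, the consistency term is written in the directional form popularized by StyleGAN-NADA, and the patch term is a PatchNCE/InfoNCE loss as used in contrastive unpaired translation (CUT). The function names `directional_clip_loss` and `patch_infonce_loss` are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (near) unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def directional_clip_loss(src_emb, adapted_emb, ref_emb, target_emb):
    """Directional consistency term (StyleGAN-NADA-style form; an assumption here).

    src_emb, adapted_emb: (batch, dim) embeddings of source and adapted outputs
        generated from the same latent codes.
    ref_emb, target_emb: (dim,) embeddings of the source-domain reference image
        and the single target image.
    Returns the batch mean of 1 - cos(adapted - src, target - ref).
    """
    sample_dir = l2_normalize(adapted_emb - src_emb)  # per-sample shift in embedding space
    domain_dir = l2_normalize(target_emb - ref_emb)   # one shared source-to-target shift
    cos = sample_dir @ domain_dir                      # (batch,) cosine per sample
    return float(np.mean(1.0 - cos))


def patch_infonce_loss(src_patches, adapted_patches, tau=0.07):
    """PatchNCE/InfoNCE-style term (as in CUT; an assumed stand-in).

    src_patches, adapted_patches: (num_patches, dim) embeddings at matching
    spatial locations. The same-location source patch is the positive; every
    other source patch acts as a negative.
    """
    q = l2_normalize(adapted_patches)
    k = l2_normalize(src_patches)
    logits = (q @ k.T) / tau                              # (P, P) scaled similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilize exp()
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))             # cross-entropy on the diagonal


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref, target = rng.normal(size=512), rng.normal(size=512)  # stand-in embeddings
    src = rng.normal(size=(8, 512))
    # Every sample shifted along the domain direction: loss is close to 0.
    print(directional_clip_loss(src, src + 0.5 * (target - ref), ref, target))
    # Matching patches give a small loss; spatially shuffled patches give a large one.
    eye = np.eye(4)
    print(patch_infonce_loss(eye, eye), patch_infonce_loss(eye, np.roll(eye, 1, axis=0)))
```

In an actual training pipeline the embeddings would come from a CLIP image encoder applied to generator outputs (and to patch-level features for the contrastive term), with gradients flowing through a framework such as PyTorch. The paper's exact loss definitions, weights, and patch-sampling scheme are not stated in this abstract.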

Related papers

DeeCLIP: A Robust and Generalizable Transformer-Based Framework for Detecting AI-Generated Images [14.448350657613368]
DeeCLIP is a novel framework for detecting AI-generated images. It incorporates DeeFuser, a fusion module that combines high-level and low-level features. Trained exclusively on 4-class ProGAN data, DeeCLIP achieves an average accuracy of 89.90%.
arXiv (2025-04-28T15:06:28Z)
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective [52.778766190479374]
Latent-based image generative models have achieved notable success in image generation tasks. Despite sharing the same latent space, autoregressive models significantly lag behind LDMs and MIMs in image generation. We propose a simple but effective discrete image tokenizer to stabilize the latent space for image generative modeling.
arXiv (2024-10-16T12:13:17Z)
Boosting Few-Shot Detection with Large Language Models and Layout-to-Image Synthesis [1.1633929083694388]
We propose a framework for enhancing few-shot detection beyond state-of-the-art generative augmentation approaches. We introduce our novel layout-aware CLIP score for sample ranking, enabling tight coupling between generated layouts and images. With our approach, a YOLOX-S baseline is boosted by more than 140%, 50%, and 35% in mAP in the COCO 5-, 10-, and 30-shot settings, respectively.
arXiv (2024-10-09T12:57:45Z)
CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning [17.614980614656407]
We propose Continual Generative training for Incremental prompt-Learning. We exploit Variational Autoencoders to learn class-conditioned distributions. We show that such a generative replay approach can adapt to new tasks while improving zero-shot capabilities.
arXiv (2024-07-22T16:51:28Z)
Compressing Image-to-Image Translation GANs Using Local Density Structures on Their Learned Manifold [69.33930972652594]
Generative Adversarial Networks (GANs) have shown remarkable success in modeling complex data distributions for image-to-image translation. Existing GAN compression methods mainly rely on knowledge distillation or convolutional classifiers' pruning techniques. We propose a new approach by explicitly encouraging the pruned model to preserve the density structure of the original parameter-heavy model on its learned manifold. Our experiments on image translation GAN models, Pix2Pix and CycleGAN, with various benchmark datasets and architectures demonstrate our method's effectiveness.
arXiv (2023-12-22T15:43:12Z)
Bridging CLIP and StyleGAN through Latent Alignment for Image Editing [33.86698044813281]
We bridge CLIP and StyleGAN to achieve inference-time, optimization-free mining of diverse manipulation directions. With the proposed mapping between the two latent spaces, we can achieve GAN inversion, text-to-image generation and text-driven image manipulation.
arXiv (2022-10-10T09:17:35Z)
Towards Diverse and Faithful One-shot Adaption of Generative Adversarial Networks [54.80435295622583]
One-shot generative domain adaption aims to transfer a generator pre-trained on one domain to a new domain using only one reference image. We present a novel one-shot generative domain adaption method, DiFa, for diverse generation and faithful adaptation.
arXiv (2022-07-18T16:29:41Z)
FewGAN: Generating from the Joint Distribution of a Few Images [95.6635227371479]
We introduce FewGAN, a generative model for generating novel, high-quality and diverse images. FewGAN is a hierarchical patch-GAN that applies quantization at the first coarse scale, followed by a pyramid of residual fully convolutional GANs at finer scales. In an extensive set of experiments, it is shown that FewGAN outperforms baselines both quantitatively and qualitatively.
arXiv (2022-07-18T07:11:28Z)
Few Shot Generative Model Adaption via Relaxed Spatial Structural Alignment [130.84010267004803]
Training a generative adversarial network (GAN) with limited data is a challenging task. A feasible solution is to start with a GAN well trained on a large-scale source domain and adapt it to the target domain with a few samples, termed few-shot generative model adaption. We propose a relaxed spatial structural alignment method to calibrate the target generative models during the adaption.
arXiv (2022-03-06T14:26:25Z)
Optimizing Generative Adversarial Networks for Image Super Resolution via Latent Space Regularization [4.529132742139768]
Generative Adversarial Networks (GANs) try to learn the distribution of real images on the data manifold in order to generate samples that look real. In this paper, we probe for ways to alleviate the shortcomings of such GANs in the supervised setting of image super-resolution.
arXiv (2020-01-22T16:27:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.