E$^{2}$GAN: Efficient Training of Efficient GANs for Image-to-Image Translation
- URL: http://arxiv.org/abs/2401.06127v2
- Date: Mon, 3 Jun 2024 02:09:38 GMT
- Title: E$^{2}$GAN: Efficient Training of Efficient GANs for Image-to-Image Translation
- Authors: Yifan Gong, Zheng Zhan, Qing Jin, Yanyu Li, Yerlan Idelbayev, Xian Liu, Andrey Zharkov, Kfir Aberman, Sergey Tulyakov, Yanzhi Wang, Jian Ren
- Abstract summary: We introduce and address a novel research direction: can the process of distilling GANs from diffusion models be made significantly more efficient?
First, we construct a base GAN model with generalized features, adaptable to different concepts through fine-tuning, eliminating the need for training from scratch.
Second, we identify crucial layers within the base GAN model and employ Low-Rank Adaptation (LoRA) with a simple yet effective rank search process, rather than fine-tuning the entire base model.
Third, we investigate the minimal amount of data necessary for fine-tuning, further reducing the overall training time.
- Score: 69.72194342962615
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One highly promising direction for enabling flexible real-time on-device image editing is utilizing data distillation by leveraging large-scale text-to-image diffusion models to generate paired datasets used for training generative adversarial networks (GANs). This approach notably alleviates the stringent requirements typically imposed by high-end commercial GPUs for performing image editing with diffusion models. However, unlike text-to-image diffusion models, each distilled GAN is specialized for a specific image editing task, necessitating costly training efforts to obtain models for various concepts. In this work, we introduce and address a novel research direction: can the process of distilling GANs from diffusion models be made significantly more efficient? To achieve this goal, we propose a series of innovative techniques. First, we construct a base GAN model with generalized features, adaptable to different concepts through fine-tuning, eliminating the need for training from scratch. Second, we identify crucial layers within the base GAN model and employ Low-Rank Adaptation (LoRA) with a simple yet effective rank search process, rather than fine-tuning the entire base model. Third, we investigate the minimal amount of data necessary for fine-tuning, further reducing the overall training time. Extensive experiments show that we can efficiently empower GANs with the ability to perform real-time high-quality image editing on mobile devices with remarkably reduced training and storage costs for each concept.
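The second technique above, applying LoRA to selected layers with a rank search rather than fine-tuning the whole base model, can be sketched in a few lines. The SVD-based search below is only an illustrative stand-in for the paper's unspecified "simple yet effective rank search process", and all function names, shapes, and tolerances are hypothetical:

```python
# Minimal sketch of LoRA-style adaptation with a simple rank search.
# Assumption: the rank search is approximated here by picking the smallest
# rank whose best rank-r fit (truncated SVD) of the needed weight delta
# meets an error tolerance; the paper's actual procedure may differ.
import numpy as np

rng = np.random.default_rng(0)

def lora_update(W, A, B, alpha):
    """Adapted weight: W' = W + (alpha / r) * B @ A, where r is the LoRA rank."""
    r = A.shape[0]
    return W + (alpha / r) * (B @ A)

def search_rank(W_target, W_base, candidate_ranks, tol=1e-2, alpha=1.0):
    """Return the smallest candidate rank (and its LoRA factors) whose best
    rank-r approximation of the required weight delta meets the tolerance."""
    delta = W_target - W_base
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    for r in candidate_ranks:
        # Best rank-r approximation of delta (Eckart-Young theorem).
        approx = (U[:, :r] * S[:r]) @ Vt[:r]
        err = np.linalg.norm(delta - approx) / np.linalg.norm(delta)
        if err <= tol:
            B = U[:, :r] * S[:r] * (r / alpha)  # fold the scale into B
            A = Vt[:r]
            return r, A, B
    raise ValueError("no candidate rank met the tolerance")

# Toy demo: recover a low-rank perturbation of a frozen 64x64 base layer.
W_base = rng.standard_normal((64, 64))
W_target = W_base + rng.standard_normal((64, 4)) @ rng.standard_normal((4, 64))

r, A, B = search_rank(W_target, W_base, candidate_ranks=[1, 2, 4, 8])
W_adapted = lora_update(W_base, A, B, alpha=1.0)
```

Only the small factors `A` and `B` need to be stored per concept, which is the source of the storage savings the abstract describes.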
Related papers
- One-Step Diffusion Distillation via Deep Equilibrium Models [64.11782639697883]
We introduce a simple yet effective means of distilling diffusion models directly from initial noise to the resulting image.
Our method enables fully offline training with just noise/image pairs from the diffusion model.
We demonstrate that the DEQ architecture is crucial to this capability, as GET matches a $5\times$ larger ViT in terms of FID scores.
arXiv Detail & Related papers (2023-12-12T07:28:40Z) - Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z) - UGC: Unified GAN Compression for Efficient Image-to-Image Translation [20.3126581529643]
We propose a new learning paradigm, Unified GAN Compression (UGC), with a unified objective to seamlessly prompt the synergy of model-efficient and label-efficient learning.
We formulate a heterogeneous mutual learning scheme to obtain an architecture-flexible, label-efficient and performance-excellent model.
arXiv Detail & Related papers (2023-09-17T15:55:09Z) - A Simple and Effective Baseline for Attentional Generative Adversarial Networks [8.63558211869045]
Generating high-quality images by guiding a generative model with a text description is an innovative and challenging task.
In recent years, AttnGAN, which uses an attention mechanism to guide GAN training, has been proposed, along with SD-GAN and StackGAN++.
We apply a popular, simple, and effective idea: removing redundant structure and improving the backbone network of AttnGAN.
Our improvements significantly reduce the model size and training cost while keeping the model's performance unchanged.
arXiv Detail & Related papers (2023-06-26T13:55:57Z) - SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds [88.06788636008051]
Text-to-image diffusion models can create stunning images from natural language descriptions that rival the work of professional artists and photographers.
These models are large, with complex network architectures and tens of denoising iterations, making them computationally expensive and slow to run.
We present a generic approach that unlocks running text-to-image diffusion models on mobile devices in less than $2$ seconds.
arXiv Detail & Related papers (2023-06-01T17:59:25Z) - Is This Loss Informative? Faster Text-to-Image Customization by Tracking Objective Dynamics [31.15864240403093]
We study the training dynamics of popular text-to-image personalization methods, aiming to speed them up.
We propose a simple drop-in early stopping criterion that only requires computing the regular training objective on a fixed set of inputs.
Our experiments on Stable Diffusion for 48 different concepts and three personalization methods demonstrate the competitive performance of our approach.
arXiv Detail & Related papers (2023-02-09T18:49:13Z) - Lafite2: Few-shot Text-to-Image Generation [132.14211027057766]
We propose a novel method for pre-training a text-to-image generation model on image-only datasets.
It considers a retrieval-then-optimization procedure to synthesize pseudo text features.
It can benefit a wide range of settings, including few-shot, semi-supervised, and fully-supervised learning.
arXiv Detail & Related papers (2022-10-25T16:22:23Z) - A Generic Approach for Enhancing GANs by Regularized Latent Optimization [79.00740660219256]
We introduce a generic framework called generative-model inference that is capable of enhancing pre-trained GANs effectively and seamlessly.
Our basic idea is to efficiently infer the optimal latent distribution for the given requirements using Wasserstein gradient flow techniques.
arXiv Detail & Related papers (2021-12-07T05:22:50Z) - Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis [21.40315235087551]
We propose a lightweight GAN structure that achieves superior quality at $1024\times 1024$ resolution.
We show our model's superior performance compared to the state-of-the-art StyleGAN2, when data and computing budget are limited.
arXiv Detail & Related papers (2021-01-12T22:02:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.