Learning Efficient GANs for Image Translation via Differentiable Masks
and co-Attention Distillation
- URL: http://arxiv.org/abs/2011.08382v4
- Date: Wed, 2 Mar 2022 09:17:14 GMT
- Title: Learning Efficient GANs for Image Translation via Differentiable Masks
and co-Attention Distillation
- Authors: Shaojie Li, Mingbao Lin, Yan Wang, Fei Chao, Ling Shao, Rongrong Ji
- Abstract summary: Generative Adversarial Networks (GANs) have been widely used in image translation, but their high computation and storage costs impede their deployment on mobile devices.
We introduce a novel GAN compression method, termed DMAD, by proposing a Differentiable Mask and a co-Attention Distillation.
Experiments show DMAD can reduce the Multiply Accumulate Operations (MACs) of CycleGAN by 13x and those of Pix2Pix by 4x while retaining performance comparable to the full model.
- Score: 130.30465659190773
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative Adversarial Networks (GANs) have been widely used in image
translation, but their high computation and storage costs impede their deployment
on mobile devices. Prevalent CNN compression methods cannot be directly
applied to GANs due to the peculiarities of GAN tasks and the unstable
adversarial training. To address these issues, we introduce a novel GAN
compression method, termed DMAD, by proposing a Differentiable Mask and a
co-Attention Distillation. The former searches for a light-weight generator
architecture in a training-adaptive manner. To overcome channel inconsistency
when pruning the residual connections, an adaptive cross-block group sparsity
is further incorporated. The latter simultaneously distills informative
attention maps from both the generator and discriminator of a pre-trained model
to the searched generator, effectively stabilizing the adversarial training of
our light-weight model. Experiments show that DMAD can reduce the Multiply
Accumulate Operations (MACs) of CycleGAN by 13x and those of Pix2Pix by 4x while
retaining performance comparable to the full model. Our code is available at
https://github.com/SJLeo/DMAD.
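
The differentiable mask can be pictured as a learnable gate on each convolution's output channels, trained jointly with the generator and pushed toward zero by a sparsity penalty so that low-mask channels can be pruned afterwards. The sketch below is a minimal PyTorch illustration under our own assumptions (sigmoid gates, a plain L1 penalty); it is not the authors' implementation, which is available at the GitHub link above.

```python
# Hypothetical sketch of a differentiable channel mask in the spirit of DMAD.
import torch
import torch.nn as nn

class MaskedConv(nn.Module):
    """Convolution whose output channels are gated by a learnable soft mask."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)
        # Real-valued logits; sigmoid(logits) gives a soft channel mask in [0, 1].
        self.mask_logits = nn.Parameter(torch.zeros(out_ch))

    def mask(self):
        return torch.sigmoid(self.mask_logits)

    def forward(self, x):
        # Broadcast the mask over (N, C, H, W); channels gated to ~0 become prunable.
        return self.conv(x) * self.mask().view(1, -1, 1, 1)

def sparsity_loss(layers, weight=1e-2):
    """L1 pressure pushing channel masks toward zero.

    Residual connections sum feature maps across blocks, so the blocks they
    join must prune the *same* channels; DMAD's adaptive cross-block group
    sparsity handles that alignment, which this flat sum does not.
    """
    return weight * sum(layer.mask().abs().sum() for layer in layers)
```

After training, channels whose mask falls below a threshold are removed, yielding the searched light-weight generator.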
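Co-attention distillation, in turn, matches attention maps between the pruned student generator and a pre-trained teacher, taking teacher signals from both the generator and the discriminator. The sketch below assumes the standard attention-transfer definition (channel-wise sum of squared activations, normalized per sample) and assumes the chosen teacher and student features share spatial size; the paper's exact attention definition and layer pairing may differ.

```python
# Hypothetical sketch of a co-attention distillation loss.
import torch
import torch.nn.functional as F

def attention_map(feat):
    """Collapse a (N, C, H, W) feature map into a normalized (N, H*W) attention map."""
    att = feat.pow(2).sum(dim=1)    # (N, H, W): per-location channel energy
    att = att.flatten(1)            # (N, H*W)
    return F.normalize(att, dim=1)  # unit norm per sample

def co_attention_loss(student_feats, teacher_g_feats, teacher_d_feats):
    """Match student attention to both the teacher generator and discriminator.

    All three lists are assumed to hold features of matching spatial size,
    collected (e.g. via forward hooks) from corresponding layers.
    """
    loss = 0.0
    for s, tg, td in zip(student_feats, teacher_g_feats, teacher_d_feats):
        loss = loss + F.mse_loss(attention_map(s), attention_map(tg))
        loss = loss + F.mse_loss(attention_map(s), attention_map(td))
    return loss
```

Distilling from the discriminator as well as the generator is what the abstract credits with stabilizing the adversarial training of the light-weight model.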
Related papers
- Unified Auto-Encoding with Masked Diffusion [15.264296748357157]
We propose a unified self-supervised objective, dubbed Unified Masked Diffusion (UMD).
UMD combines patch-based and noise-based corruption techniques within a single auto-encoding framework.
It achieves strong performance in downstream generative and representation learning tasks.
arXiv Detail & Related papers (2024-06-25T16:24:34Z) - Improved Distribution Matching Distillation for Fast Image Synthesis [54.72356560597428]
We introduce DMD2, a set of techniques that lift the limitations of Distribution Matching Distillation (DMD) and improve its training.
First, we eliminate the regression loss and the need for expensive dataset construction.
Second, we integrate a GAN loss into the distillation procedure, discriminating between generated samples and real images.
arXiv Detail & Related papers (2024-05-23T17:59:49Z) - Distilling Diffusion Models into Conditional GANs [90.76040478677609]
We distill a complex multistep diffusion model into a single-step conditional GAN student model.
For an efficient regression loss, we propose E-LatentLPIPS, a perceptual loss operating directly in the diffusion model's latent space (a minimal sketch follows this list).
We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models.
arXiv Detail & Related papers (2024-05-09T17:59:40Z) - SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer [102.39050180060913]
The Diffusion Transformer (DiT) has emerged as the new trend in generative diffusion models for image generation.
Recent breakthroughs have been driven by a mask strategy that significantly improves the training efficiency of DiT through additional intra-image contextual learning.
In this work, we address these limitations by unleashing self-supervised discrimination knowledge to boost DiT training.
arXiv Detail & Related papers (2024-03-25T17:59:35Z) - Generalized Consistency Trajectory Models for Image Manipulation [59.576781858809355]
Diffusion models (DMs) excel in unconditional generation, as well as in applications such as image editing and restoration.
This work aims to unlock the full potential of consistency trajectory models (CTMs) by proposing generalized CTMs (GCTMs).
We discuss the design space of GCTMs and demonstrate their efficacy in various image manipulation tasks such as image-to-image translation, restoration, and editing.
arXiv Detail & Related papers (2024-03-19T07:24:54Z) - DuDGAN: Improving Class-Conditional GANs via Dual-Diffusion [2.458437232470188]
Class-conditional image generation using generative adversarial networks (GANs) has been investigated through various techniques.
We propose a novel approach for class-conditional image generation using GANs called DuDGAN, which incorporates a dual diffusion-based noise injection process.
Our method outperforms state-of-the-art conditional GAN models for image generation.
arXiv Detail & Related papers (2023-05-24T07:59:44Z) - MSGDD-cGAN: Multi-Scale Gradients Dual Discriminator Conditional
Generative Adversarial Network [14.08122854653421]
MSGDD-cGAN is proposed to stabilize the performance of Conditional Generative Adversarial Networks (cGANs).
Our model shows a 3.18% increase in F1 score compared with the pix2pix version of cGANs.
arXiv Detail & Related papers (2021-09-12T21:08:37Z) - Unsupervised Controllable Generation with Self-Training [90.04287577605723]
Controllable generation with GANs remains a challenging research problem.
We propose an unsupervised framework to learn a distribution of latent codes that control the generator through self-training.
Our framework exhibits better disentanglement compared to other variants such as the variational autoencoder.
arXiv Detail & Related papers (2020-07-17T21:50:35Z) - GAN Compression: Efficient Architectures for Interactive Conditional
GANs [45.012173624111185]
Recent Conditional Generative Adversarial Networks (cGANs) are 1-2 orders of magnitude more compute-intensive than modern recognition CNNs.
We propose a general-purpose compression framework for reducing the inference time and model size of the generator in cGANs.
arXiv Detail & Related papers (2020-03-19T17:59:05Z)
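
As promised above, here is a minimal, assumption-laden sketch of the E-LatentLPIPS idea from "Distilling Diffusion Models into Conditional GANs": a perceptual loss computed on diffusion latents directly, skipping the VAE decode. The feature network below is an untrained stand-in (the paper calibrates a latent-space LPIPS network); only the overall shape of the computation is illustrated.

```python
# Hypothetical sketch of a perceptual loss on diffusion latents (E-LatentLPIPS-style).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentPerceptualLoss(nn.Module):
    def __init__(self, latent_ch=4):
        super().__init__()
        # Small conv stack producing multi-scale features of the latent;
        # a real system would use a trained, calibrated feature network.
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(latent_ch, 32, 3, stride=2, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU()),
        ])

    def forward(self, z_pred, z_target):
        loss, a, b = 0.0, z_pred, z_target
        for stage in self.stages:
            a, b = stage(a), stage(b)
            # LPIPS-style: compare channel-normalized features at each scale.
            loss = loss + (F.normalize(a, dim=1) - F.normalize(b, dim=1)).pow(2).mean()
        return loss
```

Because it never decodes latents to pixels, such a loss is far cheaper per step than a pixel-space perceptual loss, which is the efficiency argument the summary above alludes to.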
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.