Learning Efficient GANs for Image Translation via Differentiable Masks
and co-Attention Distillation
- URL: http://arxiv.org/abs/2011.08382v4
- Date: Wed, 2 Mar 2022 09:17:14 GMT
- Title: Learning Efficient GANs for Image Translation via Differentiable Masks
and co-Attention Distillation
- Authors: Shaojie Li, Mingbao Lin, Yan Wang, Fei Chao, Ling Shao, Rongrong Ji
- Abstract summary: Generative Adversarial Networks (GANs) have been widely used in image translation, but their high computation and storage costs impede their deployment on mobile devices.
We introduce a novel GAN compression method, termed DMAD, by proposing a Differentiable Mask and a co-Attention Distillation.
Experiments show DMAD can reduce the Multiply Accumulate Operations (MACs) of CycleGAN by 13x and those of Pix2Pix by 4x while retaining performance comparable to the full model.
- Score: 130.30465659190773
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative Adversarial Networks (GANs) have been widely used in image
translation, but their high computation and storage costs impede their deployment
on mobile devices. Prevalent CNN compression methods cannot be directly
applied to GANs due to the peculiarities of GAN tasks and the unstable
adversarial training. To address these issues, we introduce a novel GAN
compression method, termed DMAD, by proposing a Differentiable Mask and a
co-Attention Distillation. The former searches for a light-weight generator
architecture in a training-adaptive manner. To overcome channel inconsistency
when pruning the residual connections, an adaptive cross-block group sparsity
is further incorporated. The latter simultaneously distills informative
attention maps from both the generator and discriminator of a pre-trained model
to the searched generator, effectively stabilizing the adversarial training of
our light-weight model. Experiments show that DMAD can reduce the Multiply
Accumulate Operations (MACs) of CycleGAN by 13x and those of Pix2Pix by 4x while
retaining performance comparable to the full model. Our code is available at
https://github.com/SJLeo/DMAD.
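
The differentiable mask can be pictured as a learnable gate on each convolution's output channels, trained jointly with the generator and pushed toward zero by a sparsity penalty so that low-mask channels can be pruned afterwards. The sketch below is a minimal PyTorch illustration under our own assumptions (sigmoid gates, a plain L1 penalty); it is not the authors' implementation, which is available at the GitHub link above.

```python
# Hypothetical sketch of a differentiable channel mask in the spirit of DMAD.
import torch
import torch.nn as nn

class MaskedConv(nn.Module):
    """Convolution whose output channels are gated by a learnable soft mask."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)
        # Real-valued logits; sigmoid(logits) gives a soft channel mask in [0, 1].
        self.mask_logits = nn.Parameter(torch.zeros(out_ch))

    def mask(self):
        return torch.sigmoid(self.mask_logits)

    def forward(self, x):
        # Broadcast the mask over (N, C, H, W); channels gated to ~0 become prunable.
        return self.conv(x) * self.mask().view(1, -1, 1, 1)

def sparsity_loss(layers, weight=1e-2):
    """L1 pressure pushing channel masks toward zero.

    Residual connections sum feature maps across blocks, so the blocks they
    join must prune the *same* channels; DMAD's adaptive cross-block group
    sparsity handles that alignment, which this flat sum does not.
    """
    return weight * sum(layer.mask().abs().sum() for layer in layers)
```

After training, channels whose mask falls below a threshold are removed, yielding the searched light-weight generator.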
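Co-attention distillation, in turn, matches attention maps between the pruned student generator and a pre-trained teacher, taking teacher signals from both the generator and the discriminator. The sketch below assumes the standard attention-transfer definition (channel-wise sum of squared activations, normalized per sample) and assumes the chosen teacher and student features share spatial size; the paper's exact attention definition and layer pairing may differ.

```python
# Hypothetical sketch of a co-attention distillation loss.
import torch
import torch.nn.functional as F

def attention_map(feat):
    """Collapse a (N, C, H, W) feature map into a normalized (N, H*W) attention map."""
    att = feat.pow(2).sum(dim=1)    # (N, H, W): per-location channel energy
    att = att.flatten(1)            # (N, H*W)
    return F.normalize(att, dim=1)  # unit norm per sample

def co_attention_loss(student_feats, teacher_g_feats, teacher_d_feats):
    """Match student attention to both the teacher generator and discriminator.

    All three lists are assumed to hold features of matching spatial size,
    collected (e.g. via forward hooks) from corresponding layers.
    """
    loss = 0.0
    for s, tg, td in zip(student_feats, teacher_g_feats, teacher_d_feats):
        loss = loss + F.mse_loss(attention_map(s), attention_map(tg))
        loss = loss + F.mse_loss(attention_map(s), attention_map(td))
    return loss
```

Distilling from the discriminator as well as the generator is what the abstract credits with stabilizing the adversarial training of the light-weight model.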
Related papers
- Unified Auto-Encoding with Masked Diffusion [15.264296748357157]
We propose a unified self-supervised objective, dubbed Unified Masked Diffusion (UMD).
UMD combines patch-based and noise-based corruption techniques within a single auto-encoding framework.
It achieves strong performance in downstream generative and representation learning tasks.
arXiv Detail & Related papers (2024-06-25T16:24:34Z) - Improved Distribution Matching Distillation for Fast Image Synthesis [54.72356560597428]
We introduce DMD2, a set of techniques that lift the limitations of Distribution Matching Distillation (DMD) and improve its training.
First, we eliminate the regression loss and the need for expensive dataset construction.
Second, we integrate a GAN loss into the distillation procedure, discriminating between generated samples and real images.
arXiv Detail & Related papers (2024-05-23T17:59:49Z) - Distilling Diffusion Models into Conditional GANs [90.76040478677609]
We distill a complex multistep diffusion model into a single-step conditional GAN student model.
For an efficient regression loss, we propose E-LatentLPIPS, a perceptual loss operating directly in the diffusion model's latent space (a minimal sketch follows this list).
We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models.
arXiv Detail & Related papers (2024-05-09T17:59:40Z) - SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer [102.39050180060913]
The Diffusion Transformer (DiT) has emerged as the new trend in generative diffusion models for image generation.
Recent breakthroughs have been driven by a mask strategy that significantly improves the training efficiency of DiT through additional intra-image contextual learning.
In this work, we address these limitations by unleashing self-supervised discrimination knowledge to boost DiT training.
arXiv Detail & Related papers (2024-03-25T17:59:35Z) - Generalized Consistency Trajectory Models for Image Manipulation [59.576781858809355]
Diffusion models (DMs) excel in unconditional generation, as well as in applications such as image editing and restoration.
This work aims to unlock the full potential of consistency trajectory models (CTMs) by proposing generalized CTMs (GCTMs).
We discuss the design space of GCTMs and demonstrate their efficacy in various image manipulation tasks such as image-to-image translation, restoration, and editing.
arXiv Detail & Related papers (2024-03-19T07:24:54Z) - DuDGAN: Improving Class-Conditional GANs via Dual-Diffusion [2.458437232470188]
Class-conditional image generation using generative adversarial networks (GANs) has been investigated through various techniques.
We propose a novel approach for class-conditional image generation using GANs called DuDGAN, which incorporates a dual diffusion-based noise injection process.
Our method outperforms state-of-the-art conditional GAN models for image generation.
arXiv Detail & Related papers (2023-05-24T07:59:44Z) - MSGDD-cGAN: Multi-Scale Gradients Dual Discriminator Conditional
Generative Adversarial Network [14.08122854653421]
MSGDD-cGAN is proposed to stabilize the performance of Conditional Generative Adversarial Networks (cGANs).
Our model shows a 3.18% increase in F1 score compared with the pix2pix version of cGANs.
arXiv Detail & Related papers (2021-09-12T21:08:37Z) - Unsupervised Controllable Generation with Self-Training [90.04287577605723]
Controllable generation with GANs remains a challenging research problem.
We propose an unsupervised framework to learn a distribution of latent codes that control the generator through self-training.
Our framework exhibits better disentanglement compared to other variants such as the variational autoencoder.
arXiv Detail & Related papers (2020-07-17T21:50:35Z) - GAN Compression: Efficient Architectures for Interactive Conditional
GANs [45.012173624111185]
Recent Conditional Generative Adversarial Networks (cGANs) are 1-2 orders of magnitude more compute-intensive than modern recognition CNNs.
We propose a general-purpose compression framework for reducing the inference time and model size of the generator in cGANs.
arXiv Detail & Related papers (2020-03-19T17:59:05Z)
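
As promised above, here is a minimal, assumption-laden sketch of the E-LatentLPIPS idea from "Distilling Diffusion Models into Conditional GANs": a perceptual loss computed on diffusion latents directly, skipping the VAE decode. The feature network below is an untrained stand-in (the paper calibrates a latent-space LPIPS network); only the overall shape of the computation is illustrated.

```python
# Hypothetical sketch of a perceptual loss on diffusion latents (E-LatentLPIPS-style).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentPerceptualLoss(nn.Module):
    def __init__(self, latent_ch=4):
        super().__init__()
        # Small conv stack producing multi-scale features of the latent;
        # a real system would use a trained, calibrated feature network.
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(latent_ch, 32, 3, stride=2, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU()),
        ])

    def forward(self, z_pred, z_target):
        loss, a, b = 0.0, z_pred, z_target
        for stage in self.stages:
            a, b = stage(a), stage(b)
            # LPIPS-style: compare channel-normalized features at each scale.
            loss = loss + (F.normalize(a, dim=1) - F.normalize(b, dim=1)).pow(2).mean()
        return loss
```

Because it never decodes latents to pixels, such a loss is far cheaper per step than a pixel-space perceptual loss, which is the efficiency argument the summary above alludes to.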
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.