Mind the Gap in Distilling StyleGANs
- URL: http://arxiv.org/abs/2208.08840v1
- Date: Thu, 18 Aug 2022 14:18:29 GMT
- Title: Mind the Gap in Distilling StyleGANs
- Authors: Guodong Xu, Yuenan Hou, Ziwei Liu, Chen Change Loy
- Abstract summary: The StyleGAN family is one of the most popular Generative Adversarial Networks (GANs) for unconditional generation.
This paper provides a comprehensive study of distilling from the popular StyleGAN-like architecture.
- Score: 100.58444291751015
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The StyleGAN family is one of the most popular Generative
Adversarial Networks (GANs) for unconditional generation. Despite its
impressive performance, its high demand for storage and computation impedes
deployment on resource-constrained devices. This paper provides a
comprehensive study of distilling from the popular StyleGAN-like architecture.
Our key insight is that the main challenge of StyleGAN distillation lies in
the output discrepancy issue, where the teacher and student models yield
different outputs given the same input latent code. Standard knowledge
distillation losses typically fail under this heterogeneous distillation
scenario. We conduct a thorough analysis of the causes and effects of this
discrepancy and identify that the mapping network plays a vital role in
determining the semantic information of generated images. Based on this
finding, we propose a novel initialization strategy for the student model
that ensures output consistency to the greatest extent possible. To further
enhance the semantic consistency between the teacher and student models, we
present a latent-direction-based distillation loss that preserves the
semantic relations in latent space. Extensive experiments demonstrate the
effectiveness of our approach in distilling StyleGAN2 and StyleGAN3,
outperforming existing GAN distillation methods by a large margin.
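The following PyTorch-style sketch illustrates the two ideas above under stated assumptions: a student whose mapping network is initialized from the teacher's, and a loss that asks the student to respond to a latent direction the way the teacher does. The function names, the cosine criterion, and the assumption that the two mapping networks share an architecture are illustrative choices, not the authors' released implementation.

```python
# Minimal illustrative sketch; `teacher` and `student` are assumed to map a
# latent code z directly to an image, and `directions` are given semantic
# directions in latent space. Not the authors' implementation.
import torch
import torch.nn.functional as F

def init_student_mapping_from_teacher(student_mapping, teacher_mapping):
    # Assumed initialization: start the student's mapping network from the
    # teacher's weights so both models read the same latent code consistently.
    student_mapping.load_state_dict(teacher_mapping.state_dict())

def latent_direction_loss(teacher, student, z, directions, sigma=1.0):
    """Ask the student to change its output along each latent direction the
    way the teacher does, preserving semantic relations in latent space."""
    losses = []
    for d in directions:                          # d: direction vector in z-space
        z_shift = z + sigma * d
        with torch.no_grad():                     # teacher is frozen
            delta_t = teacher(z_shift) - teacher(z)   # teacher's edit effect
        delta_s = student(z_shift) - student(z)       # student's edit effect
        # Compare the normalized edit directions in image space (cosine).
        delta_t = F.normalize(delta_t.flatten(1), dim=1)
        delta_s = F.normalize(delta_s.flatten(1), dim=1)
        losses.append((1.0 - (delta_t * delta_s).sum(dim=1)).mean())
    return torch.stack(losses).mean()
```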
Related papers
- Knowledge Distillation with Refined Logits [31.205248790623703]
We introduce Refined Logit Distillation (RLD) to address the limitations of current logit distillation methods.
Our approach is motivated by the observation that even high-performing teacher models can make incorrect predictions.
Our method can effectively eliminate misleading information from the teacher while preserving crucial class correlations (see the sketch below).
arXiv Detail & Related papers (2024-08-14T17:59:32Z)
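The RLD entry above refines logit distillation; as background, here is a minimal sketch of the standard temperature-scaled logit-distillation objective it builds on. RLD's own logit refinement is not reproduced, and the function name and temperature value are illustrative.

```python
import torch.nn.functional as F

def logit_kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between softened teacher and student class distributions."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)
```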
- One Step Diffusion-based Super-Resolution with Time-Aware Distillation [60.262651082672235]
Diffusion-based image super-resolution (SR) methods have shown promise in reconstructing high-resolution images with fine details from low-resolution counterparts.
Recent techniques have been devised to enhance the sampling efficiency of diffusion-based SR models via knowledge distillation.
We propose a time-aware diffusion distillation method, named TAD-SR, to accomplish effective and efficient image super-resolution.
arXiv Detail & Related papers (2024-08-14T11:47:22Z)
- Multi-Granularity Semantic Revision for Large Language Model Distillation [66.03746866578274]
We propose a multi-granularity semantic revision method for LLM distillation.
At the sequence level, we propose a sequence correction and re-generation strategy.
At the token level, we design a distribution-adaptive clipping Kullback-Leibler loss as the distillation objective function (see the sketch below).
At the span level, we leverage the span priors of a sequence to compute the probability correlations within spans, and constrain the teacher and student's probability correlations to be consistent.
arXiv Detail & Related papers (2024-07-14T03:51:49Z)
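For the token-level objective mentioned in the entry above, the sketch below shows a plain per-token KL between teacher and student language-model distributions; the paper's distribution-adaptive clipping and its sequence- and span-level terms are not reproduced, and the tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def token_level_kl(student_logits, teacher_logits, attention_mask, temperature=1.0):
    """student_logits, teacher_logits: [batch, seq_len, vocab]; attention_mask: [batch, seq_len]."""
    t = temperature
    log_p_s = F.log_softmax(student_logits / t, dim=-1)
    p_t = F.softmax(teacher_logits / t, dim=-1)
    # KL(p_teacher || p_student) per token, summed over the vocabulary.
    kl_per_token = (p_t * (p_t.clamp_min(1e-9).log() - log_p_s)).sum(dim=-1)
    mask = attention_mask.float()
    # Average over non-padding tokens only.
    return (kl_per_token * mask).sum() / mask.sum().clamp_min(1.0) * (t * t)
```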
- Advancing Pre-trained Teacher: Towards Robust Feature Discrepancy for Anomaly Detection [19.099643719358692]
We propose a simple yet effective two-stage industrial anomaly detection framework, termed AAND.
In the first anomaly amplification stage, we propose a novel Residual Anomaly Amplification (RAA) module to advance the pre-trained teacher encoder.
We further employ a reverse distillation paradigm to train a student decoder, in which a novel Hard Knowledge Distillation (HKD) loss is built to better facilitate the reconstruction of normal patterns.
arXiv Detail & Related papers (2024-05-03T13:00:22Z)
- HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers [49.79405257763856]
This paper focuses on task-agnostic distillation.
It produces a compact pre-trained model that can be easily fine-tuned on various tasks with small computational costs and memory footprints.
We propose Homotopic Distillation (HomoDistil), a novel task-agnostic distillation approach equipped with iterative pruning.
arXiv Detail & Related papers (2023-02-19T17:37:24Z)
- Normalized Feature Distillation for Semantic Segmentation [6.882655287146012]
We propose a simple yet effective feature distillation method called normalized feature distillation (NFD).
Our method achieves state-of-the-art distillation results for semantic segmentation on the Cityscapes, VOC 2012, and ADE20K datasets (see the sketch below).
arXiv Detail & Related papers (2022-07-12T01:54:25Z)
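As a rough illustration of the normalized-feature-distillation idea in the entry above, the sketch below L2-matches teacher and student feature maps after normalization; the 1x1 channel adapter and the specific normalization are assumptions and may differ from NFD's actual formulation.

```python
import torch.nn as nn
import torch.nn.functional as F

class NormalizedFeatureLoss(nn.Module):
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # Hypothetical channel adapter so student features match teacher width.
        self.align = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, f_student, f_teacher):
        f_s = self.align(f_student)
        # Normalize each sample's feature map so only its structure is matched,
        # not its raw magnitude.
        f_s = F.normalize(f_s.flatten(1), dim=1)
        f_t = F.normalize(f_teacher.flatten(1), dim=1)
        return F.mse_loss(f_s, f_t)
```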
- Anomaly Detection via Reverse Distillation from One-Class Embedding [2.715884199292287]
We propose a novel T-S model consisting of a teacher encoder and a student decoder.
Instead of receiving raw images directly, the student network takes the teacher model's one-class embedding as input.
In addition, we introduce a trainable one-class bottleneck embedding module in our T-S model.
arXiv Detail & Related papers (2022-01-26T01:48:37Z)
- Knowledge distillation via adaptive instance normalization [52.91164959767517]
We propose a new knowledge distillation method based on transferring feature statistics from the teacher to the student.
Our method goes beyond the standard way of enforcing the mean and variance of the student to be similar to those of the teacher.
We show that our distillation method outperforms other state-of-the-art distillation methods over a large set of experimental settings (see the sketch below).
arXiv Detail & Related papers (2020-03-09T17:50:12Z)
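The entry above starts from matching feature statistics between teacher and student; the sketch below shows only that baseline idea (per-channel mean and standard deviation matching). The paper's full adaptive-instance-normalization transfer goes beyond this, and the function here is an assumed illustration.

```python
def feature_stats_loss(f_student, f_teacher, eps=1e-5):
    """f_student, f_teacher: feature maps of shape [batch, channels, height, width]."""
    # Per-channel first- and second-order statistics over spatial positions.
    mu_s = f_student.mean(dim=(2, 3))
    mu_t = f_teacher.mean(dim=(2, 3))
    std_s = (f_student.var(dim=(2, 3), unbiased=False) + eps).sqrt()
    std_t = (f_teacher.var(dim=(2, 3), unbiased=False) + eps).sqrt()
    # Penalize the gap between student and teacher statistics.
    return ((mu_s - mu_t) ** 2).mean() + ((std_s - std_t) ** 2).mean()
```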
- High-Fidelity Synthesis with Disentangled Representation [60.19657080953252]
We propose an Information-Distillation Generative Adversarial Network (ID-GAN) for disentanglement learning and high-fidelity synthesis.
Our method learns disentangled representation using VAE-based models, and distills the learned representation with an additional nuisance variable to the separate GAN-based generator for high-fidelity synthesis.
Despite its simplicity, we show that the proposed method is highly effective, achieving image generation quality comparable to state-of-the-art methods while using the disentangled representation.
arXiv Detail & Related papers (2020-01-13T14:39:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.