Mind the Gap in Distilling StyleGANs
- URL: http://arxiv.org/abs/2208.08840v1
- Date: Thu, 18 Aug 2022 14:18:29 GMT
- Title: Mind the Gap in Distilling StyleGANs
- Authors: Guodong Xu, Yuenan Hou, Ziwei Liu, Chen Change Loy
- Abstract summary: The StyleGAN family is one of the most popular Generative Adversarial Networks (GANs) for unconditional generation.
This paper provides a comprehensive study of distilling from the popular StyleGAN-like architecture.
- Score: 100.58444291751015
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The StyleGAN family is one of the most popular Generative Adversarial Networks
(GANs) for unconditional generation. Despite its impressive performance, its
high demand on storage and computation impedes its deployment on
resource-constrained devices. This paper provides a comprehensive study of
distilling from the popular StyleGAN-like architecture. Our key insight is that
the main challenge of StyleGAN distillation lies in the output discrepancy
issue, where the teacher and student models yield different outputs given the
same input latent code. Standard knowledge distillation losses typically fail
under this heterogeneous distillation scenario. We conduct a thorough analysis
of the causes and effects of this discrepancy issue, and identify that the
mapping network plays a vital role in determining semantic information of
generated images. Based on this finding, we propose a novel initialization
strategy for the student model, which can ensure the output consistency to the
maximum extent. To further enhance the semantic consistency between the teacher
and student models, we present a latent-direction-based distillation loss that
preserves the semantic relations in latent space. Extensive experiments
demonstrate the effectiveness of our approach in distilling StyleGAN2 and
StyleGAN3, outperforming existing GAN distillation methods by a large margin.
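The output discrepancy described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the paper's implementation: the generator is a tiny two-stage linear model (the names `make_generator`, `output_gap`, and `direction_loss` are hypothetical), the student's synthesis weights are a noisy copy of the teacher's as a stand-in for a compressed network, and the initialization strategy is assumed to amount to reusing the teacher's mapping network, which the abstract suggests but does not spell out. The direction loss is a simple cosine-based stand-in for a latent-direction-based loss, not the loss proposed in the paper.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Matrix with independent standard-normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def make_generator(mapping, synthesis):
    """Toy two-stage generator: z -> w (mapping) -> image (synthesis)."""
    def generate(z):
        w = [math.tanh(x) for x in matvec(mapping, z)]
        return matvec(synthesis, w)
    return generate


def output_gap(gen_a, gen_b, latents):
    """Mean squared difference between two generators on the same latents."""
    total, count = 0.0, 0
    for z in latents:
        for a, b in zip(gen_a(z), gen_b(z)):
            total += (a - b) ** 2
            count += 1
    return total / count


def direction_loss(teacher, student, z, d):
    """1 - cosine similarity of the output changes caused by moving z along d."""
    z_moved = [a + b for a, b in zip(z, d)]
    dt = [a - b for a, b in zip(teacher(z_moved), teacher(z))]
    ds = [a - b for a, b in zip(student(z_moved), student(z))]
    dot = sum(a * b for a, b in zip(dt, ds))
    norm = math.sqrt(sum(a * a for a in dt)) * math.sqrt(sum(b * b for b in ds))
    return 1.0 - dot / norm


rng = random.Random(0)
LATENT, STYLE, PIXELS = 8, 8, 16

teacher_mapping = random_matrix(STYLE, LATENT, rng)
teacher_synthesis = random_matrix(PIXELS, STYLE, rng)
# Assumption: the student's synthesis weights are the teacher's plus small noise.
student_synthesis = [[w + rng.gauss(0.0, 0.1) for w in row]
                     for row in teacher_synthesis]

teacher = make_generator(teacher_mapping, teacher_synthesis)
# Student whose mapping network is initialized at random.
student_random = make_generator(random_matrix(STYLE, LATENT, rng),
                                student_synthesis)
# Student that reuses the teacher's mapping network.
student_inherited = make_generator(teacher_mapping, student_synthesis)

latents = [[rng.gauss(0.0, 1.0) for _ in range(LATENT)] for _ in range(64)]
direction = [rng.gauss(0.0, 1.0) for _ in range(LATENT)]

gap_random = output_gap(teacher, student_random, latents)
gap_inherited = output_gap(teacher, student_inherited, latents)
self_loss = direction_loss(teacher, teacher, latents[0], direction)
inherited_loss = direction_loss(teacher, student_inherited, latents[0], direction)

print("gap with random mapping:   ", gap_random)
print("gap with inherited mapping:", gap_inherited)
print("direction loss (inherited):", inherited_loss)
```

In this toy setting, the student with a random mapping network sends the same latent code to a different intermediate code than the teacher does, so a pixel-level comparison of the two outputs mostly measures that mismatch. Reusing the teacher's mapping removes this source of disagreement, leaving only the difference between the synthesis weights.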
Related papers
- Towards Distillation-Resistant Large Language Models: An Information-Theoretic Perspective [52.25797439810419]
Existing defenses focus exclusively on text-based distillation, leaving the important logit-based distillation largely unexplored.
We characterize distillation-relevant information in teacher outputs using the conditional mutual information (CMI) between teacher logits and input queries conditioned on ground-truth labels.
We derive a CMI-inspired anti-distillation objective to optimize this transformation, which effectively removes distillation-relevant information while preserving output utility.
arXiv Detail & Related papers (2026-02-03T11:16:59Z) - Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning [48.041170200238206]
We introduce DASD-4B-Thinking, a lightweight yet highly capable, fully open-source reasoning model.
It achieves SOTA performance among open-source models of comparable scale across challenging benchmarks in mathematics, scientific reasoning, and code generation.
arXiv Detail & Related papers (2026-01-14T02:43:17Z) - Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield [54.328202401611264]
Diffusion model distillation has emerged as a powerful technique for creating efficient few-step and single-step generators.
We show that the primary driver of few-step distillation is not distribution matching, but a previously overlooked component we identify as CFG Augmentation (CA).
We propose principled modifications to the distillation process, such as decoupling the noise schedules for the engine and the regularizer, leading to further performance gains.
arXiv Detail & Related papers (2025-11-27T18:24:28Z) - From Structure to Detail: Hierarchical Distillation for Efficient Diffusion Model [18.782919607372328]
Trajectory-based and distribution-based step distillation methods offer solutions.
Trajectory-based methods preserve global structure but act as a "lossy compressor".
We recast them into synergistic components within our novel Hierarchical Distillation framework.
arXiv Detail & Related papers (2025-11-12T03:12:06Z) - Learning from Stochastic Teacher Representations Using Student-Guided Knowledge Distillation [64.15918654558816]
A self-distillation (SSD) training strategy is introduced for filtering and weighting teacher representations, so that the student distills from task-relevant representations only.
Experimental results on real-world affective computing, wearable/biosignal datasets from the UCR Archive, the HAR dataset, and image classification datasets show that the proposed SSD method can outperform state-of-the-art methods.
arXiv Detail & Related papers (2025-04-19T14:08:56Z) - UNDO: Understanding Distillation as Optimization [9.100811514331498]
We introduce the UNDO: UNderstanding Distillation as Optimization framework.
Each iteration directly targets the student's learning deficiencies, motivating the teacher to provide tailored and enhanced rationales.
Empirical evaluations on various challenging mathematical and commonsense reasoning tasks demonstrate that our iterative distillation method, UNDO, significantly outperforms standard one-step distillation methods.
arXiv Detail & Related papers (2025-04-03T12:18:51Z) - Denoising Score Distillation: From Noisy Diffusion Pretraining to One-Step High-Quality Generation [82.39763984380625]
We introduce denoising score distillation (DSD), a surprisingly effective and novel approach for training high-quality generative models from low-quality data.
DSD pretrains a diffusion model exclusively on noisy, corrupted samples and then distills it into a one-step generator capable of producing refined, clean outputs.
arXiv Detail & Related papers (2025-03-10T17:44:46Z) - Knowledge Distillation with Refined Logits [31.205248790623703]
We introduce Refined Logit Distillation (RLD) to address the limitations of current logit distillation methods.
Our approach is motivated by the observation that even high-performing teacher models can make incorrect predictions.
Our method can effectively eliminate misleading information from the teacher while preserving crucial class correlations.
arXiv Detail & Related papers (2024-08-14T17:59:32Z) - One Step Diffusion-based Super-Resolution with Time-Aware Distillation [60.262651082672235]
Diffusion-based image super-resolution (SR) methods have shown promise in reconstructing high-resolution images with fine details from low-resolution counterparts.
Recent techniques have been devised to enhance the sampling efficiency of diffusion-based SR models via knowledge distillation.
We propose a time-aware diffusion distillation method, named TAD-SR, to accomplish effective and efficient image super-resolution.
arXiv Detail & Related papers (2024-08-14T11:47:22Z) - Multi-Granularity Semantic Revision for Large Language Model Distillation [66.03746866578274]
We propose a multi-granularity semantic revision method for LLM distillation.
At the sequence level, we propose a sequence correction and re-generation strategy.
At the token level, we design a distribution adaptive clipping Kullback-Leibler loss as the distillation objective function.
At the span level, we leverage the span priors of a sequence to compute the probability correlations within spans, and constrain the teacher and student's probability correlations to be consistent.
arXiv Detail & Related papers (2024-07-14T03:51:49Z) - Advancing Pre-trained Teacher: Towards Robust Feature Discrepancy for Anomaly Detection [19.099643719358692]
We propose a simple yet effective two-stage industrial anomaly detection framework, termed AAND.
In the first anomaly amplification stage, we propose a novel Residual Anomaly Amplification (RAA) module to advance the pre-trained teacher encoder.
We further employ a reverse distillation paradigm to train a student decoder, in which a novel Hard Knowledge Distillation (HKD) loss is built to better facilitate the reconstruction of normal patterns.
arXiv Detail & Related papers (2024-05-03T13:00:22Z) - HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained
Transformers [49.79405257763856]
This paper focuses on task-agnostic distillation.
It produces a compact pre-trained model that can be easily fine-tuned on various tasks with small computational costs and memory footprints.
We propose Homotopic Distillation (HomoDistil), a novel task-agnostic distillation approach equipped with iterative pruning.
arXiv Detail & Related papers (2023-02-19T17:37:24Z) - Normalized Feature Distillation for Semantic Segmentation [6.882655287146012]
We propose a simple yet effective feature distillation method called normalized feature distillation (NFD).
Our method achieves state-of-the-art distillation results for semantic segmentation on Cityscapes, VOC 2012, and ADE20K datasets.
arXiv Detail & Related papers (2022-07-12T01:54:25Z) - Anomaly Detection via Reverse Distillation from One-Class Embedding [2.715884199292287]
We propose a novel T-S model consisting of a teacher encoder and a student decoder.
Instead of receiving raw images directly, the student network takes the teacher model's one-class embedding as input.
In addition, we introduce a trainable one-class bottleneck embedding module in our T-S model.
arXiv Detail & Related papers (2022-01-26T01:48:37Z) - Knowledge distillation via adaptive instance normalization [52.91164959767517]
We propose a new knowledge distillation method based on transferring feature statistics from the teacher to the student.
Our method goes beyond the standard way of enforcing the mean and variance of the student to be similar to those of the teacher.
We show that our distillation method outperforms other state-of-the-art distillation methods over a large set of experimental settings.
arXiv Detail & Related papers (2020-03-09T17:50:12Z) - High-Fidelity Synthesis with Disentangled Representation [60.19657080953252]
We propose an Information-Distillation Generative Adversarial Network (ID-GAN) for disentanglement learning and high-fidelity synthesis.
Our method learns disentangled representation using VAE-based models, and distills the learned representation with an additional nuisance variable to the separate GAN-based generator for high-fidelity synthesis.
Despite the simplicity, we show that the proposed method is highly effective, achieving comparable image generation quality to the state-of-the-art methods using the disentangled representation.
arXiv Detail & Related papers (2020-01-13T14:39:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.