Few-Step Distillation for Text-to-Image Generation: A Practical Guide
- URL: http://arxiv.org/abs/2512.13006v1
- Date: Mon, 15 Dec 2025 05:58:36 GMT
- Title: Few-Step Distillation for Text-to-Image Generation: A Practical Guide
- Authors: Yifan Pu, Yizeng Han, Zhiwei Tang, Jiasheng Tang, Fan Wang, Bohan Zhuang, Gao Huang
- Abstract summary: Diffusion distillation has dramatically accelerated class-conditional image synthesis, but its applicability to open-ended text-to-image (T2I) generation is still unclear. We present the first systematic study that adapts and compares state-of-the-art distillation techniques on a strong T2I teacher model, FLUX.1-lite.
- Score: 60.99392100471019
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Diffusion distillation has dramatically accelerated class-conditional image synthesis, but its applicability to open-ended text-to-image (T2I) generation is still unclear. We present the first systematic study that adapts and compares state-of-the-art distillation techniques on a strong T2I teacher model, FLUX.1-lite. By casting existing methods into a unified framework, we identify the key obstacles that arise when moving from discrete class labels to free-form language prompts. Beyond a thorough methodological analysis, we offer practical guidelines on input scaling, network architecture, and hyperparameters, accompanied by an open-source implementation and pretrained student models. Our findings establish a solid foundation for deploying fast, high-fidelity, and resource-efficient diffusion generators in real-world T2I applications. Code is available on github.com/alibaba-damo-academy/T2I-Distill.
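To make the setting concrete, below is a minimal sketch of the generic few-step distillation objective this line of work studies: a one-step student regresses onto what a frozen multi-step teacher produces from the same noise and prompt. The Euler sampler, velocity parameterization, and all names here are illustrative assumptions, not the paper's actual methods or the FLUX.1-lite interface.

```python
# Illustrative sketch only: a one-step student regresses onto a frozen
# multi-step teacher's sample. The velocity parameterization, Euler sampler,
# and all names are assumptions, not the paper's method or FLUX.1-lite's API.
import torch
import torch.nn.functional as F

@torch.no_grad()
def teacher_sample(teacher, x_T, text_emb, steps=32):
    """Integrate the frozen teacher from t=1 (noise) to t=0 with Euler steps."""
    x = x_T
    ts = torch.linspace(1.0, 0.0, steps + 1, device=x_T.device)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        v = teacher(x, t_cur.expand(x.shape[0]), text_emb)  # predicted velocity
        x = x + (t_next - t_cur) * v
    return x

def distill_loss(student, teacher, x_T, text_emb):
    target = teacher_sample(teacher, x_T, text_emb)   # many teacher steps
    t_one = torch.ones(x_T.shape[0], device=x_T.device)
    pred = x_T - student(x_T, t_one, text_emb)        # single Euler step, t=1 -> 0
    return F.mse_loss(pred, target)
```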
Related papers
- PRISM: Precision-Recall Informed Data-Free Knowledge Distillation via Generative Diffusion [4.591973713524844]
Data-free knowledge distillation (DFKD) transfers knowledge from a teacher to a student without access to the real in-distribution (ID) data. While existing methods perform well on small-scale images, they suffer from mode collapse when synthesizing large-scale images. We propose PRISM, a precision-recall informed method for synthesizing photorealistic images.
arXiv Detail & Related papers (2025-09-21T03:16:07Z)
- AcT2I: Evaluating and Improving Action Depiction in Text-to-Image Models [58.85362281293525]
We introduce AcT2I, a benchmark designed to evaluate the performance of T2I models in generating images from action-centric prompts. We experimentally validate that leading T2I models do not fare well on AcT2I. We build upon this by developing a training-free knowledge distillation technique utilizing Large Language Models to address this limitation.
arXiv Detail & Related papers (2025-09-19T16:41:39Z)
- TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance [23.375320072698297]
We introduce TeEFusion, a novel and efficient distillation method that directly incorporates the guidance magnitude into the text embeddings. Our method allows the student to closely mimic the teacher's performance with a far simpler and more efficient sampling strategy. It achieves inference speeds up to 6$\times$ faster than the teacher model, while maintaining image quality at levels comparable to those obtained through the teacher's complex sampling approach.
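A hedged sketch of the fusion idea described above, with assumed names and signatures: the classifier-free-guidance (CFG) scale w is folded into a single blended text embedding, so the student runs one forward pass where the teacher runs two.

```python
# Hedged sketch of the idea above: fold the CFG scale w into one fused text
# embedding so the student needs a single forward pass where the teacher
# needs two. All names and signatures are assumptions.
import torch
import torch.nn.functional as F

def fuse_text_embeddings(cond_emb, uncond_emb, w):
    """Linear blend that bakes the guidance magnitude into the embedding."""
    return uncond_emb + w * (cond_emb - uncond_emb)

def teefusion_style_loss(student, teacher, x_t, t, cond_emb, uncond_emb, w):
    with torch.no_grad():
        eps_c = teacher(x_t, t, cond_emb)     # conditional pass
        eps_u = teacher(x_t, t, uncond_emb)   # unconditional pass
        target = eps_u + w * (eps_c - eps_u)  # explicit CFG combination
    pred = student(x_t, t, fuse_text_embeddings(cond_emb, uncond_emb, w))
    return F.mse_loss(pred, target)
```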
arXiv Detail & Related papers (2025-07-24T08:45:40Z)
- DDAE++: Enhancing Diffusion Models Towards Unified Generative and Discriminative Learning [53.27049077100897]
Generative pre-training has been shown to yield discriminative representations, paving the way towards unified visual generation and understanding. This work introduces self-conditioning, a mechanism that internally leverages the rich semantics inherent in the denoising network to guide its own decoding layers. The results are compelling: our method boosts both generation FID and recognition accuracy with only 1% computational overhead and generalizes across diverse diffusion architectures.
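One plausible reading of self-conditioning, sketched purely as an assumption: pool an intermediate feature from the denoising network and re-inject it as a FiLM-style modulation on the network's own decoder layers. The actual DDAE++ mechanism and module names may differ.

```python
# Rough, assumption-laden sketch of self-conditioning: a mid-network feature
# is pooled and re-injected as a FiLM-style scale/shift on a decoder block.
import torch
import torch.nn as nn

class SelfConditionedBlock(nn.Module):
    def __init__(self, channels: int, sem_dim: int):
        super().__init__()
        self.block = nn.Conv2d(channels, channels, 3, padding=1)
        self.to_scale_shift = nn.Linear(sem_dim, 2 * channels)

    def forward(self, h: torch.Tensor, mid_feat: torch.Tensor) -> torch.Tensor:
        # Pool the network's own intermediate feature into a semantic vector.
        sem = mid_feat.mean(dim=(2, 3))                       # (B, sem_dim)
        scale, shift = self.to_scale_shift(sem).chunk(2, -1)  # (B, C) each
        h = self.block(h)
        return h * (1 + scale[:, :, None, None]) + shift[:, :, None, None]
```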
arXiv Detail & Related papers (2025-05-16T08:47:16Z)
- Efficient Generative Model Training via Embedded Representation Warmup [12.485320863366411]
Generative models face a fundamental challenge: they must simultaneously learn high-level semantic concepts and low-level synthesis details. We propose Embedded Representation Warmup, a principled two-phase training framework. Our framework achieves an 11.5$\times$ speedup, reaching FID=1.41 in 350 epochs, compared to single-phase methods like REPA.
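The summary does not spell out the two phases; one pattern consistent with the REPA comparison, offered here only as an assumption, is a warmup phase that aligns early network features with a frozen pretrained encoder before standard denoising training.

```python
# Assumption-only sketch of a two-phase schedule in the spirit of the summary;
# this is NOT the paper's verified recipe, and every name is a placeholder.
import torch.nn.functional as F

def training_step(model, frozen_encoder, x_t, t, x_clean, step, warmup_steps):
    # Hypothetical hooks: early-layer features plus a projection head.
    feats = model.project(model.early_features(x_t, t))
    target = frozen_encoder(x_clean).detach()         # frozen pretrained encoder
    align = -F.cosine_similarity(feats, target, dim=-1).mean()
    if step < warmup_steps:
        return align                                  # phase 1: warm up representations
    denoise = F.mse_loss(model(x_t, t), x_clean)      # phase 2: standard objective
    return denoise + 0.1 * align                      # keep a small alignment term
```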
arXiv Detail & Related papers (2025-04-14T12:43:17Z)
- DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability [75.9781362556431]
We propose DiffDis to unify cross-modal generative and discriminative pretraining into a single framework under the diffusion process.
We show that DiffDis outperforms single-task models on both image generation and image-text discrimination tasks.
arXiv Detail & Related papers (2023-08-18T05:03:48Z) - ERNIE-Search: Bridging Cross-Encoder with Dual-Encoder via Self
On-the-fly Distillation for Dense Passage Retrieval [54.54667085792404]
We propose a novel distillation method that significantly advances cross-architecture distillation for dual-encoders.
Our method 1) introduces a self on-the-fly distillation method that can effectively distill late interaction (i.e., ColBERT) to a vanilla dual-encoder (sketched below), and 2) incorporates a cascade distillation process to further improve the performance with a cross-encoder teacher.
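A hedged sketch of point 1): the same encoder yields a ColBERT-style late-interaction score (teacher view) and a pooled dual-encoder score (student view), and the student view is trained to match the teacher view's in-batch distribution. Pooling choices and names are assumptions.

```python
# Hedged sketch of self on-the-fly distillation: late-interaction scores act
# as the teacher signal for the pooled dual-encoder scores of the same model.
import torch
import torch.nn.functional as F

def late_interaction_score(q_tok, d_tok):
    # ColBERT MaxSim: for each query token, take the best-matching doc token.
    sim = torch.einsum("qe,de->qd", q_tok, d_tok)  # token-level similarities
    return sim.max(dim=1).values.sum()

def dual_encoder_score(q_tok, d_tok):
    # Vanilla dual encoder: dot product of mean-pooled representations.
    return q_tok.mean(0) @ d_tok.mean(0)

def self_distill_loss(q_tok, docs_tok):
    teacher = torch.stack([late_interaction_score(q_tok, d) for d in docs_tok])
    student = torch.stack([dual_encoder_score(q_tok, d) for d in docs_tok])
    # KL between in-batch score distributions; the teacher view is detached.
    return F.kl_div(F.log_softmax(student, -1),
                    F.softmax(teacher.detach(), -1), reduction="batchmean")
```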
arXiv Detail & Related papers (2022-05-18T18:05:13Z)
- High-Fidelity Synthesis with Disentangled Representation [60.19657080953252]
We propose an Information-Distillation Generative Adversarial Network (ID-GAN) for disentanglement learning and high-fidelity synthesis.
Our method learns a disentangled representation using VAE-based models, and distills the learned representation, together with an additional nuisance variable, to a separate GAN-based generator for high-fidelity synthesis.
Despite its simplicity, we show that the proposed method is highly effective, achieving image generation quality comparable to that of state-of-the-art methods while using the disentangled representation.
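As a minimal sketch of the recipe, with an invented architecture and dimensions: the generator consumes the frozen VAE's disentangled code concatenated with a fresh nuisance noise vector.

```python
# Illustrative sketch of the ID-GAN recipe described above: a GAN generator
# conditioned on a frozen VAE's disentangled code plus nuisance noise.
# Architecture and dimensions are placeholders, not the paper's design.
import torch
import torch.nn as nn

class IDGANGenerator(nn.Module):
    def __init__(self, code_dim=10, nuisance_dim=118, img_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(code_dim + nuisance_dim, 4 * 4 * 256), nn.ReLU(),
            nn.Unflatten(1, (256, 4, 4)),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(128, img_ch, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, code, nuisance):
        # `code` comes from the frozen VAE encoder (disentangled factors);
        # `nuisance` is fresh Gaussian noise supplying remaining details.
        return self.net(torch.cat([code, nuisance], dim=1))
```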
arXiv Detail & Related papers (2020-01-13T14:39:40Z)