Few-Step Distillation for Text-to-Image Generation: A Practical Guide
- URL: http://arxiv.org/abs/2512.13006v1
- Date: Mon, 15 Dec 2025 05:58:36 GMT
- Title: Few-Step Distillation for Text-to-Image Generation: A Practical Guide
- Authors: Yifan Pu, Yizeng Han, Zhiwei Tang, Jiasheng Tang, Fan Wang, Bohan Zhuang, Gao Huang
- Abstract summary: Diffusion distillation has dramatically accelerated class-conditional image synthesis, but its applicability to open-ended text-to-image (T2I) generation is still unclear. We present the first systematic study that adapts and compares state-of-the-art distillation techniques on a strong T2I teacher model, FLUX.1-lite.
- Score: 60.99392100471019
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Diffusion distillation has dramatically accelerated class-conditional image synthesis, but its applicability to open-ended text-to-image (T2I) generation is still unclear. We present the first systematic study that adapts and compares state-of-the-art distillation techniques on a strong T2I teacher model, FLUX.1-lite. By casting existing methods into a unified framework, we identify the key obstacles that arise when moving from discrete class labels to free-form language prompts. Beyond a thorough methodological analysis, we offer practical guidelines on input scaling, network architecture, and hyperparameters, accompanied by an open-source implementation and pretrained student models. Our findings establish a solid foundation for deploying fast, high-fidelity, and resource-efficient diffusion generators in real-world T2I applications. Code is available on github.com/alibaba-damo-academy/T2I-Distill.
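To make the setting concrete, below is a minimal sketch of the generic few-step distillation objective this line of work studies: a one-step student regresses onto what a frozen multi-step teacher produces from the same noise and prompt. The Euler sampler, velocity parameterization, and all names here are illustrative assumptions, not the paper's actual methods or the FLUX.1-lite interface.

```python
# Illustrative sketch only: a one-step student regresses onto a frozen
# multi-step teacher's sample. The velocity parameterization, Euler sampler,
# and all names are assumptions, not the paper's method or FLUX.1-lite's API.
import torch
import torch.nn.functional as F

@torch.no_grad()
def teacher_sample(teacher, x_T, text_emb, steps=32):
    """Integrate the frozen teacher from t=1 (noise) to t=0 with Euler steps."""
    x = x_T
    ts = torch.linspace(1.0, 0.0, steps + 1, device=x_T.device)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        v = teacher(x, t_cur.expand(x.shape[0]), text_emb)  # predicted velocity
        x = x + (t_next - t_cur) * v
    return x

def distill_loss(student, teacher, x_T, text_emb):
    target = teacher_sample(teacher, x_T, text_emb)   # many teacher steps
    t_one = torch.ones(x_T.shape[0], device=x_T.device)
    pred = x_T - student(x_T, t_one, text_emb)        # single Euler step, t=1 -> 0
    return F.mse_loss(pred, target)
```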
Related papers
- PRISM: Precision-Recall Informed Data-Free Knowledge Distillation via Generative Diffusion [4.591973713524844]
Data-free knowledge distillation (DFKD) transfers knowledge from a teacher to a student without access to the real in-distribution (ID) data. While existing methods perform well on small-scale images, they suffer from mode collapse when synthesizing large-scale images. We propose PRISM, a precision-recall informed method for synthesizing photorealistic images.
arXiv Detail & Related papers (2025-09-21T03:16:07Z)
- AcT2I: Evaluating and Improving Action Depiction in Text-to-Image Models [58.85362281293525]
We introduce AcT2I, a benchmark designed to evaluate the performance of T2I models in generating images from action-centric prompts. We experimentally validate that leading T2I models do not fare well on AcT2I. We build upon this by developing a training-free knowledge distillation technique utilizing Large Language Models to address this limitation.
arXiv Detail & Related papers (2025-09-19T16:41:39Z)
- TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance [23.375320072698297]
We introduce TeEFusion, a novel and efficient distillation method that directly incorporates the guidance magnitude into the text embeddings. Our method allows the student to closely mimic the teacher's performance with a far simpler and more efficient sampling strategy. It achieves inference speeds up to 6$\times$ faster than the teacher model, while maintaining image quality at levels comparable to those obtained through the teacher's complex sampling approach.
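A hedged sketch of the fusion idea described above, with assumed names and signatures: the classifier-free-guidance (CFG) scale w is folded into a single blended text embedding, so the student runs one forward pass where the teacher runs two.

```python
# Hedged sketch of the idea above: fold the CFG scale w into one fused text
# embedding so the student needs a single forward pass where the teacher
# needs two. All names and signatures are assumptions.
import torch
import torch.nn.functional as F

def fuse_text_embeddings(cond_emb, uncond_emb, w):
    """Linear blend that bakes the guidance magnitude into the embedding."""
    return uncond_emb + w * (cond_emb - uncond_emb)

def teefusion_style_loss(student, teacher, x_t, t, cond_emb, uncond_emb, w):
    with torch.no_grad():
        eps_c = teacher(x_t, t, cond_emb)     # conditional pass
        eps_u = teacher(x_t, t, uncond_emb)   # unconditional pass
        target = eps_u + w * (eps_c - eps_u)  # explicit CFG combination
    pred = student(x_t, t, fuse_text_embeddings(cond_emb, uncond_emb, w))
    return F.mse_loss(pred, target)
```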
arXiv Detail & Related papers (2025-07-24T08:45:40Z)
- DDAE++: Enhancing Diffusion Models Towards Unified Generative and Discriminative Learning [53.27049077100897]
Generative pre-training has been shown to yield discriminative representations, paving the way towards unified visual generation and understanding. This work introduces self-conditioning, a mechanism that internally leverages the rich semantics inherent in the denoising network to guide its own decoding layers. The results are compelling: our method boosts both generation FID and recognition accuracy with only 1% computational overhead and generalizes across diverse diffusion architectures.
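One plausible reading of self-conditioning, sketched purely as an assumption: pool an intermediate feature from the denoising network and re-inject it as a FiLM-style modulation on the network's own decoder layers. The actual DDAE++ mechanism and module names may differ.

```python
# Rough, assumption-laden sketch of self-conditioning: a mid-network feature
# is pooled and re-injected as a FiLM-style scale/shift on a decoder block.
import torch
import torch.nn as nn

class SelfConditionedBlock(nn.Module):
    def __init__(self, channels: int, sem_dim: int):
        super().__init__()
        self.block = nn.Conv2d(channels, channels, 3, padding=1)
        self.to_scale_shift = nn.Linear(sem_dim, 2 * channels)

    def forward(self, h: torch.Tensor, mid_feat: torch.Tensor) -> torch.Tensor:
        # Pool the network's own intermediate feature into a semantic vector.
        sem = mid_feat.mean(dim=(2, 3))                       # (B, sem_dim)
        scale, shift = self.to_scale_shift(sem).chunk(2, -1)  # (B, C) each
        h = self.block(h)
        return h * (1 + scale[:, :, None, None]) + shift[:, :, None, None]
```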
arXiv Detail & Related papers (2025-05-16T08:47:16Z)
- Efficient Generative Model Training via Embedded Representation Warmup [12.485320863366411]
Generative models face a fundamental challenge: they must simultaneously learn high-level semantic concepts and low-level synthesis details. We propose Embedded Representation Warmup, a principled two-phase training framework. Our framework achieves an 11.5$\times$ speedup, reaching FID=1.41 in 350 epochs, compared to single-phase methods like REPA.
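The summary does not spell out the two phases; one pattern consistent with the REPA comparison, offered here only as an assumption, is a warmup phase that aligns early network features with a frozen pretrained encoder before standard denoising training.

```python
# Assumption-only sketch of a two-phase schedule in the spirit of the summary;
# this is NOT the paper's verified recipe, and every name is a placeholder.
import torch.nn.functional as F

def training_step(model, frozen_encoder, x_t, t, x_clean, step, warmup_steps):
    # Hypothetical hooks: early-layer features plus a projection head.
    feats = model.project(model.early_features(x_t, t))
    target = frozen_encoder(x_clean).detach()         # frozen pretrained encoder
    align = -F.cosine_similarity(feats, target, dim=-1).mean()
    if step < warmup_steps:
        return align                                  # phase 1: warm up representations
    denoise = F.mse_loss(model(x_t, t), x_clean)      # phase 2: standard objective
    return denoise + 0.1 * align                      # keep a small alignment term
```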
arXiv Detail & Related papers (2025-04-14T12:43:17Z)
- DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability [75.9781362556431]
We propose DiffDis to unify cross-modal generative and discriminative pretraining into a single framework under the diffusion process.
We show that DiffDis outperforms single-task models on both image generation and image-text discrimination tasks.
arXiv Detail & Related papers (2023-08-18T05:03:48Z) - ERNIE-Search: Bridging Cross-Encoder with Dual-Encoder via Self
On-the-fly Distillation for Dense Passage Retrieval [54.54667085792404]
We propose a novel distillation method that significantly advances cross-architecture distillation for dual-encoders.
Our method 1) introduces a self on-the-fly distillation method that can effectively distill late interaction (i.e., ColBERT) to a vanilla dual-encoder (sketched below), and 2) incorporates a cascade distillation process to further improve the performance with a cross-encoder teacher.
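A hedged sketch of point 1): the same encoder yields a ColBERT-style late-interaction score (teacher view) and a pooled dual-encoder score (student view), and the student view is trained to match the teacher view's in-batch distribution. Pooling choices and names are assumptions.

```python
# Hedged sketch of self on-the-fly distillation: late-interaction scores act
# as the teacher signal for the pooled dual-encoder scores of the same model.
import torch
import torch.nn.functional as F

def late_interaction_score(q_tok, d_tok):
    # ColBERT MaxSim: for each query token, take the best-matching doc token.
    sim = torch.einsum("qe,de->qd", q_tok, d_tok)  # token-level similarities
    return sim.max(dim=1).values.sum()

def dual_encoder_score(q_tok, d_tok):
    # Vanilla dual encoder: dot product of mean-pooled representations.
    return q_tok.mean(0) @ d_tok.mean(0)

def self_distill_loss(q_tok, docs_tok):
    teacher = torch.stack([late_interaction_score(q_tok, d) for d in docs_tok])
    student = torch.stack([dual_encoder_score(q_tok, d) for d in docs_tok])
    # KL between in-batch score distributions; the teacher view is detached.
    return F.kl_div(F.log_softmax(student, -1),
                    F.softmax(teacher.detach(), -1), reduction="batchmean")
```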
arXiv Detail & Related papers (2022-05-18T18:05:13Z)
- High-Fidelity Synthesis with Disentangled Representation [60.19657080953252]
We propose an Information-Distillation Generative Adversarial Network (ID-GAN) for disentanglement learning and high-fidelity synthesis.
Our method learns a disentangled representation using VAE-based models, and distills the learned representation, together with an additional nuisance variable, to a separate GAN-based generator for high-fidelity synthesis.
Despite its simplicity, we show that the proposed method is highly effective, achieving image generation quality comparable to that of state-of-the-art methods while using the disentangled representation.
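As a minimal sketch of the recipe, with an invented architecture and dimensions: the generator consumes the frozen VAE's disentangled code concatenated with a fresh nuisance noise vector.

```python
# Illustrative sketch of the ID-GAN recipe described above: a GAN generator
# conditioned on a frozen VAE's disentangled code plus nuisance noise.
# Architecture and dimensions are placeholders, not the paper's design.
import torch
import torch.nn as nn

class IDGANGenerator(nn.Module):
    def __init__(self, code_dim=10, nuisance_dim=118, img_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(code_dim + nuisance_dim, 4 * 4 * 256), nn.ReLU(),
            nn.Unflatten(1, (256, 4, 4)),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(128, img_ch, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, code, nuisance):
        # `code` comes from the frozen VAE encoder (disentangled factors);
        # `nuisance` is fresh Gaussian noise supplying remaining details.
        return self.net(torch.cat([code, nuisance], dim=1))
```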
arXiv Detail & Related papers (2020-01-13T14:39:40Z)