TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance
- URL: http://arxiv.org/abs/2507.18192v2
- Date: Fri, 25 Jul 2025 03:17:01 GMT
- Title: TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance
- Authors: Minghao Fu, Guo-Hua Wang, Xiaohao Chen, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang
- Abstract summary: We introduce TeEFusion, a novel and efficient distillation method that directly incorporates the guidance magnitude into the text embeddings. Our method allows the student to closely mimic the teacher's performance with a far simpler and more efficient sampling strategy. It achieves inference speeds up to 6$\times$ faster than the teacher model, while maintaining image quality at levels comparable to those obtained through the teacher's complex sampling approach.
- Score: 23.375320072698297
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in text-to-image synthesis largely benefit from sophisticated sampling strategies and classifier-free guidance (CFG) to ensure high-quality generation. However, CFG's reliance on two forward passes, especially when combined with intricate sampling algorithms, results in prohibitively high inference costs. To address this, we introduce TeEFusion (Text Embeddings Fusion), a novel and efficient distillation method that directly incorporates the guidance magnitude into the text embeddings and distills the teacher model's complex sampling strategy. By simply fusing conditional and unconditional text embeddings using linear operations, TeEFusion reconstructs the desired guidance without adding extra parameters, simultaneously enabling the student model to learn from the teacher's output produced via its sophisticated sampling approach. Extensive experiments on state-of-the-art models such as SD3 demonstrate that our method allows the student to closely mimic the teacher's performance with a far simpler and more efficient sampling strategy. Consequently, the student model achieves inference speeds up to 6$\times$ faster than the teacher model, while maintaining image quality at levels comparable to those obtained through the teacher's complex sampling approach. The code is publicly available at https://github.com/AIDC-AI/TeEFusion.
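The core fusion step can be sketched as a CFG-style linear extrapolation carried out directly in text-embedding space. The snippet below is a minimal illustration under that assumption; the exact blending used by TeEFusion and the `student_transformer` call are illustrative names, not the released implementation.

```python
import torch

def fuse_text_embeddings(e_cond: torch.Tensor,
                         e_uncond: torch.Tensor,
                         w: float) -> torch.Tensor:
    """Blend unconditional and conditional text embeddings with a CFG-style
    linear extrapolation, so the guidance magnitude w travels inside a single
    embedding rather than requiring two forward passes at inference time."""
    return e_uncond + w * (e_cond - e_uncond)

# Illustrative single-pass student call (names are assumptions):
# e_fused = fuse_text_embeddings(e_cond, e_uncond, w=5.0)
# noise_pred = student_transformer(x_t, t, encoder_hidden_states=e_fused)
```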
Related papers
- SMOTExT: SMOTE meets Large Language Models [19.394116388173885]
We propose a novel technique, SMOTExT, that adapts the idea of Synthetic Minority Over-sampling (SMOTE) to textual data. Our method generates new synthetic examples by interpolating between BERT-based embeddings of two existing examples. In early experiments, training models solely on generated data achieved comparable performance to models trained on the original dataset.
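A minimal sketch of the interpolation step described in this entry, assuming mean-pooled BERT sentence embeddings; how the blended vector is turned back into synthetic text is not shown here.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(text: str) -> torch.Tensor:
    # Mean-pool BERT's last hidden states into one sentence embedding.
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state
    return hidden.mean(dim=1).squeeze(0)

def smote_interpolate(text_a: str, text_b: str, lam: float = 0.5) -> torch.Tensor:
    # SMOTE-style convex combination of two minority-class embeddings;
    # decoding the blended embedding back into text is omitted here.
    return (1 - lam) * embed(text_a) + lam * embed(text_b)
```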
arXiv Detail & Related papers (2025-05-19T17:57:36Z) - TD3: Tucker Decomposition Based Dataset Distillation Method for Sequential Recommendation [50.23504065567638]
This paper introduces TD3, a novel Dataset Distillation method within a meta-learning framework. TD3 distills a fully expressive synthetic sequence summary from the original data. An augmentation technique allows the learner to closely fit the synthetic summary, ensuring an accurate update of it in the outer loop.
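The compression primitive named here can be illustrated with a generic Tucker decomposition (via tensorly); the tensor shape and ranks below are assumptions for illustration, not the paper's configuration, and the surrounding meta-learning loop is omitted.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

# Hypothetical synthetic sequence summary: (num_sequences, seq_len, embed_dim).
summary = tl.tensor(np.random.rand(64, 20, 32))

# Tucker decomposition compresses the summary into a small core tensor plus
# one factor matrix per mode; the compact factors are what would be learned.
core, factors = tucker(summary, rank=[8, 5, 8])
reconstructed = tl.tucker_to_tensor((core, factors))
```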
arXiv Detail & Related papers (2025-02-05T03:13:25Z) - READ: Reinforcement-based Adversarial Learning for Text Classification with Limited Labeled Data [7.152603583363887]
Pre-trained transformer models such as BERT have shown massive gains across many text classification tasks. This paper proposes a method that encapsulates reinforcement learning-based text generation and semi-supervised adversarial learning approaches. Our method, READ (Reinforcement-based Adversarial learning), utilizes an unlabeled dataset to generate diverse synthetic text through reinforcement learning.
arXiv Detail & Related papers (2025-01-14T11:39:55Z) - A Systematic Examination of Preference Learning through the Lens of Instruction-Following [83.71180850955679]
We use a novel synthetic data generation pipeline to generate 48,000 unique instruction-following prompts. With our synthetic prompts, we use two preference dataset curation methods: rejection sampling (RS) and Monte Carlo Tree Search (MCTS). Experiments reveal that shared prefixes in preference pairs, as generated by MCTS, provide marginal but consistent improvements. High-contrast preference pairs generally outperform low-contrast pairs; however, combining both often yields the best performance.
arXiv Detail & Related papers (2024-12-18T15:38:39Z) - Hybrid Data-Free Knowledge Distillation [11.773963069904955]
We propose a data-free knowledge distillation method called Hybrid Data-Free Distillation (HiDFD). Our HiDFD can achieve state-of-the-art performance using 120 times less collected data than existing methods.
arXiv Detail & Related papers (2024-12-18T05:52:16Z) - KL-geodesics flow matching with a novel sampling scheme [4.347494885647007]
Non-autoregressive language models generate all tokens simultaneously, offering potential speed advantages over traditional autoregressive models. We investigate a conditional flow matching approach for text generation.
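A generic conditional flow matching objective with a straight-line path is sketched below; the paper's KL-geodesic construction and its novel sampling scheme are not reproduced, and the `model(xt, t, cond)` signature is an assumption.

```python
import torch
import torch.nn.functional as F

def conditional_flow_matching_loss(model, x1, cond):
    """Generic conditional flow matching with a straight-line probability path:
    sample t, interpolate between noise x0 and data x1 (shape: batch x dim),
    and regress the predicted velocity onto the path's constant velocity."""
    x0 = torch.randn_like(x1)
    t = torch.rand(x1.size(0), 1, device=x1.device)
    xt = (1 - t) * x0 + t * x1          # linear interpolant
    target_velocity = x1 - x0           # its time derivative
    pred_velocity = model(xt, t, cond)  # hypothetical model signature
    return F.mse_loss(pred_velocity, target_velocity)
```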
arXiv Detail & Related papers (2024-11-25T17:15:41Z) - Bridging SFT and DPO for Diffusion Model Alignment with Self-Sampling Preference Optimization [67.8738082040299]
Self-Sampling Preference Optimization (SSPO) is a new alignment method for post-training reinforcement learning. SSPO eliminates the need for paired data and reward models while retaining the training stability of SFT. SSPO surpasses all previous approaches on the text-to-image benchmarks and demonstrates outstanding performance on the text-to-video benchmarks.
arXiv Detail & Related papers (2024-10-07T17:56:53Z) - Curriculum Direct Preference Optimization for Diffusion and Consistency Models [110.08057135882356]
We propose a novel and enhanced version of DPO based on curriculum learning for text-to-image generation. Our approach, Curriculum DPO, is compared against state-of-the-art fine-tuning approaches on nine benchmarks.
arXiv Detail & Related papers (2024-05-22T13:36:48Z) - BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT that overcomes these limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z) - Accelerating Diffusion Sampling with Classifier-based Feature Distillation [20.704675568555082]
Progressive distillation is proposed for fast sampling by progressively aligning the output images of an $N$-step teacher sampler with those of an $N/2$-step student sampler.
We distill teacher's sharpened feature distribution into the student with a dataset-independent classifier to improve performance.
Experiments on CIFAR-10 show the superiority of our method in achieving high quality and fast sampling.
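The progressive alignment described in this entry can be sketched as follows; `sample_step` is a hypothetical one-step sampler interface, and the classifier-based feature distillation term is omitted.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def teacher_two_steps(teacher, x_t, t, t_mid, t_next):
    # Two consecutive teacher sampler steps from t to t_next (hypothetical
    # `sample_step` helper standing in for one DDIM-style update).
    x_mid = teacher.sample_step(x_t, t, t_mid)
    return teacher.sample_step(x_mid, t_mid, t_next)

def progressive_distillation_loss(student, teacher, x_t, t, t_mid, t_next):
    """One student step is trained to match two teacher steps, halving the
    number of sampling steps per distillation round."""
    target = teacher_two_steps(teacher, x_t, t, t_mid, t_next)
    pred = student.sample_step(x_t, t, t_next)
    return F.mse_loss(pred, target)
```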
arXiv Detail & Related papers (2022-11-22T06:21:31Z) - Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z) - Construct Informative Triplet with Two-stage Hard-sample Generation [6.361348748202731]
We propose a two-stage synthesis framework that produces hard samples through effective positive and negative sample generators.
Our method achieves superior performance compared with existing hard-sample generation algorithms.
We also find that combining our proposed hard-sample generation method with existing triplet mining strategies can further boost deep metric learning performance.
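A sketch of the underlying triplet objective and a toy hard-negative synthesis step; the paper's learned two-stage positive/negative generators are not reproduced here.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin: float = 0.2):
    # Standard triplet loss; a triplet is "informative" (hard) when the
    # negative is close enough to the anchor to violate the margin.
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

def synthesize_hard_negative(anchor, negative, alpha: float = 0.5):
    # Illustrative hard-sample synthesis: move the negative embedding toward
    # the anchor so the triplet becomes harder to satisfy.
    return anchor + alpha * (negative - anchor)
```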
arXiv Detail & Related papers (2021-12-04T06:28:25Z) - Deep Semi-supervised Knowledge Distillation for Overlapping Cervical Cell Instance Segmentation [54.49894381464853]
We propose to leverage both labeled and unlabeled data for instance segmentation with improved accuracy by knowledge distillation.
We propose a novel Mask-guided Mean Teacher framework with Perturbation-sensitive Sample Mining.
Experiments show that the proposed method improves the performance significantly compared with the supervised method learned from labeled data only.
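The exponential-moving-average teacher at the core of a Mean Teacher framework can be sketched as below; the mask guidance and perturbation-sensitive sample mining described in this entry are omitted.

```python
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, decay: float = 0.99):
    """Mean Teacher update: the teacher's weights track an exponential moving
    average of the student's, and the teacher's predictions on unlabeled
    images then serve as distillation targets for the student."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(decay).add_(s_param, alpha=1 - decay)
```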
arXiv Detail & Related papers (2020-07-21T13:27:09Z)