EchoDistill: Bidirectional Concept Distillation for One-Step Diffusion Personalization
- URL: http://arxiv.org/abs/2510.20512v1
- Date: Thu, 23 Oct 2025 12:56:33 GMT
- Title: EchoDistill: Bidirectional Concept Distillation for One-Step Diffusion Personalization
- Authors: Yixiong Yang, Tao Wu, Senmao Li, Shiqi Yang, Yaxing Wang, Joost van de Weijer, Kai Wang,
- Abstract summary: We propose a bidirectional concept distillation framework, EchoDistill, to enable one-step diffusion personalization. Our approach involves an end-to-end training process where a multi-step diffusion model (teacher) and a one-step diffusion model (student) are trained simultaneously. Our experiments demonstrate that this collaborative framework significantly outperforms existing personalization methods over the 1-SDP setup.
- Score: 30.814807961528572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in accelerating text-to-image (T2I) diffusion models have enabled the synthesis of high-fidelity images even in a single step. However, personalizing these models to incorporate novel concepts remains a challenge due to the limited capacity of one-step models to capture new concept distributions effectively. We propose a bidirectional concept distillation framework, EchoDistill, to enable one-step diffusion personalization (1-SDP). Our approach involves an end-to-end training process where a multi-step diffusion model (teacher) and a one-step diffusion model (student) are trained simultaneously. The concept is first distilled from the teacher model to the student, and then echoed back from the student to the teacher. During EchoDistill training, we share the text encoder between the two models to ensure consistent semantic understanding. The student model is then optimized with adversarial losses to align with the real image distribution and with alignment losses to maintain consistency with the teacher's output. Furthermore, we introduce a bidirectional echoing refinement strategy, wherein the student model leverages its faster generation capability to provide feedback to the teacher model. This bidirectional concept distillation mechanism not only enhances the student's ability to personalize novel concepts but also improves the generative quality of the teacher model. Our experiments demonstrate that this collaborative framework significantly outperforms existing personalization methods over the 1-SDP setup, establishing a novel paradigm for rapid and effective personalization in T2I diffusion models.
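The abstract outlines a training loop with three parts: a teacher-to-student distillation step with a shared text encoder, adversarial and alignment losses on the one-step student, and an "echo" update in which the student's fast samples feed back to the teacher. Below is a minimal PyTorch sketch of that loop under stated assumptions; the module definitions (TinyUNet, the small CNN discriminator), loss weights, learning rates, and random placeholder data are all illustrative and are not the authors' implementation.

```python
# Hedged sketch of a bidirectional (teacher <-> student) distillation loop.
# All names and hyperparameters here are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyUNet(nn.Module):
    """Stand-in denoiser; a real setup would use a pretrained T2I U-Net."""

    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, dim, 3, padding=1), nn.SiLU(),
            nn.Conv2d(dim, 3, 3, padding=1),
        )

    def forward(self, x, text_emb):
        # Fold text conditioning in additively (per channel) for brevity.
        return self.net(x + text_emb.view(-1, 3, 1, 1))


# Shared text encoder: both teacher and student consume the same embeddings.
text_encoder = nn.Embedding(1000, 3)
teacher, student = TinyUNet(), TinyUNet()
discriminator = nn.Sequential(
    nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.SiLU(),
    nn.Flatten(), nn.Linear(16 * 16 * 16, 1),
)

opt_student = torch.optim.AdamW(
    list(student.parameters()) + list(text_encoder.parameters()), lr=1e-5)
opt_teacher = torch.optim.AdamW(teacher.parameters(), lr=1e-6)
opt_disc = torch.optim.AdamW(discriminator.parameters(), lr=1e-5)

for step in range(10):
    real = torch.randn(4, 3, 32, 32)        # placeholder concept images
    tokens = torch.randint(0, 1000, (4,))   # placeholder prompt tokens
    emb = text_encoder(tokens)
    noise = torch.randn_like(real)

    # Teacher -> student: distill the concept into the one-step generator.
    with torch.no_grad():
        teacher_out = teacher(noise, emb)   # stands in for a multi-step sample
    student_out = student(noise, emb)
    align_loss = F.mse_loss(student_out, teacher_out)          # alignment loss
    adv_loss = F.softplus(-discriminator(student_out)).mean()  # adversarial loss
    opt_student.zero_grad()
    (align_loss + 0.1 * adv_loss).backward()   # 0.1 weight is an assumption
    opt_student.step()

    # Discriminator update: real concept images vs. one-step fakes.
    d_loss = (F.softplus(discriminator(student_out.detach())).mean()
              + F.softplus(-discriminator(real)).mean())
    opt_disc.zero_grad()
    d_loss.backward()
    opt_disc.step()

    # Student -> teacher "echo": fast one-step samples feed back to the teacher.
    with torch.no_grad():
        echo = student(torch.randn_like(real), emb)
    echo_loss = F.mse_loss(teacher(torch.randn_like(real), emb.detach()), echo)
    opt_teacher.zero_grad()
    echo_loss.backward()
    opt_teacher.step()
```

The echo step mirrors the paper's stated idea of using the student's cheap one-step samples as a feedback signal for the teacher; the specific choice of an MSE echo loss here is a placeholder, not the method's actual objective.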
Related papers
- Towards One-step Causal Video Generation via Adversarial Self-Distillation [71.30373662465648]
Recent hybrid video generation models combine autoregressive temporal dynamics with diffusion-based spatial denoising. Our framework produces a single distilled model that flexibly supports multiple inference-step settings.
arXiv Detail & Related papers (2025-11-03T10:12:47Z) - DDAE++: Enhancing Diffusion Models Towards Unified Generative and Discriminative Learning [53.27049077100897]
Generative pre-training has been shown to yield discriminative representations, paving the way towards unified visual generation and understanding. This work introduces self-conditioning, a mechanism that internally leverages the rich semantics inherent in the denoising network to guide its own decoding layers. The results are compelling: our method boosts both generation FID and recognition accuracy with 1% computational overhead and generalizes across diverse diffusion architectures.
arXiv Detail & Related papers (2025-05-16T08:47:16Z) - SYNTHIA: Novel Concept Design with Affordance Composition [114.19366716161655]
We introduce SYNTHIA, a framework for generating novel, functionally coherent designs based on desired affordances. We develop a curriculum learning scheme based on our ontology that contrastively fine-tunes T2I models to progressively learn affordance composition. Experimental results show that SYNTHIA outperforms state-of-the-art T2I models.
arXiv Detail & Related papers (2025-02-25T02:54:11Z) - Towards Training One-Step Diffusion Models Without Distillation [72.80423908458772]
We introduce a family of new training methods that entirely forgo teacher score supervision. We find that initializing the student model with the teacher's weights remains critical.
arXiv Detail & Related papers (2025-02-11T23:02:14Z) - OFTSR: One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs [24.046764908874703]
OFTSR is a flow-based framework for one-step image super-resolution that can produce outputs with tunable levels of fidelity and realism. We demonstrate that OFTSR achieves state-of-the-art performance for one-step image super-resolution, while having the ability to flexibly tune the fidelity-realism trade-off.
arXiv Detail & Related papers (2024-12-12T17:14:58Z) - AttenCraft: Attention-guided Disentanglement of Multiple Concepts for Text-to-Image Customization [3.5066393042242123]
We propose AttenCraft, an attention-based method for multiple-concept disentanglement. We introduce an adaptive algorithm based on attention scores to estimate sampling ratios for different concepts. Our model effectively mitigates two issues, achieving state-of-the-art image fidelity and comparable prompt fidelity to baseline models.
arXiv Detail & Related papers (2024-05-28T08:50:14Z) - SFDDM: Single-fold Distillation for Diffusion models [4.688721356965585]
We propose a single-fold distillation algorithm, SFDDM, which can flexibly compress the teacher diffusion model into a student model with any desired number of steps.
Experiments on four datasets demonstrate that SFDDM is able to sample high-quality data with steps reduced to as little as approximately 1%.
arXiv Detail & Related papers (2024-05-23T18:11:14Z) - Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation on NYU depth V2 and KITTI, and in semantic segmentation on CityScapes.
arXiv Detail & Related papers (2023-12-22T14:40:55Z) - Delta Distillation for Efficient Video Processing [68.81730245303591]
We propose a novel knowledge distillation schema coined as Delta Distillation.
We demonstrate that these temporal variations can be effectively distilled due to the temporal redundancies within video frames.
As a by-product, delta distillation improves the temporal consistency of the teacher model.
arXiv Detail & Related papers (2022-03-17T20:13:30Z)