Replay-Based Continual Learning with Dual-Layered Distillation and a Streamlined U-Net for Efficient Text-to-Image Generation
- URL: http://arxiv.org/abs/2505.06995v1
- Date: Sun, 11 May 2025 14:40:51 GMT
- Title: Replay-Based Continual Learning with Dual-Layered Distillation and a Streamlined U-Net for Efficient Text-to-Image Generation
- Authors: Md. Naimur Asif Borno, Md Sakib Hossain Shovon, Asmaa Soliman Al-Moisheer, Mohammad Ali Moni
- Abstract summary: KDC-Diff is a novel stable diffusion framework that enhances efficiency while maintaining image quality. It achieves state-of-the-art performance on the Oxford Flowers and Butterflies & Moths 100 Species datasets.
- Score: 2.4280350854512673
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in text-to-image diffusion models are hindered by high computational demands, limiting accessibility and scalability. This paper introduces KDC-Diff, a novel stable diffusion framework that enhances efficiency while maintaining image quality. KDC-Diff features a streamlined U-Net architecture with nearly half the parameters of the original U-Net (482M), significantly reducing model complexity. We propose a dual-layered distillation strategy to ensure high-fidelity generation, transferring semantic and structural insights from a teacher to a compact student model while minimizing quality degradation. Additionally, replay-based continual learning is integrated to mitigate catastrophic forgetting, allowing the model to retain prior knowledge while adapting to new data. Despite operating under extremely low computational resources, KDC-Diff achieves state-of-the-art performance on the Oxford Flowers and Butterflies & Moths 100 Species datasets, with competitive scores on metrics such as FID, CLIP similarity, and LPIPS. Moreover, it significantly reduces inference time compared to existing models. These results establish KDC-Diff as a highly efficient and adaptable solution for text-to-image generation, particularly in computationally constrained environments.
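The abstract does not spell out the distillation objective, but the "dual-layered" idea (an output-level semantic term plus a feature-level structural term) can be sketched in a few lines of numpy. The function name, the choice of MSE for both terms, and the `alpha` weighting are illustrative assumptions, not KDC-Diff's actual loss:

```python
import numpy as np

def dual_layer_distillation_loss(student_out, teacher_out,
                                 student_feat, teacher_feat,
                                 alpha=0.5):
    """Combine an output-level (semantic) term with a feature-level
    (structural) term, weighted by alpha. Both terms are plain MSE here."""
    semantic = np.mean((student_out - teacher_out) ** 2)
    structural = np.mean((student_feat - teacher_feat) ** 2)
    return alpha * semantic + (1.0 - alpha) * structural

rng = np.random.default_rng(0)
t_out = rng.normal(size=(4, 8))    # teacher outputs (toy shapes)
t_feat = rng.normal(size=(4, 16))  # teacher intermediate features
# A student that matches the teacher exactly incurs zero loss;
# any deviation in either layer makes the loss positive.
loss_zero = dual_layer_distillation_loss(t_out, t_out, t_feat, t_feat)
loss_pos = dual_layer_distillation_loss(t_out + 0.1, t_out, t_feat, t_feat)
```

In practice such a loss would be added to the student's task loss during training, with `alpha` trading off semantic against structural fidelity.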
Related papers
- Federated Learning of Low-Rank One-Shot Image Detection Models in Edge Devices with Scalable Accuracy and Compute Complexity [5.820612543019548]
LoRa-FL is designed for training low-rank one-shot image detection models deployed on edge devices. By incorporating low-rank adaptation techniques into one-shot detection architectures, our method significantly reduces both computational and communication overhead.
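The low-rank adaptation technique this summary refers to can be illustrated generically: a frozen weight matrix is augmented by a trainable rank-r update, so far fewer parameters need training or transmission. This is a standard LoRA-style sketch in numpy; the names and shapes are illustrative and not taken from LoRa-FL itself:

```python
import numpy as np

def lora_forward(x, W, A, B, scale=1.0):
    """Low-rank adapted linear layer: the frozen weight W is augmented
    by the rank-r update B @ A, so only A and B are trained."""
    return x @ (W + scale * (B @ A)).T

d_out, d_in, r = 6, 8, 2              # rank r << min(d_out, d_in)
rng = np.random.default_rng(1)
W = rng.normal(size=(d_out, d_in))    # frozen pretrained weight
A = rng.normal(size=(r, d_in))        # trainable down-projection
B = np.zeros((d_out, r))              # trainable up-projection, zero-init
x = rng.normal(size=(3, d_in))
# With B initialized to zero, the adapted layer reproduces the base layer,
# and only r * (d_in + d_out) parameters are trainable instead of d_in * d_out.
y = lora_forward(x, W, A, B)
```

The communication saving in a federated setting comes from exchanging only `A` and `B` (28 values here) rather than the full weight (48 values), a gap that grows rapidly at realistic layer sizes.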
arXiv Detail & Related papers (2025-04-23T08:40:44Z)
- Ultra-Resolution Adaptation with Ease [62.56434979517156]
We propose a set of key guidelines for ultra-resolution adaptation, termed URAE. We show that tuning minor components of the weight matrices outperforms widely-used low-rank adapters when synthetic data are unavailable. Experiments validate that URAE achieves comparable 2K-generation performance to state-of-the-art closed-source models like FLUX1.1 [Pro] Ultra with only 3K samples and 2K iterations.
arXiv Detail & Related papers (2025-03-20T16:44:43Z)
- LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization [37.236005953016175]
LightGen is an efficient training paradigm for image generation models. It distills knowledge from state-of-the-art (SOTA) text-to-image models into a compact Masked Autoregressive architecture. Experiments confirm that LightGen achieves image generation quality comparable to SOTA models.
arXiv Detail & Related papers (2025-03-11T16:58:02Z)
- One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Our method achieves strong performance on both full and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z)
- Improved Training Technique for Latent Consistency Models [18.617862678160243]
Consistency models are capable of producing high-quality samples in either a single step or multiple steps. We analyze the statistical differences between pixel and latent spaces, discovering that latent data often contains highly impulsive outliers. We introduce a diffusion loss at early timesteps and employ optimal transport (OT) coupling to further enhance performance.
arXiv Detail & Related papers (2025-02-03T15:25:58Z)
- Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models [33.519892081718716]
We propose aligning the latent space with pre-trained vision foundation models when training the visual tokenizers. Our proposed VA-VAE significantly expands the reconstruction-generation frontier of latent diffusion models. We build an enhanced DiT baseline with improved training strategies and architecture designs, termed LightningDiT.
arXiv Detail & Related papers (2025-01-02T18:59:40Z)
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Efficient Degradation-aware Any Image Restoration [83.92870105933679]
We propose DaAIR, an efficient All-in-One image restorer employing a Degradation-aware Learner (DaLe) in the low-rank regime.
By dynamically allocating model capacity to input degradations, we realize an efficient restorer integrating holistic and specific learning.
arXiv Detail & Related papers (2024-05-24T11:53:27Z)
- Fixed Point Diffusion Models [13.035518953879539]
Fixed Point Diffusion Model (FPDM) is a novel approach to image generation that integrates the concept of fixed point solving into the framework of diffusion-based generative modeling.
Our approach embeds an implicit fixed point solving layer into the denoising network of a diffusion model, transforming the diffusion process into a sequence of closely-related fixed point problems.
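The fixed-point solving at the heart of this approach can be sketched as a simple iteration: apply a map repeatedly until the output stops changing. This toy numpy version (the solver and the contractive map `f` are illustrative stand-ins, not FPDM's actual implicit layer) shows the mechanism:

```python
import numpy as np

def fixed_point_solve(f, x0, tol=1e-8, max_iter=500):
    """Iterate x <- f(x) until successive updates fall below tol."""
    x = x0
    for _ in range(max_iter):
        x_new = f(x)
        if np.max(np.abs(x_new - x)) < tol:
            return x_new
        x = x_new
    return x

# A contractive toy "layer": f(x) = 0.5 * x + c has the unique
# fixed point x* = 2c, since x* = 0.5 * x* + c implies 0.5 * x* = c.
c = np.array([1.0, -2.0])
f = lambda x: 0.5 * x + c
x_star = fixed_point_solve(f, np.zeros_like(c))
```

An implicit layer of this kind converges whenever the map is a contraction; in a real denoising network the iteration count becomes a runtime knob trading compute for accuracy.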
We conduct experiments with state-of-the-art models on ImageNet, FFHQ, CelebA-HQ, and LSUN-Church, demonstrating substantial improvements in performance and efficiency.
arXiv Detail & Related papers (2024-01-16T18:55:54Z) - Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z) - Learning Deformable Image Registration from Optimization: Perspective,
Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.