T-LoRA: Single Image Diffusion Model Customization Without Overfitting
- URL: http://arxiv.org/abs/2507.05964v1
- Date: Tue, 08 Jul 2025 13:14:10 GMT
- Title: T-LoRA: Single Image Diffusion Model Customization Without Overfitting
- Authors: Vera Soboleva, Aibek Alanov, Andrey Kuznetsov, Konstantin Sobolev
- Abstract summary: This paper tackles the challenging yet most impactful task of adapting a diffusion model using just a single concept image. We introduce T-LoRA, a Timestep-Dependent Low-Rank Adaptation framework specifically designed for diffusion model personalization. We show that higher diffusion timesteps are more prone to overfitting than lower ones, necessitating a timestep-sensitive fine-tuning strategy.
- Score: 2.424910201171407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While diffusion model fine-tuning offers a powerful approach for customizing pre-trained models to generate specific objects, it frequently suffers from overfitting when training samples are limited, compromising both generalization capability and output diversity. This paper tackles the challenging yet most impactful task of adapting a diffusion model using just a single concept image, as single-image customization holds the greatest practical potential. We introduce T-LoRA, a Timestep-Dependent Low-Rank Adaptation framework specifically designed for diffusion model personalization. In our work we show that higher diffusion timesteps are more prone to overfitting than lower ones, necessitating a timestep-sensitive fine-tuning strategy. T-LoRA incorporates two key innovations: (1) a dynamic fine-tuning strategy that adjusts rank-constrained updates based on diffusion timesteps, and (2) a weight parametrization technique that ensures independence between adapter components through orthogonal initialization. Extensive experiments show that T-LoRA and its individual components outperform standard LoRA and other diffusion model personalization techniques. They achieve a superior balance between concept fidelity and text alignment, highlighting the potential of T-LoRA in data-limited and resource-constrained scenarios. Code is available at https://github.com/ControlGenAI/T-LoRA.
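The two ingredients named in the abstract, timestep-dependent rank constraints and orthogonally initialized adapter factors, can be pictured with a short PyTorch sketch. This is a minimal illustration under assumptions: the class name, the linear rank schedule, and the masking details are ours, not the authors' implementation (the released code at the repository above is authoritative).

```python
import torch
import torch.nn as nn

class TLoRALinear(nn.Module):
    """Illustrative timestep-dependent LoRA layer. The linear rank schedule
    and masking below are assumptions, not the authors' implementation."""

    def __init__(self, base: nn.Linear, rank: int = 16, max_t: int = 1000):
        super().__init__()
        self.base, self.rank, self.max_t = base, rank, max_t
        for p in base.parameters():          # keep the pre-trained weights frozen
            p.requires_grad_(False)
        A = torch.empty(rank, base.in_features)
        nn.init.orthogonal_(A)               # orthogonal rows: independent adapter components
        self.A = nn.Parameter(A)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero: no-op at start

    def forward(self, x: torch.Tensor, t: int) -> torch.Tensor:
        # Higher (noisier) timesteps are more prone to overfitting, so fewer
        # rank components stay active there; low timesteps use the full rank.
        active = max(1, round(self.rank * (1.0 - t / self.max_t)))
        mask = torch.zeros(self.rank, device=x.device)
        mask[:active] = 1.0
        delta = ((x @ self.A.t()) * mask) @ self.B.t()
        return self.base(x) + delta
```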
Related papers
- Zero-Shot Adaptation of Parameter-Efficient Fine-Tuning in Diffusion Models [48.22550575107633]
We introduce ProLoRA, enabling zero-shot adaptation of parameter-efficient fine-tuning in text-to-image diffusion models.
ProLoRA transfers pre-trained low-rank adjustments from a source to a target model without additional training data.
arXiv Detail & Related papers (2025-05-29T20:37:04Z)
- LoRACLR: Contrastive Adaptation for Customization of Diffusion Models [62.70911549650579]
LoRACLR is a novel approach for multi-concept image generation that merges multiple LoRA models, each fine-tuned for a distinct concept, into a single, unified model.
LoRACLR uses a contrastive objective to align and merge the weight spaces of these models, ensuring compatibility while minimizing interference.
Our results highlight the effectiveness of LoRACLR in accurately merging multiple concepts, advancing the capabilities of personalized image generation.
arXiv Detail & Related papers (2024-12-12T18:59:55Z)
- LoRA Diffusion: Zero-Shot LoRA Synthesis for Diffusion Model Personalization [0.0]
Low-Rank Adaptation (LoRA) and other parameter-efficient fine-tuning (PEFT) methods provide low-memory, storage-efficient solutions for personalizing text-to-image models.
We show that training a hypernetwork model to generate LoRA weights can achieve competitive quality for specific domains.
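As a rough picture of the hypernetwork idea, the sketch below maps a concept embedding to flattened LoRA factors for one target layer; the embedding size, MLP width, and names are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class LoRAHyperNet(nn.Module):
    """Hypothetical hypernetwork: concept embedding -> LoRA factors (A, B)
    for one target layer. Architecture and shapes are illustrative."""

    def __init__(self, emb_dim: int, in_f: int, out_f: int, rank: int = 4):
        super().__init__()
        self.in_f, self.out_f, self.rank = in_f, out_f, rank
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim, 512), nn.SiLU(),
            nn.Linear(512, rank * (in_f + out_f)),
        )

    def forward(self, concept_emb: torch.Tensor):
        flat = self.mlp(concept_emb)
        split = self.rank * self.in_f
        A = flat[..., :split].reshape(-1, self.rank, self.in_f)
        B = flat[..., split:].reshape(-1, self.out_f, self.rank)
        return A, B  # apply as W0 + B @ A at the target layer
```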
arXiv Detail & Related papers (2024-12-03T10:17:15Z)
- Unlocking Tuning-Free Few-Shot Adaptability in Visual Foundation Models by Recycling Pre-Tuned LoRAs [76.40876036912537]
Large Language Models (LLMs) demonstrate strong few-shot adaptability without requiring fine-tuning.
In contrast, current Visual Foundation Models (VFMs) require explicit fine-tuning with sufficient tuning data.
We propose a framework, LoRA Recycle, that distills a meta-LoRA from diverse pre-tuned LoRAs with a meta-learning objective.
arXiv Detail & Related papers (2024-12-03T07:25:30Z)
- LoRA vs Full Fine-tuning: An Illusion of Equivalence [76.11938177294178]
We study how Low-Rank Adaptation (LoRA) and full fine-tuning change pre-trained models.
We find that LoRA and full fine-tuning yield weight matrices whose singular value decompositions exhibit very different structure.
We extend the finding that LoRA forgets less than full fine-tuning and find that its forgetting is largely localized to the intruder dimensions.
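One way to see the structural difference described here is to compare the singular vectors of the fine-tuned weights against the pre-trained ones; directions with little overlap are the "intruder dimensions." A hedged sketch, where the top-k choice and threshold are illustrative rather than the paper's exact protocol:

```python
import torch

def intruder_dimensions(W_pre: torch.Tensor, W_ft: torch.Tensor,
                        k: int = 10, threshold: float = 0.5) -> torch.Tensor:
    """Flag top-k left singular vectors of the fine-tuned matrix whose best
    cosine similarity to any pre-trained singular vector falls below threshold."""
    U_pre, _, _ = torch.linalg.svd(W_pre, full_matrices=False)
    U_ft, _, _ = torch.linalg.svd(W_ft, full_matrices=False)
    # Columns are unit-norm, so dot products are cosine similarities.
    sims = (U_ft[:, :k].T @ U_pre).abs().max(dim=1).values
    return torch.nonzero(sims < threshold).flatten()
```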
arXiv Detail & Related papers (2024-10-28T17:14:01Z)
- AutoLoRA: AutoGuidance Meets Low-Rank Adaptation for Diffusion Models [0.9514837871243403]
Low-rank adaptation (LoRA) is a fine-tuning technique that can be applied to conditional generative diffusion models.
We introduce AutoLoRA, a novel guidance technique for diffusion models fine-tuned with the LoRA approach.
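Reading the title ("AutoGuidance Meets Low-Rank Adaptation") together with this summary, the guidance plausibly contrasts the LoRA-fine-tuned model's noise prediction with the base model's. The formula below is our assumption of that general autoguidance pattern, not the paper's exact rule:

```python
import torch

def autolora_style_guidance(eps_lora: torch.Tensor, eps_base: torch.Tensor,
                            w: float = 2.0) -> torch.Tensor:
    """Assumed autoguidance-style combination: push the denoising prediction
    away from the base model and toward the LoRA-adapted one."""
    return eps_base + w * (eps_lora - eps_base)
```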
arXiv Detail & Related papers (2024-10-04T21:57:11Z)
- DiffLoRA: Generating Personalized Low-Rank Adaptation Weights with Diffusion [43.55179971287028]
We propose DiffLoRA, an efficient method that leverages the diffusion model as a hypernetwork to predict personalized Low-Rank Adaptation weights.
By incorporating these LoRA weights into the off-the-shelf text-to-image model, DiffLoRA enables zero-shot personalization during inference.
We introduce a novel identity-oriented LoRA weights construction pipeline to facilitate the training process of DiffLoRA.
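"Incorporating these LoRA weights into the off-the-shelf model" corresponds to the standard LoRA merge, which adds no inference-time modules; a minimal sketch:

```python
import torch

@torch.no_grad()
def merge_lora(W0: torch.Tensor, A: torch.Tensor, B: torch.Tensor,
               alpha: float = 1.0) -> torch.Tensor:
    """Fold a predicted low-rank update into the frozen base weight:
    W = W0 + alpha * (B @ A), the usual LoRA merge with scale alpha."""
    return W0 + alpha * (B @ A)
```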
arXiv Detail & Related papers (2024-08-13T09:00:35Z)
- PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction [38.424899483761656]
PaRa is an effective and efficient Rank Reduction approach for T2I model personalization.
Our design is motivated by the fact that taming a T2I model toward a novel concept implies a small generation space.
We show that PaRa achieves great advantages over existing finetuning approaches on single/multi-subject generation as well as single-image editing.
arXiv Detail & Related papers (2024-06-09T04:51:51Z)
- Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning diffusion models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z)
- E$^{2}$GAN: Efficient Training of Efficient GANs for Image-to-Image Translation [69.72194342962615]
We introduce and address a novel research direction: can the process of distilling GANs from diffusion models be made significantly more efficient?
First, we construct a base GAN model with generalized features, adaptable to different concepts through fine-tuning, eliminating the need for training from scratch.
Second, we identify crucial layers within the base GAN model and employ Low-Rank Adaptation (LoRA) with a simple yet effective rank search process, rather than fine-tuning the entire base model.
Third, we investigate the minimal amount of data necessary for fine-tuning, further reducing the overall training time.
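The "simple yet effective rank search" can be imagined as a small sweep: attach LoRA at several candidate ranks, briefly fine-tune, and keep the best. The helpers below are hypothetical placeholders, not the paper's API:

```python
def search_lora_rank(base_gan, candidates=(2, 4, 8, 16, 32)) -> int:
    """Hypothetical rank sweep; attach_lora and quick_finetune_and_eval
    are placeholder helpers, not functions from the paper."""
    best_rank, best_score = None, float("-inf")
    for r in candidates:
        model = attach_lora(base_gan, rank=r)       # hypothetical helper
        score = quick_finetune_and_eval(model)      # hypothetical helper
        if score > best_score:
            best_rank, best_score = r, score
    return best_rank
```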
arXiv Detail & Related papers (2024-01-11T18:59:14Z)
- FullLoRA: Efficiently Boosting the Robustness of Pretrained Vision Transformers [72.83770102062141]
The Vision Transformer (ViT) model has gradually become mainstream in various computer vision tasks.
Existing large models tend to prioritize performance during training, potentially neglecting robustness.
We develop a novel LNLoRA module, incorporating a learnable layer normalization before the conventional LoRA module.
We propose the FullLoRA framework by integrating the learnable LNLoRA modules into all key components of ViT-based models.
arXiv Detail & Related papers (2024-01-03T14:08:39Z)
- Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA [64.10981296843609]
We show that recent state-of-the-art customization methods for text-to-image models suffer from catastrophic forgetting when new concepts arrive sequentially.
We propose a new method, C-LoRA, composed of a continually self-regularized low-rank adaptation in the cross-attention layers of the popular Stable Diffusion model.
We show that C-LoRA not only outperforms several baselines for our proposed setting of text-to-image continual customization, but also achieves a new state of the art in the well-established rehearsal-free continual learning setting for image classification.
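The "continually self-regularized" adaptation can be sketched as a penalty that discourages the new concept's low-rank update from writing where earlier concepts already wrote; the exact C-LoRA formulation may differ, so treat this as an assumption-laden illustration:

```python
import torch

def self_regularization(past_deltas, A_new: torch.Tensor, B_new: torch.Tensor,
                        lam: float = 1.0) -> torch.Tensor:
    """Assumed form of the continual self-regularization term: penalize overlap
    between the new LoRA update and weight regions changed by past concepts."""
    delta_new = B_new @ A_new
    occupied = sum(d.abs() for d in past_deltas)  # accumulated past edits
    return lam * (occupied * delta_new).pow(2).sum()
```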
arXiv Detail & Related papers (2023-04-12T17:59:41Z)