Finetuning-Free Personalization of Text to Image Generation via Hypernetworks
- URL: http://arxiv.org/abs/2511.03156v1
- Date: Wed, 05 Nov 2025 03:31:33 GMT
- Title: Finetuning-Free Personalization of Text to Image Generation via Hypernetworks
- Authors: Sagar Shrestha, Gopal Sharma, Luowei Zhou, Suren Kumar,
- Abstract summary: We introduce fine-tuning-free personalization via Hypernetworks that predict LoRA-adapted weights directly from subject images.<n>Our approach achieves strong personalization performance and highlights the promise of hypernetworks as a scalable and effective direction for open-category personalization.
- Score: 15.129799519953139
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalizing text-to-image diffusion models has traditionally relied on subject-specific fine-tuning approaches such as DreamBooth~\cite{ruiz2023dreambooth}, which are computationally expensive and slow at inference. Recent adapter- and encoder-based methods attempt to reduce this overhead but still depend on additional fine-tuning or large backbone models for satisfactory results. In this work, we revisit an orthogonal direction: fine-tuning-free personalization via Hypernetworks that predict LoRA-adapted weights directly from subject images. Prior hypernetwork-based approaches, however, suffer from costly data generation or unstable attempts to mimic base model optimization trajectories. We address these limitations with an end-to-end training objective, stabilized by a simple output regularization, yielding reliable and effective hypernetworks. Our method removes the need for per-subject optimization at test time while preserving both subject fidelity and prompt alignment. To further enhance compositional generalization at inference time, we introduce Hybrid-Model Classifier-Free Guidance (HM-CFG), which combines the compositional strengths of the base diffusion model with the subject fidelity of personalized models during sampling. Extensive experiments on CelebA-HQ, AFHQ-v2, and DreamBench demonstrate that our approach achieves strong personalization performance and highlights the promise of hypernetworks as a scalable and effective direction for open-category personalization.
Related papers
- HyperAlign: Hypernetwork for Efficient Test-Time Alignment of Diffusion Models [23.070399327132737]
We propose a novel framework that trains a hypernetwork for efficient and effective test-time alignment.<n>Instead of modifying latent states, HyperAlign dynamically generates low-rank adaptation weights to modulate the diffusion model's generation operators.<n>It significantly outperforms existing fine-tuning and test-time scaling baselines in enhancing semantic consistency and visual appeal.
arXiv Detail & Related papers (2026-01-22T13:49:47Z) - Fine-Tuning Next-Scale Visual Autoregressive Models with Group Relative Policy Optimization [1.1510009152620668]
Fine-tuning pre-trained generative models with Reinforcement Learning (RL) has emerged as an effective approach for aligning outputs with human preferences.<n>We show that RL-based fine-tuning is both efficient and effective for VAR models, benefiting particularly from their fast inference speeds.
arXiv Detail & Related papers (2025-05-29T10:45:38Z) - Self Distillation via Iterative Constructive Perturbations [0.2748831616311481]
We propose a novel framework that uses a cyclic optimization strategy to concurrently optimize the model and its input data for better training.<n>By alternately altering the model's parameters to the data and the data to the model, our method effectively addresses the gap between fitting and generalization.
arXiv Detail & Related papers (2025-05-20T13:15:27Z) - SUDO: Enhancing Text-to-Image Diffusion Models with Self-Supervised Direct Preference Optimization [19.087540230261684]
Previous text-to-image diffusion models typically employ supervised fine-tuning to enhance pre-trained base models.<n>We introduce Self-sUpervised Direct preference Optimization (SUDO), a novel paradigm that optimize both fine-grained details at the pixel level and global image quality.<n>As an effective alternative to supervised fine-tuning, SUDO can be seamlessly applied to any text-to-image diffusion model.
arXiv Detail & Related papers (2025-04-20T08:18:27Z) - Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models [54.132297393662654]
We introduce a hybrid method that fine-tunes cutting-edge diffusion models by optimizing reward models through RL.
We demonstrate the capability of our approach to outperform the best designs in offline data, leveraging the extrapolation capabilities of reward models.
arXiv Detail & Related papers (2024-05-30T03:57:29Z) - Direct Consistency Optimization for Robust Customization of Text-to-Image Diffusion Models [67.68871360210208]
Text-to-image (T2I) diffusion models, when fine-tuned on a few personal images, can generate visuals with a high degree of consistency.<n>We propose a novel fine-tuning objective, dubbed Direct Consistency Optimization, which controls the deviation between fine-tuning and pretrained models.<n>We show that our approach achieves better prompt fidelity and subject fidelity than those post-optimized for merging regular fine-tuned models.
arXiv Detail & Related papers (2024-02-19T09:52:41Z) - Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI)
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion)
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z) - JoReS-Diff: Joint Retinex and Semantic Priors in Diffusion Model for Low-light Image Enhancement [69.6035373784027]
Low-light image enhancement (LLIE) has achieved promising performance by employing conditional diffusion models.
Previous methods may neglect the importance of a sufficient formulation of task-specific condition strategy.
We propose JoReS-Diff, a novel approach that incorporates Retinex- and semantic-based priors as the additional pre-processing condition.
arXiv Detail & Related papers (2023-12-20T08:05:57Z) - Precision-Recall Divergence Optimization for Generative Modeling with
GANs and Normalizing Flows [54.050498411883495]
We develop a novel training method for generative models, such as Generative Adversarial Networks and Normalizing Flows.
We show that achieving a specified precision-recall trade-off corresponds to minimizing a unique $f$-divergence from a family we call the textitPR-divergences.
Our approach improves the performance of existing state-of-the-art models like BigGAN in terms of either precision or recall when tested on datasets such as ImageNet.
arXiv Detail & Related papers (2023-05-30T10:07:17Z) - A Generic Approach for Enhancing GANs by Regularized Latent Optimization [79.00740660219256]
We introduce a generic framework called em generative-model inference that is capable of enhancing pre-trained GANs effectively and seamlessly.
Our basic idea is to efficiently infer the optimal latent distribution for the given requirements using Wasserstein gradient flow techniques.
arXiv Detail & Related papers (2021-12-07T05:22:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.