Customized Generation Reimagined: Fidelity and Editability Harmonized
- URL: http://arxiv.org/abs/2412.04831v1
- Date: Fri, 06 Dec 2024 07:54:34 GMT
- Title: Customized Generation Reimagined: Fidelity and Editability Harmonized
- Authors: Jian Jin, Yang Shen, Zhenyong Fu, Jian Yang
- Abstract summary: Customized generation aims to incorporate a novel concept into a pre-trained text-to-image model. However, customized generation suffers from an inherent trade-off between concept fidelity and editability.
- Score: 30.92739649737791
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Customized generation aims to incorporate a novel concept into a pre-trained text-to-image model, enabling new generations of the concept in novel contexts guided by textual prompts. However, customized generation suffers from an inherent trade-off between concept fidelity and editability, i.e., between precisely modeling the concept and faithfully adhering to the prompts. Previous methods reluctantly seek a compromise and struggle to achieve both high concept fidelity and ideal prompt alignment simultaneously. In this paper, we propose a Divide, Conquer, then Integrate (DCI) framework, which performs a surgical adjustment in the early stage of denoising to liberate the fine-tuned model from the fidelity-editability trade-off at inference. The two conflicting components in the trade-off are decoupled and individually conquered by two collaborative branches, which are then selectively integrated to preserve high concept fidelity while achieving faithful prompt adherence. To obtain a better fine-tuned model, we introduce an Image-specific Context Optimization (ICO) strategy for model customization. ICO replaces manual prompt templates with learnable image-specific contexts, providing an adaptive and precise fine-tuning direction to promote the overall performance. Extensive experiments demonstrate the effectiveness of our method in reconciling the fidelity-editability trade-off.
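The inference-time mechanics described above lend themselves to a short sketch. Below is a minimal, illustrative PyTorch rendering of the DCI idea: two branches produce noise predictions during the early denoising steps and are selectively integrated, after which sampling continues with the fine-tuned model alone. All names here (`fidelity_branch`, `editability_branch`, `integrate`, the concept mask, and the step split) are assumptions for illustration, not the paper's implementation.

```python
import torch

def fidelity_branch(x_t, t):
    # Stand-in for the fine-tuned model's noise prediction (concept branch).
    return torch.randn_like(x_t)

def editability_branch(x_t, t):
    # Stand-in for the prompt-faithful noise prediction (editability branch).
    return torch.randn_like(x_t)

def integrate(eps_f, eps_e, mask):
    # Selective integration: concept regions follow the fidelity branch,
    # everything else follows the editability branch.
    return mask * eps_f + (1.0 - mask) * eps_e

x_t = torch.randn(1, 4, 64, 64)                    # toy latent
mask = (torch.rand(1, 1, 64, 64) > 0.5).float()    # hypothetical concept mask
num_steps, split_until = 50, 40                    # "early stage" = first 10 steps

for t in reversed(range(num_steps)):
    if t >= split_until:
        # Divide and conquer in the early stage of denoising.
        eps = integrate(fidelity_branch(x_t, t), editability_branch(x_t, t), mask)
    else:
        # Later steps proceed with a single model.
        eps = fidelity_branch(x_t, t)
    x_t = x_t - 0.01 * eps                         # placeholder update, not a real sampler
```

ICO, by contrast, acts at fine-tuning time: in the spirit of prompt learning, it would replace a manual template such as "a photo of a [concept]" with learnable, image-specific context embeddings optimized alongside the concept token.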
Related papers
- Sissi: Zero-shot Style-guided Image Synthesis via Semantic-style Integration [57.02757226679549]
We introduce a training-free framework that reformulates style-guided synthesis as an in-context learning task. We propose a Dynamic Semantic-Style Integration (DSSI) mechanism that reweights attention between semantic and style visual tokens. Experiments show that our approach achieves high-fidelity stylization with superior semantic-style balance and visual quality.
arXiv Detail & Related papers (2026-01-10T16:01:14Z)
- Semantic Anchoring for Robust Personalization in Text-to-Image Diffusion Models [9.94436942959918]
A text-to-image diffusion model learns a new visual concept from a limited number of reference images. We propose semantic anchoring, which guides adaptation by grounding new concepts in their corresponding distributions. This anchoring encourages the model to adapt new concepts in a stable and controlled manner, expanding the pretrained distribution toward personalized regions.
arXiv Detail & Related papers (2025-11-27T09:16:33Z)
- RespoDiff: Dual-Module Bottleneck Transformation for Responsible & Faithful T2I Generation [14.603824133970798]
RespoDiff is a novel framework for responsible text-to-image generation. Our approach improves responsible and semantically coherent generation by 20% across diverse prompts. It integrates seamlessly into large-scale models like SDXL, enhancing fairness and safety.
arXiv Detail & Related papers (2025-09-18T07:48:46Z)
- AlignGen: Boosting Personalized Image Generation with Cross-Modality Prior Alignment [74.47138661595584]
We propose AlignGen, a Cross-Modality Prior Alignment mechanism for personalized image generation. We show that AlignGen outperforms existing zero-shot methods and even surpasses popular test-time optimization approaches.
arXiv Detail & Related papers (2025-05-28T02:57:55Z)
- DreamBoothDPO: Improving Personalized Generation using Direct Preference Optimization [2.5282283486446757]
Balancing concept fidelity with contextual alignment is a challenging open problem. We propose an RL-based approach that leverages the diverse outputs of T2I models to address this issue. Our method eliminates the need for human-annotated scores by generating a synthetic paired dataset for DPO-like training.
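As a concrete reference point, "DPO-like training" over synthetic win/lose pairs suggests the standard direct-preference objective. The sketch below shows the generic DPO loss on paired log-likelihoods; whether DreamBoothDPO uses exactly this form, and the value of `beta`, are assumptions.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """logp_*: model log-likelihoods of the preferred (w) / rejected (l)
    generations; ref_logp_*: the same under a frozen reference model."""
    advantage = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Maximize the probability that the preferred sample wins.
    return -F.logsigmoid(advantage).mean()

# Toy usage with random log-probabilities.
lw, ll = torch.randn(8), torch.randn(8)
rw, rl = torch.randn(8), torch.randn(8)
print(dpo_loss(lw, ll, rw, rl))
```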
arXiv Detail & Related papers (2025-05-27T10:07:50Z)
- Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter [52.08332620725473]
We propose a tuning-free method for multi-concept personalization that can effectively customize both object and abstract concepts without test-time fine-tuning. Our method achieves state-of-the-art performance in multi-concept personalization, supported by quantitative, qualitative, and human evaluations.
arXiv Detail & Related papers (2025-05-24T09:21:32Z)
- FlipConcept: Tuning-Free Multi-Concept Personalization for Text-to-Image Generation [26.585985828583304]
We propose FlipConcept, a novel approach that seamlessly integrates multiple personalized concepts into a single image. We introduce guided appearance attention, mask-guided noise mixing, and background dilution to minimize concept leakage. Despite not requiring tuning, our method outperforms existing models in both single and multiple personalized concept inference.
arXiv Detail & Related papers (2025-02-21T04:37:18Z)
- LoRACLR: Contrastive Adaptation for Customization of Diffusion Models [62.70911549650579]
LoRACLR is a novel approach for multi-concept image generation that merges multiple LoRA models, each fine-tuned for a distinct concept, into a single, unified model.
LoRACLR uses a contrastive objective to align and merge the weight spaces of these models, ensuring compatibility while minimizing interference.
Our results highlight the effectiveness of LoRACLR in accurately merging multiple concepts, advancing the capabilities of personalized image generation.
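A minimal sketch of what a contrastive merging objective of this kind could look like: the merged weight delta is attracted to each concept LoRA's response on that concept's inputs, and responses across concepts are pushed apart. The MSE/margin form and every name here are illustrative assumptions, not LoRACLR's actual loss.

```python
import torch

def contrastive_merge_loss(merged_delta, concept_deltas, concept_inputs, margin=1.0):
    loss = torch.zeros(())
    for i, (dW, x) in enumerate(zip(concept_deltas, concept_inputs)):
        target = x @ dW.T              # concept i's own LoRA response
        pred = x @ merged_delta.T      # merged model's response on the same inputs
        loss = loss + (pred - target).pow(2).mean()       # attraction term
        for j, xj in enumerate(concept_inputs):
            if j != i:
                other = xj @ merged_delta.T
                dist = (pred.mean(0) - other.mean(0)).norm()
                loss = loss + torch.relu(margin - dist)   # repulsion term
    return loss

# Toy usage: three concept LoRA deltas and their probe inputs.
d, k = 16, 4
concept_deltas = [0.01 * torch.randn(d, d) for _ in range(3)]
concept_inputs = [torch.randn(k, d) for _ in range(3)]
merged = torch.randn(d, d, requires_grad=True)
print(contrastive_merge_loss(merged, concept_deltas, concept_inputs))
```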
arXiv Detail & Related papers (2024-12-12T18:59:55Z)
- Uniform Attention Maps: Boosting Image Fidelity in Reconstruction and Editing [66.48853049746123]
We analyze reconstruction from a structural perspective and propose a novel approach that replaces traditional cross-attention with uniform attention maps. Our method effectively minimizes distortions caused by varying text conditions during noise prediction. Experimental results demonstrate that our approach not only excels in achieving high-fidelity image reconstruction but also performs robustly in real image composition and editing scenarios.
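The core substitution is simple enough to sketch: replace the softmax cross-attention map with a uniform distribution over text tokens, so varying text conditions cannot distort the noise prediction. Shapes and names below are illustrative, not the paper's code.

```python
import torch

def uniform_cross_attention(q, k, v):
    # q: (batch, n_img_tokens, d); k, v: (batch, n_txt_tokens, d).
    # Ignore q/k similarity entirely: every image token attends
    # uniformly to all text tokens.
    n_txt = k.shape[1]
    attn = torch.full((q.shape[0], q.shape[1], n_txt), 1.0 / n_txt)
    return attn @ v   # each image token receives the mean of the text values

q = torch.randn(1, 64, 32)
k = torch.randn(1, 8, 32)
v = torch.randn(1, 8, 32)
out = uniform_cross_attention(q, k, v)   # (1, 64, 32)
```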
arXiv Detail & Related papers (2024-11-29T12:11:28Z)
- DreamSteerer: Enhancing Source Image Conditioned Editability using Personalized Diffusion Models [7.418186319496487]
Recent text-to-image personalization methods have shown great promise in teaching a diffusion model user-specified concepts.
A promising extension is personalized editing, namely to edit an image using personalized concepts.
We propose DreamSteerer, a plug-in method for augmenting existing T2I personalization methods.
arXiv Detail & Related papers (2024-10-15T02:50:54Z)
- CODE: Confident Ordinary Differential Editing [62.83365660727034]
Confident Ordinary Differential Editing (CODE) is a novel approach for image synthesis that effectively handles Out-of-Distribution (OoD) guidance images.
CODE enhances images through score-based updates along the probability-flow Ordinary Differential Equation (ODE) trajectory.
Our method operates in a fully blind manner, relying solely on a pre-trained generative model.
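A toy sketch of the kind of score-based correction this describes: nudge an out-of-distribution guidance image toward the model's data manifold using only a pretrained score network, stepping along a decreasing noise schedule. The stub score model, step size, and Euler-style update are stand-ins, not CODE's actual procedure.

```python
import torch

def score_model(x, sigma):
    # Stub for a pretrained score network; here the exact score of N(0, sigma^2 I).
    return -x / (sigma ** 2)

def ode_correct(x, sigmas, step=0.1):
    # Euler-style updates in the score direction at decreasing noise scales,
    # loosely following a probability-flow trajectory.
    for sigma in sigmas:
        x = x + step * (sigma ** 2) * score_model(x, sigma)
    return x

guidance = 3.0 * torch.randn(1, 3, 32, 32)   # toy "out-of-distribution" input
corrected = ode_correct(guidance, torch.linspace(1.0, 0.1, 10))
```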
arXiv Detail & Related papers (2024-08-22T14:12:20Z)
- MC$^2$: Multi-concept Guidance for Customized Multi-concept Generation [59.00909718832648]
We propose MC$^2$, a novel approach for multi-concept customization. By adaptively refining attention weights between visual and textual tokens, our method ensures that image regions accurately correspond to their associated concepts. Experiments demonstrate that MC$^2$ outperforms training-based methods in terms of prompt-reference alignment.
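One plausible reading of "refining attention weights" can be sketched directly: strengthen the cross-attention between each concept's text tokens and its image region, then renormalize. The region masks, token groups, and boost factor are hypothetical.

```python
import torch

def refine_attention(attn, region_masks, token_groups, boost=2.0):
    # attn: (n_img_tokens, n_txt_tokens); region_masks[c]: (n_img_tokens,) bool;
    # token_groups[c]: LongTensor of text-token indices for concept c.
    attn = attn.clone()
    for mask, tokens in zip(region_masks, token_groups):
        rows = mask.nonzero(as_tuple=True)[0]
        attn[rows[:, None], tokens] *= boost     # up-weight concept tokens in-region
    return attn / attn.sum(-1, keepdim=True)     # renormalize per image token

# Toy usage: one concept occupying the top half of a 64-token image grid.
attn = torch.rand(64, 10)
mask = torch.zeros(64, dtype=torch.bool)
mask[:32] = True
refined = refine_attention(attn, [mask], [torch.tensor([3, 4])])
```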
arXiv Detail & Related papers (2024-04-08T07:59:04Z)
- Direct Consistency Optimization for Robust Customization of Text-to-Image Diffusion Models [67.68871360210208]
Text-to-image (T2I) diffusion models, when fine-tuned on a few personal images, can generate visuals with a high degree of consistency.
We propose a novel fine-tuning objective, dubbed Direct Consistency Optimization, which controls the deviation between fine-tuning and pretrained models.
We show that our approach achieves better prompt fidelity and subject fidelity than those post-optimized for merging regular fine-tuned models.
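A deviation-controlled objective of this kind can be sketched as follows: reward the fine-tuned model for denoising the reference images better than the frozen pretrained model, passed through a sigmoid so the deviation stays bounded. The `beta` weight and this exact form are assumptions in the spirit of the summary, not necessarily the paper's formulation.

```python
import torch
import torch.nn.functional as F

def dco_style_loss(err_finetuned, err_pretrained, beta=1.0):
    # err_*: per-sample denoising errors ||eps_hat - eps||^2 under the
    # fine-tuned and the frozen pretrained model, respectively.
    return -F.logsigmoid(-beta * (err_finetuned - err_pretrained)).mean()

# Toy usage with random per-sample errors.
e_ft, e_pt = torch.rand(8), torch.rand(8)
print(dco_style_loss(e_ft, e_pt))
```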
arXiv Detail & Related papers (2024-02-19T09:52:41Z)
- Orthogonal Adaptation for Modular Customization of Diffusion Models [39.62438974450659]
We address a new problem called Modular Customization, with the goal of efficiently merging customized models that were fine-tuned independently for different concepts. We introduce Orthogonal Adaptation, a method that encourages customized models, which have no access to each other during fine-tuning, to have orthogonal residual weights, so that they can be merged with minimal interference. Our proposed method is both simple and versatile, applicable to nearly all optimizable weights in the model architecture.
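The orthogonality idea admits a compact sketch: if each concept's frozen LoRA down-projection spans a distinct orthogonal subspace, the independently trained deltas can later be summed with the cross terms vanishing. Sharing one random orthonormal basis and slicing it per concept is an illustrative construction, not necessarily the paper's.

```python
import torch

d, r, n_concepts = 64, 4, 3
basis, _ = torch.linalg.qr(torch.randn(d, d))   # orthonormal columns

# Frozen, mutually orthogonal down-projections (r, d) per concept;
# only the up-projections (d, r) are trained.
lora_down = [basis[:, i * r:(i + 1) * r].T for i in range(n_concepts)]
lora_up = [torch.zeros(d, r, requires_grad=True) for _ in range(n_concepts)]

# Merging is a plain sum; A_i @ A_j.T == 0 for i != j, so deltas
# trained in isolation interfere minimally when combined.
merged_delta = sum(up @ down for up, down in zip(lora_up, lora_down))
```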
arXiv Detail & Related papers (2023-12-05T02:17:48Z)
- Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models [59.094601993993535]
Text-to-image (T2I) personalization allows users to combine their own visual concepts in natural language prompts.
Most existing encoders are limited to a single-class domain, which hinders their ability to handle diverse concepts.
We propose a domain-agnostic method that does not require any specialized dataset or prior information about the personalized concepts.
arXiv Detail & Related papers (2023-07-13T17:46:42Z)
- ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation [8.803251014279502]
Large-scale text-to-image models have demonstrated amazing ability to synthesize diverse and high-fidelity images.
Current models can impose significant changes to the original image content during the editing process.
We propose ReGeneration learning in an image-to-image Diffusion model (ReDiffuser).
arXiv Detail & Related papers (2023-05-08T12:08:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.