Plug-and-Play Multi-Concept Adaptive Blending for High-Fidelity Text-to-Image Synthesis
- URL: http://arxiv.org/abs/2511.17615v1
- Date: Tue, 18 Nov 2025 12:25:47 GMT
- Title: Plug-and-Play Multi-Concept Adaptive Blending for High-Fidelity Text-to-Image Synthesis
- Authors: Young-Beom Woo,
- Abstract summary: We introduce plug-and-play multi-concept blending for high-fidelity text-to-image (T2I) generation. Our method leverages guided appearance attention to faithfully reflect the intended appearance of each personalized concept. We also present a mask-guided noise mixing strategy that preserves the integrity of non-personalized regions.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Integrating multiple personalized concepts into a single image has recently become a significant area of focus within Text-to-Image (T2I) generation. However, existing methods often underperform on complex multi-object scenes due to unintended alterations in both personalized and non-personalized regions. This not only fails to preserve the intended prompt structure but also disrupts interactions among regions, leading to semantic inconsistencies. To address this limitation, we introduce plug-and-play multi-concept adaptive blending for high-fidelity text-to-image synthesis (PnP-MIX), an innovative, tuning-free approach designed to seamlessly embed multiple personalized concepts into a single generated image. Our method leverages guided appearance attention to faithfully reflect the intended appearance of each personalized concept. To further enhance compositional fidelity, we present a mask-guided noise mixing strategy that preserves the integrity of non-personalized regions, such as the background or unrelated objects, while enabling the precise integration of personalized objects. Finally, to mitigate concept leakage, i.e., the inadvertent spread of personalized concept features into other regions, we propose background dilution++, a novel strategy that effectively reduces such leakage and promotes accurate localization of features within personalized regions. Extensive experimental results demonstrate that PnP-MIX consistently surpasses existing methodologies in both single- and multi-concept personalization scenarios, underscoring its robustness and superior performance without additional model tuning.
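To make the mask-guided noise mixing strategy concrete, here is a minimal PyTorch sketch of one blending step, assuming one denoising latent per personalized concept plus a base latent for the full prompt, and a binary mask per concept. The function name and tensor shapes are illustrative assumptions, not the authors' implementation.

```python
import torch

def mask_guided_noise_mix(base_latent: torch.Tensor,
                          concept_latents: list[torch.Tensor],
                          concept_masks: list[torch.Tensor]) -> torch.Tensor:
    """Blend per-concept latents into the base latent region by region.

    base_latent:     (B, C, H, W) latent for the full prompt
    concept_latents: list of (B, C, H, W) latents, one per concept
    concept_masks:   list of (B, 1, H, W) binary masks, one per concept
    """
    mixed = base_latent.clone()
    for z_k, m_k in zip(concept_latents, concept_masks):
        # Inside each concept's mask, take that concept's latent;
        # everywhere else keep the base latent, so non-personalized
        # regions (background, unrelated objects) stay intact.
        mixed = m_k * z_k + (1.0 - m_k) * mixed
    return mixed

# Hypothetical usage: two concepts on a 64x64 latent grid
B, C, H, W = 1, 4, 64, 64
base = torch.randn(B, C, H, W)
z1, z2 = torch.randn(B, C, H, W), torch.randn(B, C, H, W)
m1 = torch.zeros(B, 1, H, W); m1[..., :32] = 1.0   # left half of the canvas
m2 = torch.zeros(B, 1, H, W); m2[..., 32:] = 1.0   # right half of the canvas
out = mask_guided_noise_mix(base, [z1, z2], [m1, m2])
```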
Related papers
- Unified Personalized Understanding, Generating and Editing [54.5563878110386]
We present OmniPersona, an end-to-end personalization framework for unified LMMs. It integrates personalized understanding, generation, and image editing within a single architecture. Experiments demonstrate that OmniPersona delivers competitive and robust performance across diverse personalization tasks.
arXiv Detail & Related papers (2026-01-11T15:46:34Z)
- FocusDPO: Dynamic Preference Optimization for Multi-Subject Personalized Image Generation via Adaptive Focus [10.615833390806486]
Multi-subject personalized image generation aims to synthesize customized images containing multiple specified subjects without requiring test-time optimization. We present FocusDPO, a framework that adaptively identifies focus regions based on dynamic semantic correspondence and supervision image complexity.
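One way such adaptive focus could enter a preference objective is by weighting the DPO log-ratio with a per-sample focus score; this is purely an illustrative guess at the mechanism, not FocusDPO's actual loss, and all names are assumptions.

```python
import torch
import torch.nn.functional as F

def focus_weighted_dpo(logp_win: torch.Tensor, logp_lose: torch.Tensor,
                       logp_ref_win: torch.Tensor, logp_ref_lose: torch.Tensor,
                       focus_weight: torch.Tensor, beta: float = 0.1) -> torch.Tensor:
    """All log-prob tensors: (B,); focus_weight: (B,) scores in [0, 1]."""
    # Standard DPO log-ratio between preferred and dispreferred samples,
    # relative to a frozen reference model.
    ratio = (logp_win - logp_ref_win) - (logp_lose - logp_ref_lose)
    # Hypothetical twist: samples whose focus regions matter more get
    # a larger weight in the preference loss.
    return -(focus_weight * F.logsigmoid(beta * ratio)).mean()
```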
arXiv Detail & Related papers (2025-09-01T07:06:36Z)
- Modular Customization of Diffusion Models via Blockwise-Parameterized Low-Rank Adaptation [73.16975077770765]
Modular customization is essential for applications like concept stylization and multi-concept customization. Instant merging methods often cause identity loss and interference among the merged concepts. We propose BlockLoRA, an instant merging method designed to efficiently combine multiple concepts while accurately preserving each concept's identity.
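As context for the interference problem BlockLoRA targets, here is a hedged sketch of the naive instant-merge baseline for a single linear layer: low-rank updates are simply summed into the base weight, which is tuning-free but lets per-concept updates collide. Names and shapes are illustrative assumptions, not BlockLoRA's blockwise parameterization.

```python
import torch

def merge_loras(W_base: torch.Tensor,
                loras: list[tuple[torch.Tensor, torch.Tensor]],
                scale: float = 1.0) -> torch.Tensor:
    """Fold several low-rank adapters into one weight matrix.

    W_base: (out, in) pretrained weight
    loras:  list of (B, A) pairs with B: (out, r), A: (r, in),
            one pair per personalized concept
    """
    W = W_base.clone()
    for B, A in loras:
        # Each concept contributes a low-rank update B @ A. Summing the
        # updates is "instant" (no re-tuning), but the updates can
        # overlap in parameter space and interfere -- the identity-loss
        # problem that blockwise parameterization is designed to avoid.
        W += scale * (B @ A)
    return W

# Hypothetical usage: merge three concept adapters of rank 4
W0 = torch.randn(320, 320)
loras = [(torch.randn(320, 4) * 0.01, torch.randn(4, 320) * 0.01) for _ in range(3)]
W_merged = merge_loras(W0, loras)
```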
arXiv Detail & Related papers (2025-03-11T16:10:36Z)
- FlipConcept: Tuning-Free Multi-Concept Personalization for Text-to-Image Generation [26.585985828583304]
We propose FlipConcept, a novel approach that seamlessly integrates multiple personalized concepts into a single image. We introduce guided appearance attention, mask-guided noise mixing, and background dilution to minimize concept leakage. Despite requiring no tuning, our method outperforms existing models in both single- and multi-concept personalized inference.
arXiv Detail & Related papers (2025-02-21T04:37:18Z)
- LoRACLR: Contrastive Adaptation for Customization of Diffusion Models [84.04930416829264]
LoRACLR is a novel approach for multi-concept image generation that merges multiple LoRA models, each fine-tuned for a distinct concept, into a single, unified model. LoRACLR uses a contrastive objective to align and merge the weight spaces of these models, ensuring compatibility while minimizing interference. Our results highlight the effectiveness of LoRACLR in accurately merging multiple concepts, advancing the capabilities of personalized image generation.
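A rough sketch of what a contrastive alignment term might look like in this setting: pull the merged model's activations toward each individual model's activations on its own concept, and push apart activations belonging to different concepts. The loss form, margin, and shapes are assumptions for illustration, not LoRACLR's actual objective.

```python
import torch
import torch.nn.functional as F

def contrastive_merge_loss(merged_out: torch.Tensor,
                           target_out: torch.Tensor,
                           other_out: torch.Tensor,
                           margin: float = 1.0) -> torch.Tensor:
    """merged_out, target_out, other_out: (B, D) layer activations."""
    # Positive term: the merged model should reproduce the concept's
    # own fine-tuned model on that concept's inputs.
    pos = F.mse_loss(merged_out, target_out)
    # Negative term: stay at least `margin` away from activations of a
    # different concept, so merged concepts do not collapse together.
    neg = F.relu(margin - (merged_out - other_out).pow(2).mean())
    return pos + neg
```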
arXiv Detail & Related papers (2024-12-12T18:59:55Z)
- MagicFace: Training-free Universal-Style Human Image Customized Synthesis [13.944050414488911]
MagicFace is a training-free method for multi-concept, universal-style personalized human image synthesis.
Our core idea is to simulate how humans create images given specific concepts, first establishing a semantic layout.
In the first stage, RSA enables the latent image to query features from all reference concepts simultaneously.
arXiv Detail & Related papers (2024-08-14T10:08:46Z)
- FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior [50.0535198082903]
We offer a novel approach to image composition, which integrates multiple input images into a single, coherent image.
We show that the powerful generative prior of large-scale pre-trained diffusion models can be harnessed for generic image composition.
arXiv Detail & Related papers (2024-07-06T03:35:43Z)
- OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models [47.63060402915307]
OMG is a framework designed to seamlessly integrate multiple concepts within a single image.
OMG exhibits superior performance in multi-concept personalization.
Off-the-shelf LoRA models from civitai.com can be used directly.
arXiv Detail & Related papers (2024-03-16T17:30:15Z)
- CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models [85.69959024572363]
CustomNet is a novel object customization approach that explicitly incorporates 3D novel view synthesis capabilities into the object customization process.
We introduce carefully tailored designs that enable location control and flexible background control through textual descriptions or specific user-defined images.
Our method facilitates zero-shot object customization without test-time optimization, offering simultaneous control over the viewpoints, location, and background.
arXiv Detail & Related papers (2023-10-30T17:50:14Z)
- Break-A-Scene: Extracting Multiple Concepts from a Single Image [80.47666266017207]
We introduce the task of textual scene decomposition.
We propose augmenting the input image with masks that indicate the presence of target concepts.
We then present a novel two-phase customization process.
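As an illustration of how such masks can supervise training, here is a minimal sketch of a masked diffusion loss that restricts reconstruction error to a concept's region, so each learned concept is supervised only by its own pixels; the loss form and names are assumptions, not the paper's exact formulation.

```python
import torch

def masked_diffusion_loss(eps_pred: torch.Tensor,
                          eps_true: torch.Tensor,
                          mask: torch.Tensor) -> torch.Tensor:
    """eps_pred, eps_true: (B, C, H, W) noise estimates; mask: (B, 1, H, W) binary."""
    # Squared error only inside the concept's mask; pixels outside the
    # mask contribute nothing to this concept's gradient.
    se = (eps_pred - eps_true).pow(2) * mask
    denom = mask.sum() * eps_pred.shape[1]    # masked pixels x channels
    return se.sum() / denom.clamp(min=1.0)
```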
arXiv Detail & Related papers (2023-05-25T17:59:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.