OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models
- URL: http://arxiv.org/abs/2403.10983v2
- Date: Sat, 20 Jul 2024 15:56:18 GMT
- Title: OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models
- Authors: Zhe Kong, Yong Zhang, Tianyu Yang, Tao Wang, Kaihao Zhang, Bizhu Wu, Guanying Chen, Wei Liu, Wenhan Luo
- Abstract summary: OMG is a framework designed to seamlessly integrate multiple concepts within a single image.
OMG exhibits superior performance in multi-concept personalization.
LoRA models on civitai.com can be exploited directly.
- Score: 47.63060402915307
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalization is an important topic in text-to-image generation, especially the challenging multi-concept personalization. Current multi-concept methods struggle with identity preservation, occlusion, and the harmony between foreground and background. In this work, we propose OMG, an occlusion-friendly personalized generation framework designed to seamlessly integrate multiple concepts within a single image. We propose a novel two-stage sampling solution. The first stage takes charge of layout generation and the collection of visual comprehension information for handling occlusions. The second stage utilizes the acquired visual comprehension information and the designed noise blending to integrate multiple concepts while accounting for occlusions. We also observe that the initial denoising timestep for noise blending is the key to identity preservation and layout. Moreover, our method can be combined with various single-concept models, such as LoRA and InstantID, without additional tuning. In particular, LoRA models on civitai.com can be exploited directly. Extensive experiments demonstrate that OMG exhibits superior performance in multi-concept personalization.
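The second-stage noise blending described in the abstract can be illustrated with a minimal sketch. All names and the exact blending rule here are assumptions for illustration (the paper's actual formulation may differ): per-concept noise predictions are pasted into their layout-mask regions, and blending only begins once sampling reaches a chosen start timestep.

```python
import numpy as np

def blend_noise_predictions(noise_preds, masks, base_pred, t, t_start):
    """Spatially blend per-concept noise predictions using layout masks.

    Hypothetical sketch: before the blending start timestep `t_start`
    (diffusion timesteps count down), the base prediction is returned
    unchanged; from `t_start` onward, each concept's prediction is
    composited into its mask region.
    """
    if t > t_start:
        return base_pred  # too early: keep the base model's prediction
    blended = base_pred.copy()
    for pred, mask in zip(noise_preds, masks):
        # paste the concept prediction inside its mask, keep the rest
        blended = mask * pred + (1.0 - mask) * blended
    return blended
```

The choice of `t_start` mirrors the abstract's observation that the initial denoising timestep for blending governs the trade-off between identity preservation and layout.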
Related papers
- TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation [67.97044071594257]
TweedieMix is a novel method for composing customized diffusion models.
Our framework can be effortlessly extended to image-to-video diffusion models.
arXiv Detail & Related papers (2024-10-08T01:06:01Z)
- AttenCraft: Attention-guided Disentanglement of Multiple Concepts for Text-to-Image Customization [4.544788024283586]
AttenCraft is an attention-guided method for multiple concept disentanglement.
We introduce Uniform sampling and Reweighted sampling schemes to alleviate the non-synchronicity of feature acquisition from different concepts.
Our method outperforms baseline models in terms of image-alignment, and behaves comparably on text-alignment.
arXiv Detail & Related papers (2024-05-28T08:50:14Z)
- FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition [49.2208591663092]
FreeCustom is a tuning-free method to generate customized images of multi-concept composition based on reference concepts.
We introduce a new multi-reference self-attention (MRSA) mechanism and a weighted mask strategy.
Our method outperforms or performs on par with other training-based methods in terms of multi-concept composition and single-concept customization.
arXiv Detail & Related papers (2024-05-22T17:53:38Z)
- MC$^2$: Multi-concept Guidance for Customized Multi-concept Generation [49.935634230341904]
We introduce the Multi-concept guidance for Multi-concept customization, termed MC$^2$, for improved flexibility and fidelity.
MC$^2$ decouples the requirements for model architecture via inference-time optimization.
It adaptively refines the attention weights between visual and textual tokens, directing image regions to focus on their associated words.
arXiv Detail & Related papers (2024-04-08T07:59:04Z)
- Attention Calibration for Disentangled Text-to-Image Personalization [12.339742346826403]
We propose an attention calibration mechanism to improve the concept-level understanding of the T2I model.
We demonstrate that our method outperforms the current state of the art in both qualitative and quantitative evaluations.
arXiv Detail & Related papers (2024-03-27T13:31:39Z)
- LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models [33.379758040084894]
Multi-concept customization is a particularly challenging task in this domain.
Existing approaches often rely on training a fusion matrix of multiple Low-Rank Adaptations (LoRAs) to merge various concepts into a single image.
LoRA-Composer is a training-free framework designed for seamlessly integrating multiple LoRAs.
arXiv Detail & Related papers (2024-03-18T09:58:52Z)
- Multi-view Information Integration and Propagation for Occluded Person Re-identification [36.91680117072686]
Occluded person re-identification (re-ID) presents a challenging task due to occlusion perturbations.
Most current solutions only capture information from a single image, disregarding the rich complementary information available in multiple images depicting the same pedestrian.
We propose a novel framework called Multi-view Information Integration and Propagation (MVI$^2$P).
arXiv Detail & Related papers (2023-11-07T09:17:56Z)
- Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models [72.67967883658957]
Public large-scale text-to-image diffusion models can be easily customized for new concepts using low-rank adaptations (LoRAs).
The utilization of multiple concept LoRAs to jointly support multiple customized concepts presents a challenge.
We propose a new framework called Mix-of-Show that addresses the challenges of decentralized multi-concept customization.
arXiv Detail & Related papers (2023-05-29T17:58:16Z)
- Break-A-Scene: Extracting Multiple Concepts from a Single Image [80.47666266017207]
We introduce the task of textual scene decomposition.
We propose augmenting the input image with masks that indicate the presence of target concepts.
We then present a novel two-phase customization process.
arXiv Detail & Related papers (2023-05-25T17:59:04Z)
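Several entries above (Mix-of-Show, LoRA-Composer) revolve around the difficulty of merging multiple concept LoRAs. The naive baseline they improve upon can be sketched as a weighted sum of low-rank updates; the function name and shapes below are illustrative assumptions, not code from any of the cited papers.

```python
import numpy as np

def merge_loras(base_weight, loras, scales):
    """Naively merge several concept LoRAs into one weight matrix.

    Each LoRA contributes a low-rank update scale * (B @ A), where
    B has shape (d_out, r) and A has shape (r, d_in). Simply summing
    the updates lets the concepts interfere with one another, which
    is the failure mode that dedicated fusion methods try to avoid.
    """
    merged = base_weight.copy()
    for (A, B), scale in zip(loras, scales):
        merged += scale * (B @ A)  # add this concept's low-rank update
    return merged
```

Because the updates are added in the same weight space, increasing one concept's scale shifts the outputs for all prompts, illustrating why training-free fusion frameworks constrain where and how each LoRA is applied.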
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.