MC$^2$: Multi-concept Guidance for Customized Multi-concept Generation
- URL: http://arxiv.org/abs/2404.05268v2
- Date: Fri, 12 Apr 2024 06:20:49 GMT
- Title: MC$^2$: Multi-concept Guidance for Customized Multi-concept Generation
- Authors: Jiaxiu Jiang, Yabo Zhang, Kailai Feng, Xiaohe Wu, Wangmeng Zuo
- Abstract summary: We introduce the Multi-concept guidance for Multi-concept customization, termed MC$^2$, for improved flexibility and fidelity.
MC$^2$ decouples the requirements for model architecture via inference-time optimization.
It adaptively refines the attention weights between visual and textual tokens, directing image regions to focus on their associated words.
- Score: 49.935634230341904
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Customized text-to-image generation aims to synthesize instantiations of user-specified concepts and has achieved unprecedented progress in handling individual concepts. However, when extending to multiple customized concepts, existing methods exhibit limitations in terms of flexibility and fidelity, only accommodating the combination of limited types of models and potentially resulting in a mix of characteristics from different concepts. In this paper, we introduce the Multi-concept guidance for Multi-concept customization, termed MC$^2$, for improved flexibility and fidelity. MC$^2$ decouples the requirements for model architecture via inference-time optimization, allowing the integration of various heterogeneous single-concept customized models. It adaptively refines the attention weights between visual and textual tokens, directing image regions to focus on their associated words while diminishing the impact of irrelevant ones. Extensive experiments demonstrate that MC$^2$ even surpasses previous methods that require additional training in terms of consistency with the input prompt and reference images. Moreover, MC$^2$ can be extended to elevate the compositional capabilities of text-to-image generation, yielding appealing results. Code will be publicly available at https://github.com/JIANGJiaXiu/MC-2.
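The abstract's core mechanism, boosting the cross-attention weights between each image region and its associated prompt words while suppressing irrelevant ones, can be illustrated with a minimal sketch. This is not the paper's actual implementation; the function name, the fixed `boost` factor, and the explicit region-to-word assignment are all hypothetical simplifications for illustration.

```python
import numpy as np

def refine_cross_attention(attn, concept_token_ids, boost=2.0):
    """Illustrative reweighting of a cross-attention map (not the paper's code).

    attn: (num_image_regions, num_text_tokens) row-stochastic attention matrix.
    concept_token_ids: per-region list of text-token indices the region should
        focus on (a hypothetical region-to-word assignment).
    boost: multiplicative factor for associated tokens; renormalizing afterwards
        simultaneously diminishes the weight of irrelevant tokens.
    """
    refined = attn.astype(float).copy()
    for region, tokens in enumerate(concept_token_ids):
        refined[region, tokens] *= boost
    # Renormalize each row back to a probability distribution over text tokens.
    refined /= refined.sum(axis=1, keepdims=True)
    return refined

# Toy example: 2 image regions, 4 text tokens, initially uniform attention.
attn = np.full((2, 4), 0.25)
refined = refine_cross_attention(attn, [[0], [2, 3]])
```

In the toy example, region 0's weight on token 0 rises from 0.25 to 0.4 while the other tokens drop to 0.2 each, so each region ends up concentrating on its assigned concept words.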
Related papers
- ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty [52.15933752463479]
ConceptMix is a scalable, controllable, and customizable benchmark.
It automatically evaluates compositional generation ability of Text-to-Image (T2I) models.
It reveals that the performance of several models, especially open models, drops dramatically as the number of combined concepts k increases.
arXiv Detail & Related papers (2024-08-26T15:08:12Z)
- Concept Conductor: Orchestrating Multiple Personalized Concepts in Text-to-Image Synthesis [14.21719970175159]
Concept Conductor is designed to ensure visual fidelity and correct layout in multi-concept customization.
We present a concept injection technique that employs shape-aware masks to specify the generation area for each concept.
Our method supports the combination of any number of concepts and maintains high fidelity even when dealing with visually similar concepts.
arXiv Detail & Related papers (2024-08-07T08:43:58Z)
- AttenCraft: Attention-guided Disentanglement of Multiple Concepts for Text-to-Image Customization [4.544788024283586]
AttenCraft is an attention-guided method for multiple concept disentanglement.
We introduce Uniform sampling and Reweighted sampling schemes to alleviate the non-synchronicity of feature acquisition from different concepts.
Our method outperforms baseline models in terms of image alignment and performs comparably on text alignment.
arXiv Detail & Related papers (2024-05-28T08:50:14Z)
- FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition [49.2208591663092]
FreeCustom is a tuning-free method to generate customized images of multi-concept composition based on reference concepts.
We introduce a new multi-reference self-attention (MRSA) mechanism and a weighted mask strategy.
Our method outperforms or performs on par with other training-based methods in terms of multi-concept composition and single-concept customization.
arXiv Detail & Related papers (2024-05-22T17:53:38Z)
- Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition [47.07564907486087]
Recent text-to-image diffusion models are able to learn and synthesize images containing novel, personalized concepts.
This paper tackles two interconnected issues within this realm of personalizing text-to-image diffusion models.
arXiv Detail & Related papers (2024-02-23T18:55:09Z)
- Visual Concept-driven Image Generation with Text-to-Image Diffusion Model [65.96212844602866]
Text-to-image (TTI) models have demonstrated impressive results in generating high-resolution images of complex scenes.
Recent approaches have extended these methods with personalization techniques that allow them to integrate user-illustrated concepts.
However, the ability to generate images with multiple interacting concepts, such as human subjects, as well as concepts that may be entangled in one, or across multiple, image illustrations, remains elusive.
We propose a concept-driven TTI personalization framework that addresses these core challenges.
arXiv Detail & Related papers (2024-02-18T07:28:37Z)
- Multi-Concept T2I-Zero: Tweaking Only The Text Embeddings and Nothing Else [75.6806649860538]
We consider a more ambitious goal: natural multi-concept generation using a pre-trained diffusion model.
We observe concept dominance and non-localized contribution that severely degrade multi-concept generation performance.
We design a minimal low-cost solution that overcomes the above issues by tweaking the text embeddings for more realistic multi-concept text-to-image generation.
arXiv Detail & Related papers (2023-10-11T12:05:44Z)
- Multi-Concept Customization of Text-to-Image Diffusion [51.8642043743222]
We propose Custom Diffusion, an efficient method for augmenting existing text-to-image models.
We find that only optimizing a few parameters in the text-to-image conditioning mechanism is sufficiently powerful to represent new concepts.
Our model generates variations of multiple new concepts and seamlessly composes them with existing concepts in novel settings.
arXiv Detail & Related papers (2022-12-08T18:57:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.