Related papers: ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning

ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning

URL: http://arxiv.org/abs/2501.04698v1
Date: Wed, 08 Jan 2025 18:59:01 GMT
Title: ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning
Authors: Yuzhou Huang, Ziyang Yuan, Quande Liu, Qiulin Wang, Xintao Wang, Ruimao Zhang, Pengfei Wan, Di Zhang, Kun Gai,
Abstract summary: Multi-Concept Video Customization (MCVC) remains a significant challenge.<n>We introduce ConceptMaster, an innovative framework that effectively tackles the issues of identity decoupling while maintaining concept fidelity in customized videos.<n>Specifically, we introduce a novel strategy of learning decoupled multi-concept embeddings that are injected into the diffusion models in a standalone manner.
Score: 40.70596166863986
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-to-video generation has made remarkable advancements through diffusion models. However, Multi-Concept Video Customization (MCVC) remains a significant challenge. We identify two key challenges in this task: 1) the identity decoupling problem, where directly adopting existing customization methods inevitably mix attributes when handling multiple concepts simultaneously, and 2) the scarcity of high-quality video-entity pairs, which is crucial for training such a model that represents and decouples various concepts well. To address these challenges, we introduce ConceptMaster, an innovative framework that effectively tackles the critical issues of identity decoupling while maintaining concept fidelity in customized videos. Specifically, we introduce a novel strategy of learning decoupled multi-concept embeddings that are injected into the diffusion models in a standalone manner, which effectively guarantees the quality of customized videos with multiple identities, even for highly similar visual concepts. To further overcome the scarcity of high-quality MCVC data, we carefully establish a data construction pipeline, which enables systematic collection of precise multi-concept video-entity data across diverse concepts. A comprehensive benchmark is designed to validate the effectiveness of our model from three critical dimensions: concept fidelity, identity decoupling ability, and video generation quality across six different concept composition scenarios. Extensive experiments demonstrate that our ConceptMaster significantly outperforms previous approaches for this task, paving the way for generating personalized and semantically accurate videos across multiple concepts.

Related papers

MC-LLaVA: Multi-Concept Personalized Vision-Language Model [51.645660375766575]
This paper proposes the first multi-concept personalization paradigm, MC-LLaVA. MC-LLaVA employs a multi-concept instruction tuning strategy, effectively integrating multiple concepts in a single training step. Comprehensive qualitative and quantitative experiments demonstrate that MC-LLaVA can achieve impressive multi-concept personalized responses.
arXiv Detail & Related papers (2025-03-24T16:32:17Z)
Movie Weaver: Tuning-Free Multi-Concept Video Personalization with Anchored Prompts [49.63959518905243]
We propose a new method for video personalization based on multi-concept integration. Movie Weaver seamlessly weaves multiple concepts-including face, body, and animal images-into one video, allowing flexible combinations in a single model. The evaluation shows that Movie Weaver outperforms existing methods for multi-concept video personalization in identity preservation and overall quality.
arXiv Detail & Related papers (2025-02-04T22:03:26Z)
MC-LLaVA: Multi-Concept Personalized Vision-Language Model [51.645660375766575]
This paper proposes the first multi-concept personalization paradigm, MC-LLaVA. MC-LLaVA employs a multi-concept instruction tuning strategy, effectively integrating multiple concepts in a single training step. Comprehensive qualitative and quantitative experiments demonstrate that MC-LLaVA can achieve impressive multi-concept personalized responses.
arXiv Detail & Related papers (2024-11-18T16:33:52Z)
TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation [67.97044071594257]
TweedieMix is a novel method for composing customized diffusion models. Our framework can be effortlessly extended to image-to-video diffusion models.
arXiv Detail & Related papers (2024-10-08T01:06:01Z)
Concept Conductor: Orchestrating Multiple Personalized Concepts in Text-to-Image Synthesis [14.21719970175159]
Concept Conductor is designed to ensure visual fidelity and correct layout in multi-concept customization. We present a concept injection technique that employs shape-aware masks to specify the generation area for each concept. Our method supports the combination of any number of concepts and maintains high fidelity even when dealing with visually similar concepts.
arXiv Detail & Related papers (2024-08-07T08:43:58Z)
MC$^2$: Multi-concept Guidance for Customized Multi-concept Generation [59.00909718832648]
We propose MC$2$, a novel approach for multi-concept customization.<n>By adaptively refining attention weights between visual and textual tokens, our method ensures that image regions accurately correspond to their associated concepts.<n>Experiments demonstrate that MC$2$ outperforms training-based methods in terms of prompt-reference alignment.
arXiv Detail & Related papers (2024-04-08T07:59:04Z)
Visual Concept-driven Image Generation with Text-to-Image Diffusion Model [65.96212844602866]
Text-to-image (TTI) models have demonstrated impressive results in generating high-resolution images of complex scenes. Recent approaches have extended these methods with personalization techniques that allow them to integrate user-illustrated concepts. However, the ability to generate images with multiple interacting concepts, such as human subjects, as well as concepts that may be entangled in one, or across multiple, image illustrations remains illusive. We propose a concept-driven TTI personalization framework that addresses these core challenges.
arXiv Detail & Related papers (2024-02-18T07:28:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.