Movie Weaver: Tuning-Free Multi-Concept Video Personalization with Anchored Prompts
- URL: http://arxiv.org/abs/2502.07802v1
- Date: Tue, 04 Feb 2025 22:03:26 GMT
- Title: Movie Weaver: Tuning-Free Multi-Concept Video Personalization with Anchored Prompts
- Authors: Feng Liang, Haoyu Ma, Zecheng He, Tingbo Hou, Ji Hou, Kunpeng Li, Xiaoliang Dai, Felix Juefei-Xu, Samaneh Azadi, Animesh Sinha, Peizhao Zhang, Peter Vajda, Diana Marculescu
- Abstract summary: We propose a new method for video personalization based on multi-concept integration.
Movie Weaver seamlessly weaves multiple concepts (including face, body, and animal images) into one video, allowing flexible combinations in a single model.
The evaluation shows that Movie Weaver outperforms existing methods for multi-concept video personalization in identity preservation and overall quality.
- Score: 49.63959518905243
- Abstract: Video personalization, which generates customized videos using reference images, has gained significant attention. However, prior methods typically focus on single-concept personalization, limiting broader applications that require multi-concept integration. Attempts to extend these models to multiple concepts often lead to identity blending, which results in composite characters with fused attributes from multiple sources. This challenge arises due to the lack of a mechanism to link each concept with its specific reference image. We address this with anchored prompts, which embed image anchors as unique tokens within text prompts, guiding accurate referencing during generation. Additionally, we introduce concept embeddings to encode the order of reference images. Our approach, Movie Weaver, seamlessly weaves multiple concepts (including face, body, and animal images) into one video, allowing flexible combinations in a single model. The evaluation shows that Movie Weaver outperforms existing methods for multi-concept video personalization in identity preservation and overall quality.
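As a rough illustration of the anchored-prompt idea described in the abstract, the sketch below places a unique token after the word that names each concept, so that the i-th token can be tied to the i-th reference image. This is only a minimal sketch under stated assumptions: the function name `build_anchored_prompt`, the `<ref_i>` token format, and the word-matching rule are hypothetical and are not taken from the paper.

```python
import re


def build_anchored_prompt(prompt, concept_words):
    """Insert a placeholder anchor token after the first mention of each concept word.

    concept_words[i] is the word describing the i-th reference image, so the
    index i also records the order of the reference images.
    The "<ref_i>" token format is a made-up placeholder, not the paper's.
    """
    anchored = prompt
    for i, word in enumerate(concept_words):
        # Match the concept word as a whole word, first occurrence only.
        pattern = r"\b" + re.escape(word) + r"\b"
        anchored, n = re.subn(
            pattern,
            lambda m, i=i: f"{m.group(0)} <ref_{i}>",
            anchored,
            count=1,
        )
        if n == 0:
            raise ValueError(f"concept word {word!r} not found in prompt")
    return anchored


prompt = "A man and his dog walk along the beach."
print(build_anchored_prompt(prompt, ["man", "dog"]))
# A man <ref_0> and his dog <ref_1> walk along the beach.
```

In this toy version each anchor sits directly next to the text it refers to, which is the property the abstract credits with avoiding identity blending. How the tokens are actually encoded and consumed by the model is not specified here.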
Related papers
- Multi-subject Open-set Personalization in Video Generation [110.02124633005516]
We present Video Alchemist, a video model with built-in multi-subject, open-set personalization capabilities.
Our model is built on a new Diffusion Transformer module that fuses each conditional reference image and its corresponding subject-level text prompt.
Our method significantly outperforms existing personalization methods in both quantitative and qualitative evaluations.
arXiv Detail & Related papers (2025-01-10T18:59:54Z)
- ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning [40.70596166863986]
Multi-Concept Video Customization (MCVC) remains a significant challenge.
We introduce ConceptMaster, an innovative framework that effectively tackles the issues of identity decoupling while maintaining concept fidelity in customized videos.
Specifically, we introduce a novel strategy of learning decoupled multi-concept embeddings that are injected into the diffusion models in a standalone manner.
arXiv Detail & Related papers (2025-01-08T18:59:01Z)
- TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation [67.97044071594257]
TweedieMix is a novel method for composing customized diffusion models.
Our framework can be effortlessly extended to image-to-video diffusion models.
arXiv Detail & Related papers (2024-10-08T01:06:01Z)
- Concept Conductor: Orchestrating Multiple Personalized Concepts in Text-to-Image Synthesis [14.21719970175159]
Concept Conductor is designed to ensure visual fidelity and correct layout in multi-concept customization.
We present a concept injection technique that employs shape-aware masks to specify the generation area for each concept.
Our method supports the combination of any number of concepts and maintains high fidelity even when dealing with visually similar concepts.
arXiv Detail & Related papers (2024-08-07T08:43:58Z)
- FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition [49.2208591663092]
FreeCustom is a tuning-free method to generate customized images of multi-concept composition based on reference concepts.
We introduce a new multi-reference self-attention (MRSA) mechanism and a weighted mask strategy.
Our method outperforms or performs on par with other training-based methods in terms of multi-concept composition and single-concept customization.
arXiv Detail & Related papers (2024-05-22T17:53:38Z)
- Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models [85.14042557052352]
We introduce Concept Weaver, a method for composing customized text-to-image diffusion models at inference time.
We show that Concept Weaver can generate multiple custom concepts with higher identity fidelity compared to alternative approaches.
arXiv Detail & Related papers (2024-04-05T06:41:27Z)
- Break-A-Scene: Extracting Multiple Concepts from a Single Image [80.47666266017207]
We introduce the task of textual scene decomposition.
We propose augmenting the input image with masks that indicate the presence of target concepts.
We then present a novel two-phase customization process.
arXiv Detail & Related papers (2023-05-25T17:59:04Z)