MagicID: Hybrid Preference Optimization for ID-Consistent and Dynamic-Preserved Video Customization
- URL: http://arxiv.org/abs/2503.12689v1
- Date: Sun, 16 Mar 2025 23:15:09 GMT
- Title: MagicID: Hybrid Preference Optimization for ID-Consistent and Dynamic-Preserved Video Customization
- Authors: Hengjia Li, Lifan Jiang, Xi Xiao, Tianyang Wang, Hongwei Yi, Boxi Wu, Deng Cai
- Abstract summary: Video identity customization seeks to produce high-fidelity videos that maintain consistent identity and exhibit significant dynamics based on users' reference images. We introduce MagicID, a novel framework designed to promote the generation of identity-consistent and dynamically rich videos tailored to user preferences. Experiments show that MagicID successfully achieves consistent identity and natural dynamics, surpassing existing methods across various metrics.
- Score: 24.398759596367103
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video identity customization seeks to produce high-fidelity videos that maintain consistent identity and exhibit significant dynamics based on users' reference images. However, existing approaches face two key challenges: identity degradation over extended video length and reduced dynamics during training, primarily due to their reliance on traditional self-reconstruction training with static images. To address these issues, we introduce $\textbf{MagicID}$, a novel framework designed to directly promote the generation of identity-consistent and dynamically rich videos tailored to user preferences. Specifically, we propose constructing pairwise preference video data with explicit identity and dynamic rewards for preference learning, instead of relying on traditional self-reconstruction. To address the constraints of customized preference data, we introduce a hybrid sampling strategy. This approach first prioritizes identity preservation by leveraging static videos derived from reference images, then enhances dynamic motion quality in the generated videos using a Frontier-based sampling method. By utilizing these hybrid preference pairs, we optimize the model to align with the reward differences between pairs of customized preferences. Extensive experiments show that MagicID successfully achieves consistent identity and natural dynamics, surpassing existing methods across various metrics.
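The paper's recipe, as summarized above, is to build (preferred, dispreferred) video pairs scored by explicit identity and dynamic rewards, then optimize the model against the reward difference between each pair. The abstract does not spell out the objective, so the sketch below assumes a Diffusion-DPO-style loss as one plausible instantiation; the `(B, C, T, H, W)` video layout, the `net(x_t, t)` denoiser signature, and helpers like `make_static_clip` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def make_static_clip(ref_image: torch.Tensor, num_frames: int) -> torch.Tensor:
    """Stage 1 of the hybrid strategy (as read from the abstract): a 'static
    video' that repeats the reference image, usable as the identity-preserving
    preferred sample. ref_image: (B, C, H, W) -> (B, C, T, H, W)."""
    return ref_image.unsqueeze(2).repeat(1, 1, num_frames, 1, 1)

def add_noise(x0, noise, t, alphas_cumprod):
    """DDPM forward process: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps."""
    a = alphas_cumprod[t].view(-1, 1, 1, 1, 1)  # broadcast over (B, C, T, H, W)
    return a.sqrt() * x0 + (1.0 - a).sqrt() * noise

def hybrid_dpo_loss(model, ref_model, x_win, x_lose, t, alphas_cumprod, beta=0.1):
    """Preference loss on one (preferred, dispreferred) video pair.

    Diffusion-DPO form: reward the fine-tuned model for denoising the
    preferred sample better than the dispreferred one, relative to a
    frozen reference model."""
    noise = torch.randn_like(x_win)

    def denoise_err(net, x0):
        x_t = add_noise(x0, noise, t, alphas_cumprod)
        # Per-sample MSE between predicted and true noise, shape (B,).
        return F.mse_loss(net(x_t, t), noise, reduction="none").mean(dim=(1, 2, 3, 4))

    model_margin = denoise_err(model, x_lose) - denoise_err(model, x_win)
    with torch.no_grad():
        ref_margin = denoise_err(ref_model, x_lose) - denoise_err(ref_model, x_win)
    return -F.logsigmoid(beta * (model_margin - ref_margin)).mean()
```

Under this reading, stage 1 pairs a static clip from the reference image (high identity reward) against a lower-identity generation, and stage 2 swaps in generated pairs chosen along the identity/dynamics reward frontier (the abstract's "Frontier-based sampling"), with `t` sampled uniformly per batch.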
Related papers
- Subject-driven Video Generation via Disentangled Identity and Motion [52.54835936914813]
We propose to train a subject-driven customized video generation model by decoupling subject-specific learning from temporal dynamics, enabling zero-shot generation without additional tuning.
Our method achieves strong subject consistency and scalability, outperforming existing video customization models in zero-shot settings.
arXiv Detail & Related papers (2025-04-23T06:48:31Z)
- Dynamic Concepts Personalization from Single Videos [92.62863918003575]
We introduce Set-and-Sequence, a novel framework for personalizing generative video models with dynamic concepts. Our approach imposes a spatio-temporal weight space within an architecture that does not explicitly separate spatial and temporal features. Our framework embeds dynamic concepts into the video model's output domain, enabling unprecedented editability and compositionality.
arXiv Detail & Related papers (2025-02-20T18:53:39Z)
- Multi-subject Open-set Personalization in Video Generation [110.02124633005516]
We present Video Alchemist, a video model with built-in multi-subject, open-set personalization capabilities. Our model is built on a new Diffusion Transformer module that fuses each conditional reference image and its corresponding subject-level text prompt. Our method significantly outperforms existing personalization methods in both quantitative and qualitative evaluations.
arXiv Detail & Related papers (2025-01-10T18:59:54Z)
- ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning [40.70596166863986]
Multi-Concept Video Customization (MCVC) remains a significant challenge. We introduce ConceptMaster, an innovative framework that effectively tackles the issues of identity decoupling while maintaining concept fidelity in customized videos. Specifically, we introduce a novel strategy of learning decoupled multi-concept embeddings that are injected into the diffusion models in a standalone manner.
arXiv Detail & Related papers (2025-01-08T18:59:01Z)
- VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping [43.30061680192465]
We present the first diffusion-based framework specifically designed for video face swapping. Our approach incorporates a specially designed diffusion model coupled with a VidFaceVAE. Our framework achieves superior performance in identity preservation, temporal consistency, and visual quality compared to existing methods.
arXiv Detail & Related papers (2024-12-15T18:58:32Z)
- PersonalVideo: High ID-Fidelity Video Customization without Dynamic and Semantic Degradation [36.21554597804604]
Identity-specific human video generation with customized ID images is still under-explored. The key challenge lies in consistently maintaining high ID fidelity while preserving the original motion dynamics and semantic following. We propose a novel framework, dubbed $\textbf{PersonalVideo}$, that applies a mixture of reward supervision on synthesized videos.
arXiv Detail & Related papers (2024-11-26T02:25:38Z)
- TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation [67.97044071594257]
TweedieMix is a novel method for composing customized diffusion models. Our framework can be effortlessly extended to image-to-video diffusion models.
arXiv Detail & Related papers (2024-10-08T01:06:01Z)
- ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning [57.91881829308395]
Identity-preserving text-to-image generation (ID-T2I) has received significant attention due to its wide range of application scenarios like AI portrait and advertising.
We present $\textbf{ID-Aligner}$, a general feedback learning framework to enhance ID-T2I performance.
arXiv Detail & Related papers (2024-04-23T18:41:56Z)
- Identity Encoder for Personalized Diffusion [57.1198884486401]
We propose an encoder-based approach for personalization.
We learn an identity encoder which can extract an identity representation from a set of reference images of a subject.
We show that our approach consistently outperforms existing fine-tuning-based approaches in both image generation and reconstruction (a minimal sketch of this encoder pattern follows below).
arXiv Detail & Related papers (2023-04-14T23:32:24Z)
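For concreteness, here is a minimal sketch of the encoder pattern the Identity Encoder entry above describes: encode each reference image of a subject and pool the per-image features into a single identity embedding that can condition a personalized diffusion model. The backbone, mean pooling, and dimensions below are illustrative assumptions; the paper's actual architecture is not given in this summary.

```python
import torch
import torch.nn as nn

class IdentityEncoder(nn.Module):
    """Maps a *set* of reference images of one subject to one identity code."""

    def __init__(self, backbone: nn.Module, feat_dim: int, id_dim: int = 768):
        super().__init__()
        self.backbone = backbone        # any per-image feature extractor
        self.proj = nn.Linear(feat_dim, id_dim)

    def forward(self, refs: torch.Tensor) -> torch.Tensor:
        # refs: (B, N, C, H, W) -- N reference images per subject.
        b, n = refs.shape[:2]
        feats = self.backbone(refs.flatten(0, 1))   # (B*N, feat_dim)
        feats = self.proj(feats).view(b, n, -1)
        # Mean-pool over the reference set: an order-invariant identity code.
        return feats.mean(dim=1)                    # (B, id_dim)

# Toy stand-in for a pretrained face/image encoder (feat_dim = 32).
backbone = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=4, stride=4), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
enc = IdentityEncoder(backbone, feat_dim=32)
z = enc(torch.randn(2, 4, 3, 224, 224))  # two subjects, four refs each -> (2, 768)
```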