Character Mixing for Video Generation
- URL: http://arxiv.org/abs/2510.05093v1
- Date: Mon, 06 Oct 2025 17:57:39 GMT
- Title: Character Mixing for Video Generation
- Authors: Tingting Liao, Chongjian Ge, Guangyi Liu, Hao Li, Yi Zhou
- Abstract summary: We study inter-character interaction in text-to-video generation. The key challenge is to preserve each character's identity and behaviors while enabling coherent cross-context interaction. We introduce a framework that tackles these issues with Cross-Character Embedding and Cross-Character Augmentation.
- Score: 15.285132540147304
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Imagine Mr. Bean stepping into Tom and Jerry--can we generate videos where characters interact naturally across different worlds? We study inter-character interaction in text-to-video generation, where the key challenge is to preserve each character's identity and behaviors while enabling coherent cross-context interaction. This is difficult because characters may never have coexisted and because mixing styles often causes style delusion, where realistic characters appear cartoonish or vice versa. We introduce a framework that tackles these issues with Cross-Character Embedding (CCE), which learns identity and behavioral logic across multimodal sources, and Cross-Character Augmentation (CCA), which enriches training with synthetic co-existence and mixed-style data. Together, these techniques allow natural interactions between previously uncoexistent characters without losing stylistic fidelity. Experiments on a curated benchmark of cartoons and live-action series with 10 characters show clear improvements in identity preservation, interaction quality, and robustness to style delusion, enabling new forms of generative storytelling. Additional results and videos are available on our project page: https://tingtingliao.github.io/mimix/.
Related papers
- IdentityStory: Taming Your Identity-Preserving Generator for Human-Centric Story Generation [75.09818147405898]
IdentityStory is a framework for human-centric story generation that ensures consistent character identity across sequential images. By taming identity-preserving generators, the framework features two key components: Iterative Identity Discovery and Re-denoising Identity Injection.
arXiv Detail & Related papers (2025-12-29T14:54:44Z)
- Character-Centric Understanding of Animated Movies [88.83104906869106]
We propose an audio-visual pipeline to enable automatic and robust animated character recognition. Characters in animated movies exhibit extreme diversity in their appearance, motion, and deformation. This pipeline enhances character-centric understanding of animated movies.
arXiv Detail & Related papers (2025-09-15T17:59:51Z)
- Constella: Supporting Storywriters' Interconnected Character Creation through LLM-based Multi-Agents [7.537475180985097]
Constella is a multi-agent tool that supports storywriters' interconnected character creation process. Our 7-8 day deployment study with storywriters shows that Constella enabled the creation of expansive communities composed of related characters. We conclude by discussing how multi-agent interactions can help distribute writers' attention and effort across the character cast.
arXiv Detail & Related papers (2025-07-08T09:39:02Z)
- MoCha: Towards Movie-Grade Talking Character Synthesis [62.007000023747445]
We introduce Talking Characters, a more realistic task to generate talking character animations directly from speech and text. Unlike talking head, Talking Characters aims at generating the full portrait of one or more characters beyond the facial region. We propose MoCha, the first of its kind to generate talking characters.
arXiv Detail & Related papers (2025-03-30T04:22:09Z)
- Motion by Queries: Identity-Motion Trade-offs in Text-to-Video Generation [47.61288672890036]
We investigate how self-attention query features govern motion, structure, and identity in text-to-video models. We demonstrate two applications: a zero-shot motion transfer method and a training-free technique for consistent multi-shot video generation.
arXiv Detail & Related papers (2024-12-10T18:49:39Z)
- StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation [10.652011707000202]
We introduce StoryMaker, a personalization solution that preserves not only facial consistency but also clothing, hairstyles, and body consistency.
StoryMaker supports numerous applications and is compatible with other societal plug-ins.
arXiv Detail & Related papers (2024-09-19T08:53:06Z)
- MAAIP: Multi-Agent Adversarial Interaction Priors for imitation from fighting demonstrations for physics-based characters [5.303375034962503]
We propose a novel Multi-Agent Generative Adversarial Imitation Learning based approach.
Our system trains control policies allowing each character to imitate the interactive skills associated with each actor.
This approach has been tested on two different fighting styles, boxing and full-body martial art, to demonstrate the ability of the method to imitate different styles.
arXiv Detail & Related papers (2023-11-04T20:40:39Z)
- Synthesizing Physical Character-Scene Interactions [64.26035523518846]
It is necessary to synthesize such interactions between virtual characters and their surroundings.
We present a system that uses adversarial imitation learning and reinforcement learning to train physically-simulated characters.
Our approach takes physics-based character motion generation a step closer to broad applicability.
arXiv Detail & Related papers (2023-02-02T05:21:32Z)
- Make-A-Story: Visual Memory Conditioned Consistent Story Generation [57.691064030235985]
We propose a novel autoregressive diffusion-based framework with a visual memory module that implicitly captures the actor and background context.
Our experiments for story generation on the MUGEN, the PororoSV and the FlintstonesSV dataset show that our method not only outperforms prior state-of-the-art in generating frames with high visual quality, but also models appropriate correspondences between the characters and the background.
arXiv Detail & Related papers (2022-11-23T21:38:51Z)
- Triangular Character Animation Sampling with Motion, Emotion, and Relation [78.80083186208712]
We present a novel framework to sample and synthesize animations by associating the characters' body motions, facial expressions, and social relations.
Our method can provide animators with an automatic way to generate 3D character animations, help synthesize interactions between Non-Player Characters (NPCs) and enhance machine emotion intelligence in virtual reality (VR).
arXiv Detail & Related papers (2022-03-09T18:19:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.