MultiAnimate: Pose-Guided Image Animation Made Extensible
- URL: http://arxiv.org/abs/2602.21581v1
- Date: Wed, 25 Feb 2026 05:06:58 GMT
- Title: MultiAnimate: Pose-Guided Image Animation Made Extensible
- Authors: Yingcheng Hu, Haowen Gong, Chuanguang Yang, Zhulin An, Yongjun Xu, Songhua Liu
- Abstract summary: Pose-guided human image animation aims to synthesize realistic videos of a reference character driven by a sequence of poses. We propose an extensible multi-character image animation framework built upon modern Diffusion Transformers for video generation. We show that our framework achieves state-of-the-art performance in multi-character image animation, surpassing existing diffusion-based baselines.
- Score: 44.163219649465866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pose-guided human image animation aims to synthesize realistic videos of a reference character driven by a sequence of poses. While diffusion-based methods have achieved remarkable success, most existing approaches are limited to single-character animation. We observe that naively extending these methods to multi-character scenarios often leads to identity confusion and implausible occlusions between characters. To address these challenges, we propose an extensible multi-character image animation framework built upon modern Diffusion Transformers (DiTs) for video generation. At its core, our framework introduces two novel components, the Identifier Assigner and the Identifier Adapter, which collaboratively capture per-person positional cues and inter-person spatial relationships. This mask-driven scheme, along with a scalable training strategy, not only enhances flexibility but also enables generalization to scenarios with more characters than those seen during training. Remarkably, trained on only a two-character dataset, our model generalizes to multi-character animation while maintaining compatibility with single-character cases. Extensive experiments demonstrate that our approach achieves state-of-the-art performance in multi-character image animation, surpassing existing diffusion-based baselines.
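The abstract names the Identifier Assigner and Identifier Adapter but does not disclose their internals. As a reading aid only, here is a minimal PyTorch sketch of one plausible mask-driven scheme: each character's binary mask selects a learned identity embedding, and the resulting identifier map is projected and added to the DiT's spatial tokens. All class names, shapes, and the additive fusion are assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class IdentifierAssigner(nn.Module):
    """Toy stand-in: maps per-person binary masks to a spatial map of
    learned identity embeddings, one embedding per character slot."""
    def __init__(self, max_people: int, dim: int):
        super().__init__()
        # One learned embedding per character slot; slot order defines identity.
        self.id_embed = nn.Embedding(max_people, dim)

    def forward(self, masks: torch.Tensor) -> torch.Tensor:
        # masks: (B, P, H, W) binary, one channel per character
        b, p, h, w = masks.shape
        ids = self.id_embed.weight[:p]                     # (P, D)
        # Broadcast each character's embedding over its mask region.
        return torch.einsum("bphw,pd->bdhw", masks.float(), ids)  # (B, D, H, W)

class IdentifierAdapter(nn.Module):
    """Toy stand-in: projects the identifier map and adds it to DiT tokens."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, tokens: torch.Tensor, id_map: torch.Tensor) -> torch.Tensor:
        # tokens: (B, D, H, W) spatial token grid for one latent frame
        return tokens + self.proj(id_map)

# Smoke test with hypothetical shapes: 3 characters, 64-dim tokens.
masks = torch.rand(2, 3, 16, 16) > 0.5
assigner, adapter = IdentifierAssigner(8, 64), IdentifierAdapter(64)
out = adapter(torch.randn(2, 64, 16, 16), assigner(masks))
print(out.shape)  # torch.Size([2, 64, 16, 16])
```

In this toy version the slot order of the masks defines identity; a real system would also need to keep that assignment consistent across frames.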
Related papers
- One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer [36.26551019954542]
We present One-to-All Animation, a framework for high-fidelity character animation and image pose transfer. To handle spatially misaligned references, we reformulate training as a self-supervised outpainting task. We also design a reference extractor for comprehensive identity feature extraction.
arXiv Detail & Related papers (2025-11-28T07:30:10Z)
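The summary only states that training is reformulated as self-supervised outpainting. Below is a hedged sketch of how such a training pair could be built: keep a crop assumed to contain the reference character visible, mask out the rest, and use the full frame as the reconstruction target. The function name, box convention, and mask-channel input are all hypothetical.

```python
import torch

def make_outpainting_pair(frame: torch.Tensor, box: tuple[int, int, int, int]):
    """Hypothetical data-prep step: keep only the reference crop visible
    and ask the model to outpaint the rest of the frame.

    frame: (C, H, W) target image
    box:   (top, left, height, width) of the assumed reference region
    """
    t, l, h, w = box
    visible = torch.zeros_like(frame)
    visible[:, t:t + h, l:l + w] = frame[:, t:t + h, l:l + w]
    mask = torch.zeros(1, *frame.shape[1:])
    mask[:, t:t + h, l:l + w] = 1.0
    # Model input: masked frame plus mask channel; training target: full frame.
    return torch.cat([visible, mask], dim=0), frame

inp, target = make_outpainting_pair(torch.rand(3, 256, 256), (64, 64, 128, 96))
print(inp.shape, target.shape)  # torch.Size([4, 256, 256]) torch.Size([3, 256, 256])
```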
- Wan-Animate: Unified Character Animation and Replacement with Holistic Replication [53.619006977292635]
We introduce Wan-Animate, a unified framework for character animation and replacement. It animates a reference character by precisely replicating the expressions and movements of the character in a driving video, generating high-fidelity character videos. It can also integrate the animated character into a reference video to replace the original character, replicating the scene's lighting and color tone.
arXiv Detail & Related papers (2025-09-17T15:00:57Z)
- Animate-X++: Universal Character Image Animation with Dynamic Backgrounds [32.04255747303296]
Animate-X++ is a universal animation framework based on DiT for various character types, including anthropomorphic characters. To enhance motion representation, we introduce the Pose Indicator, which captures comprehensive motion patterns from the driving video in both implicit and explicit manners. We also introduce a multi-task training strategy that jointly trains the animation and TI2V tasks.
arXiv Detail & Related papers (2025-08-13T03:11:28Z)
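The Pose Indicator is described only as capturing motion both implicitly and explicitly. The toy module below illustrates one generic way to fuse a global implicit motion feature (e.g., a pooled video-encoder embedding) with explicit per-joint keypoint heatmaps; the projection layers and broadcast-add fusion are assumptions, not Animate-X++'s actual design.

```python
import torch
import torch.nn as nn

class PoseIndicatorSketch(nn.Module):
    """Toy fusion of an implicit motion feature with explicit joint
    heatmaps; a sketch, not the paper's module."""
    def __init__(self, feat_dim: int, out_dim: int, num_joints: int):
        super().__init__()
        self.implicit_proj = nn.Linear(feat_dim, out_dim)
        self.explicit_proj = nn.Conv2d(num_joints, out_dim, kernel_size=3, padding=1)

    def forward(self, implicit_feat, heatmaps):
        # implicit_feat: (B, feat_dim) global motion summary per frame
        # heatmaps:      (B, J, H, W) per-joint keypoint heatmaps
        g = self.implicit_proj(implicit_feat)[:, :, None, None]  # (B, out, 1, 1)
        e = self.explicit_proj(heatmaps)                         # (B, out, H, W)
        return e + g  # broadcast-add the global cue onto the spatial map

m = PoseIndicatorSketch(feat_dim=512, out_dim=64, num_joints=17)
out = m(torch.randn(2, 512), torch.rand(2, 17, 32, 32))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```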
- FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers [10.4806619052953]
We propose FantasyPortrait, a framework based on diffusion transformers that is capable of generating high-fidelity and emotion-rich animations. Our method introduces an expression-augmented learning strategy that utilizes implicit representations to capture identity-agnostic facial dynamics. For multi-character control, we design a masked cross-attention mechanism that ensures independent yet coordinated expression generation.
arXiv Detail & Related papers (2025-07-17T09:50:43Z)
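Masked cross-attention for per-character control can be illustrated generically: restrict each character's query tokens to attend only to that character's driving tokens via an additive attention bias. The sketch below is such a generic construction, not FantasyPortrait's implementation; the token-to-character id tensors are assumptions.

```python
import torch

def masked_cross_attention(q, k, v, ids_q, ids_kv):
    """Toy masked cross-attention: query tokens of character i may only
    attend to key/value tokens of character i.

    q: (B, Nq, D); k, v: (B, Nk, D)
    ids_q:  (B, Nq) integer character id per query token
    ids_kv: (B, Nk) integer character id per key/value token
    """
    # allowed[b, i, j] is True when query i and key j share a character id.
    allowed = ids_q[:, :, None] == ids_kv[:, None, :]
    bias = torch.zeros(allowed.shape, dtype=q.dtype).masked_fill(
        ~allowed, float("-inf"))
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5 + bias
    return torch.softmax(scores, dim=-1) @ v

q = torch.randn(1, 6, 32)
k = v = torch.randn(1, 4, 32)
ids_q = torch.tensor([[0, 0, 0, 1, 1, 1]])
ids_kv = torch.tensor([[0, 0, 1, 1]])
print(masked_cross_attention(q, k, v, ids_q, ids_kv).shape)  # torch.Size([1, 6, 32])
```

The additive -inf bias zeroes out cross-character attention weights after the softmax, which is what keeps each character's expression generation independent while all tokens are still processed jointly.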
- Towards Multiple Character Image Animation Through Enhancing Implicit Decoupling [77.08568533331206]
We propose a novel multi-condition guided framework for character image animation. We employ several well-designed input modules to enhance the implicit decoupling capability of the model. Our method excels in generating high-quality character animations, especially in scenarios with complex backgrounds and multiple characters.
arXiv Detail & Related papers (2024-06-05T08:03:18Z)
- VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation [79.99551055245071]
We propose VividPose, an end-to-end pipeline that ensures superior temporal stability.
An identity-aware appearance controller integrates additional facial information without compromising other appearance details.
A geometry-aware pose controller utilizes both dense rendering maps from SMPL-X and sparse skeleton maps.
VividPose exhibits superior generalization capabilities on our proposed in-the-wild dataset.
arXiv Detail & Related papers (2024-05-28T13:18:32Z)
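VividPose's geometry-aware pose controller consumes both dense SMPL-X rendering maps and sparse skeleton maps. Here is a minimal sketch of that dual-input idea, assuming simple channel-wise concatenation into a small convolutional encoder; the actual controller architecture is not specified in this summary.

```python
import torch
import torch.nn as nn

class GeometryAwarePoseEncoderSketch(nn.Module):
    """Toy encoder over a dense SMPL-X render plus a sparse skeleton map;
    only illustrates the dual input, not VividPose's real controller."""
    def __init__(self, out_channels: int = 64):
        super().__init__()
        # 3 channels for the dense render + 3 for the drawn skeleton map.
        self.encoder = nn.Sequential(
            nn.Conv2d(6, out_channels, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
        )

    def forward(self, smplx_render, skeleton_map):
        # smplx_render, skeleton_map: (B, 3, H, W)
        return self.encoder(torch.cat([smplx_render, skeleton_map], dim=1))

enc = GeometryAwarePoseEncoderSketch()
cond = enc(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
print(cond.shape)  # torch.Size([1, 64, 64, 64])
```

The intuition behind pairing the two inputs: the dense render carries body-shape and surface geometry, while the sparse skeleton pins down joint locations, so the encoder sees both coarse structure and precise articulation.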
- Zero-shot High-fidelity and Pose-controllable Character Animation [89.74818983864832]
Image-to-video (I2V) generation aims to create a video sequence from a single image.
Existing approaches suffer from inconsistent character appearance and poor preservation of fine details.
We propose PoseAnimate, a novel zero-shot I2V framework for character animation.
arXiv Detail & Related papers (2024-04-21T14:43:31Z)
- AnimateZoo: Zero-shot Video Generation of Cross-Species Animation via Subject Alignment [64.02822911038848]
We present AnimateZoo, a zero-shot diffusion-based video generator to produce animal animations.
The key technique in AnimateZoo is subject alignment, which consists of two steps.
Our model is capable of generating videos characterized by accurate movements, consistent appearance, and high-fidelity frames.
arXiv Detail & Related papers (2024-04-07T12:57:41Z)
- Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation [27.700371215886683]
Diffusion models have become the mainstream in visual generation research, owing to their robust generative capabilities.
In this paper, we propose a novel framework tailored for character animation.
By expanding the training data, our approach can animate arbitrary characters, yielding superior results in character animation compared to other image-to-video methods.
arXiv Detail & Related papers (2023-11-28T12:27:15Z)