Animate Any Character in Any World
- URL: http://arxiv.org/abs/2512.17796v1
- Date: Thu, 18 Dec 2025 18:59:18 GMT
- Title: Animate Any Character in Any World
- Authors: Yitong Wang, Fangyun Wei, Hongyang Zhang, Bo Dai, Yan Lu
- Abstract summary: We introduce AniX, leveraging the realism and structural grounding of static world generation. Users can provide a 3DGS scene and a character, then direct the character through natural language to perform diverse behaviors. AniX synthesizes temporally coherent video clips that preserve visual fidelity with the provided scene and character.
- Score: 61.112404900403284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in world models have greatly enhanced interactive environment simulation. Existing methods mainly fall into two categories: (1) static world generation models, which construct 3D environments without active agents, and (2) controllable-entity models, which allow a single entity to perform limited actions in an otherwise uncontrollable environment. In this work, we introduce AniX, which leverages the realism and structural grounding of static world generation while extending controllable-entity models to support user-specified characters capable of performing open-ended actions. Users can provide a 3DGS scene and a character, then direct the character through natural language to perform diverse behaviors, from basic locomotion to object-centric interactions, while freely exploring the environment. AniX synthesizes temporally coherent video clips that preserve visual fidelity with the provided scene and character, formulated as a conditional autoregressive video generation problem. Built upon a pre-trained video generator, our training strategy significantly enhances motion dynamics while maintaining generalization across actions and characters. Our evaluation covers a broad range of aspects, including visual quality, character consistency, action controllability, and long-horizon coherence.
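The abstract frames AniX as conditional autoregressive video generation: each clip is conditioned on the 3DGS scene, the character reference, the language instruction, and the tail frames of the previous clip. No interface is published in this listing, so the Python sketch below is purely illustrative; the `generator` object and every other name are assumptions, not AniX's actual API.

```python
# Illustrative sketch only: AniX's real API is not public. The generator and
# all names below are hypothetical stand-ins for the abstract's
# "conditional autoregressive video generation" formulation.

def animate_character(generator, scene_3dgs, character_ref, instructions,
                      overlap: int = 8):
    """Roll out one clip per natural-language instruction, feeding the last
    `overlap` frames of each clip back in as conditioning so that successive
    clips stay temporally coherent over a long horizon."""
    context_frames = []  # empty at the start; filled autoregressively
    clips = []
    for prompt in instructions:
        frames = generator.generate(
            scene=scene_3dgs,          # static 3DGS environment (structural grounding)
            character=character_ref,   # user-specified character
            prompt=prompt,             # e.g. "walk to the sofa and sit down"
            context_frames=context_frames,
        )
        clips.append(frames)
        context_frames = frames[-overlap:]  # carry state into the next clip
    return clips
```

The conditioning signal, not the loop, is the substance of the abstract's claim: reusing tail frames is what lets independently generated clips compose into a long-horizon, temporally coherent video.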
Related papers
- Walk through Paintings: Egocentric World Models from Internet Priors [65.30611174953958]
We present the Egocentric World Model (EgoWM), a simple, architecture-agnostic method that transforms any pretrained video diffusion model into an action-conditioned world model. Rather than training from scratch, we repurpose the rich world priors of Internet-scale video models and inject motor commands through lightweight conditioning layers (see the sketch after this list). Our approach scales naturally across embodiments and action spaces, ranging from 3-DoF mobile robots to 25-DoF humanoids.
arXiv Detail & Related papers (2026-01-21T18:59:32Z) - Hunyuan-GameCraft-2: Instruction-following Interactive Game World Model [19.937724706042804]
Hunyuan-GameCraft-2 is a new paradigm of instruction-driven interaction for generative game world modeling.<n>Our model allows users to control game video contents through natural language prompts, keyboard, or mouse signals.<n>Our model generates temporally coherent and causally grounded interactive game videos.
arXiv Detail & Related papers (2025-11-28T18:26:39Z) - MoSA: Motion-Coherent Human Video Generation via Structure-Appearance Decoupling [107.8379802891245]
We propose MoSA, which decouples the process of human video generation into two components, i.e. structure generation and appearance generation.<n>MoSA substantially outperforms existing approaches across the majority of evaluation metrics.<n>This paper also contributes a large-scale human video dataset, which features more complex and diverse motions than existing human video datasets.
arXiv Detail & Related papers (2025-08-24T15:20:24Z) - PlayerOne: Egocentric World Simulator [73.88786358213694]
PlayerOne is the first egocentric realistic world simulator.<n>It generates egocentric videos that are strictly aligned with the real scene human motion of the user captured by an exocentric camera.
arXiv Detail & Related papers (2025-06-11T17:59:53Z) - Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance [30.225654002561512]
We introduce Animate Anyone 2, aiming to animate characters with environment affordance.<n>We propose a shape-agnostic mask strategy that more effectively characterizes the relationship between character and environment.<n>We also introduce a pose modulation strategy that enables the model to handle more diverse motion patterns.
arXiv Detail & Related papers (2025-02-10T04:20:11Z) - X-Dyna: Expressive Dynamic Human Image Animation [49.896933584815926]
X-Dyna is a zero-shot, diffusion-based pipeline for animating a single human image.<n>It generates realistic, context-aware dynamics for both the subject and the surrounding environment.
arXiv Detail & Related papers (2025-01-17T08:10:53Z) - Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance [48.986552871497]
We introduce a novel two-stage framework that employs scene affordance as an intermediate representation.
By leveraging scene affordance maps, our method overcomes the difficulty in generating human motion under multimodal condition signals.
Our approach consistently outperforms all baselines on established benchmarks, including HumanML3D and HUMANISE.
arXiv Detail & Related papers (2024-03-26T18:41:07Z)