Related papers: Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance


URL: http://arxiv.org/abs/2502.06145v1
Date: Mon, 10 Feb 2025 04:20:11 GMT
Title: Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance
Authors: Li Hu, Guangyuan Wang, Zhen Shen, Xin Gao, Dechao Meng, Lian Zhuo, Peng Zhang, Bang Zhang, Liefeng Bo
Abstract summary: We introduce Animate Anyone 2, aiming to animate characters with environment affordance. We propose a shape-agnostic mask strategy that more effectively characterizes the relationship between character and environment. We also introduce a pose modulation strategy that enables the model to handle more diverse motion patterns.
Score: 30.225654002561512
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent character image animation methods based on diffusion models, such as Animate Anyone, have made significant progress in generating consistent and generalizable character animations. However, these approaches fail to produce reasonable associations between characters and their environments. To address this limitation, we introduce Animate Anyone 2, aiming to animate characters with environment affordance. Beyond extracting motion signals from source video, we additionally capture environmental representations as conditional inputs. The environment is formulated as the region with the exclusion of characters and our model generates characters to populate these regions while maintaining coherence with the environmental context. We propose a shape-agnostic mask strategy that more effectively characterizes the relationship between character and environment. Furthermore, to enhance the fidelity of object interactions, we leverage an object guider to extract features of interacting objects and employ spatial blending for feature injection. We also introduce a pose modulation strategy that enables the model to handle more diverse motion patterns. Experimental results demonstrate the superior performance of the proposed method.
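The abstract defines the environment as the frame with the character excluded and says only that a shape-agnostic mask better characterizes the character/environment relationship. Below is a minimal sketch of one plausible construction, assuming the mask is coarsened by splitting the character's bounding box into a kh x kw grid and filling every cell the character touches, so the conditioning region no longer traces the exact silhouette. The function names are hypothetical, and frames are plain 2D lists rather than image tensors.

```python
import random
from typing import List, Tuple

Mask = List[List[int]]  # 1 = character pixel, 0 = environment pixel


def bounding_box(mask: Mask) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right) of the character region; bottom/right are exclusive.

    Assumes the mask contains at least one character pixel.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def shape_agnostic_mask(mask: Mask, kh: int, kw: int) -> Mask:
    """Hypothetical coarsening: split the bounding box into a kh x kw grid and
    fill every grid cell that overlaps the character, hiding its exact outline."""
    top, left, bottom, right = bounding_box(mask)
    h, w = bottom - top, right - left
    out = [[0] * len(mask[0]) for _ in mask]
    for bi in range(kh):
        r0, r1 = top + bi * h // kh, top + (bi + 1) * h // kh
        for bj in range(kw):
            c0, c1 = left + bj * w // kw, left + (bj + 1) * w // kw
            if any(mask[r][c] for r in range(r0, r1) for c in range(c0, c1)):
                for r in range(r0, r1):
                    for c in range(c0, c1):
                        out[r][c] = 1
    return out


def random_shape_agnostic_mask(mask: Mask, rng: random.Random, kmax: int = 4) -> Mask:
    """Hypothetical training-time variant: draw a fresh grid size for every sample."""
    return shape_agnostic_mask(mask, rng.randint(1, kmax), rng.randint(1, kmax))


def environment_condition(frame: List[List[int]], mask: Mask, fill: int = 0) -> List[List[int]]:
    """Environment = the frame with the (coarsened) character region blanked out."""
    return [[fill if m else p for p, m in zip(frow, mrow)] for frow, mrow in zip(frame, mask)]
```

Because a grid cell is filled whenever it touches the character, the coarse mask always covers the original one; a smaller grid hides more of the silhouette from the model.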

Related papers

IM-Animation: An Implicit Motion Representation for Identity-decoupled Character Animation [58.297199313494]
Implicit methods capture motion semantics directly from driving video, but suffer from identity leakage and entanglement between motion and appearance. We propose a novel implicit motion representation that compresses per-frame motion into compact 1D motion tokens. Our methodology employs a three-stage training strategy to enhance the training efficiency and ensure high fidelity.
arXiv (2026-02-07T11:17:20Z)
Animate Any Character in Any World [61.112404900403284]
We introduce AniX, leveraging the realism and structural grounding of static world generation. Users can provide a 3DGS scene and a character, then direct the character through natural language to perform diverse behaviors. AniX synthesizes temporally coherent video clips that preserve visual fidelity with the provided scene and character.
arXiv (2025-12-18T18:59:18Z)
Environment-aware Motion Matching [6.397763079214294]
Environment-aware Motion Matching is a novel real-time system for full-body character animation. Our method allows characters to naturally adjust their pose and trajectory to navigate crowded scenes.
arXiv (2025-10-26T11:28:50Z)
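Classic motion matching picks, every few frames, the database clip whose pose and future-trajectory features best match the query. The sketch below shows a brute-force version with one hypothetical environment term, a quadratic penalty on candidate trajectories that pass too close to obstacles; the summary does not say how the paper encodes the environment, so `obstacle_penalty`, `radius`, and `env_weight` are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # 2D position in the character's local frame


@dataclass
class Candidate:
    frame: int               # index of the frame in the animation database
    pose: List[float]        # pose features (e.g. joint positions, velocities)
    trajectory: List[Point]  # future root positions sampled ahead of this frame


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def obstacle_penalty(trajectory: Sequence[Point], obstacles: Sequence[Point], radius: float) -> float:
    """Hypothetical environment term: quadratic penalty for trajectory points closer than `radius` to an obstacle."""
    penalty = 0.0
    for px, py in trajectory:
        for ox, oy in obstacles:
            d = math.hypot(px - ox, py - oy)
            if d < radius:
                penalty += (radius - d) ** 2
    return penalty


def match(query_pose: Sequence[float], query_traj: Sequence[Point],
          database: Sequence[Candidate], obstacles: Sequence[Point],
          radius: float = 0.5, env_weight: float = 10.0) -> int:
    """Brute-force motion matching: return the frame whose features best fit the query."""
    best_frame, best_cost = -1, math.inf
    for cand in database:
        cost = squared_distance(query_pose, cand.pose)
        cost += sum(squared_distance(p, q) for p, q in zip(query_traj, cand.trajectory))
        cost += env_weight * obstacle_penalty(cand.trajectory, obstacles, radius)
        if cost < best_cost:
            best_frame, best_cost = cand.frame, cost
    return best_frame
```

With no obstacles a straight-ahead clip wins for a straight-ahead query; placing an obstacle on that path can make a veering clip cheaper overall. Real-time systems typically replace the linear scan with a spatial index such as a k-d tree.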
Wan-Animate: Unified Character Animation and Replacement with Holistic Replication [53.619006977292635]
We introduce Wan-Animate, a unified framework for character animation and replacement. It can animate the character by precisely replicating the expressions and movements of the character in the video to generate high-fidelity character videos. It can integrate the animated character into the reference video to replace the original character, replicating the scene's lighting and color tone.
arXiv (2025-09-17T15:00:57Z)
EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation [58.41979933166173]
EvAnimate is a framework that leverages event streams as motion cues to animate static human images. We show that EvAnimate achieves high temporal fidelity and robust performance in scenarios where traditional video-derived cues fall short.
arXiv (2025-03-24T11:05:41Z)
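The EvAnimate summary does not say how event streams are encoded. A common choice in event-based vision, assumed here, is a temporal voxel grid: each event's polarity is split between the two nearest time bins with linear weights, yielding a (bins, height, width) tensor that a video model can consume alongside frames. The function name and the sorted-input requirement are assumptions of this sketch.

```python
from typing import List, Sequence, Tuple

Event = Tuple[float, int, int, int]  # (timestamp, x, y, polarity in {-1, +1})
Grid = List[List[List[float]]]       # indexed as grid[bin][y][x]


def events_to_voxel_grid(events: Sequence[Event], num_bins: int, height: int, width: int) -> Grid:
    """Accumulate polarities into `num_bins` temporal slices with linear (tent) weights.

    Assumes `events` is sorted by timestamp.
    """
    grid = [[[0.0] * width for _ in range(height)] for _ in range(num_bins)]
    if not events:
        return grid
    t_start, t_end = events[0][0], events[-1][0]
    span = (t_end - t_start) or 1.0  # avoid division by zero when all events share a timestamp
    for t, x, y, p in events:
        t_norm = (num_bins - 1) * (t - t_start) / span  # position on the bin axis, in [0, num_bins - 1]
        for b in range(num_bins):
            weight = max(0.0, 1.0 - abs(b - t_norm))
            if weight > 0.0:
                grid[b][y][x] += p * weight
    return grid
```

Because the two weights for each event sum to one, the grid's total equals the sum of event polarities, so no motion signal is lost when binning.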
X-Dyna: Expressive Dynamic Human Image Animation [49.896933584815926]
X-Dyna is a zero-shot, diffusion-based pipeline for animating a single human image. It generates realistic, context-aware dynamics for both the subject and the surrounding environment.
arXiv (2025-01-17T08:10:53Z)
AniFaceDiff: Animating Stylized Avatars via Parametric Conditioned Diffusion Models [33.39336530229545]
This paper proposes a new method based on Stable Diffusion, called AniFaceDiff, incorporating a new conditioning module for animating stylized avatars. Our approach effectively preserves pose and expression from the target video while maintaining input image consistency. This work aims to enhance the quality of virtual stylized animation for positive applications.
arXiv (2024-06-19T07:08:48Z)
Zero-shot High-fidelity and Pose-controllable Character Animation [89.74818983864832]
Image-to-video (I2V) generation aims to create a video sequence from a single image. Existing approaches suffer from inconsistency of character appearances and poor preservation of fine details. We propose PoseAnimate, a novel zero-shot I2V framework for character animation.
arXiv (2024-04-21T14:43:31Z)
Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs. SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions. Our design significantly enhances video quality, motion precision, and semantic coherence.
arXiv (2024-03-15T10:36:24Z)
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation [27.700371215886683]
Diffusion models have become mainstream in visual generation research, owing to their robust generative capabilities. In this paper, we propose a novel framework tailored for character animation. By expanding the training data, our approach can animate arbitrary characters, yielding superior results in character animation compared to other image-to-video methods.
arXiv (2023-11-28T12:27:15Z)
Synthesizing Physical Character-Scene Interactions [64.26035523518846]
It is necessary to synthesize interactions between virtual characters and their surroundings. We present a system that uses adversarial imitation learning and reinforcement learning to train physically-simulated characters. Our approach takes physics-based character motion generation a step closer to broad applicability.
arXiv (2023-02-02T05:21:32Z)
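Adversarial imitation learning trains a discriminator to tell reference-motion transitions from the policy's transitions and rewards the policy for fooling it. The sketch below uses a logistic discriminator and a GAIL-style reward, -log(1 - D); the paper's exact discriminator, features, and reward form are not given in this summary, so the feature vectors and the `combined_reward` weighting with a task reward are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]  # hypothetical feature vector of one (state, next_state) transition


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def discriminator(w: Vector, b: float, x: Vector) -> float:
    """Logistic discriminator: probability that transition x comes from reference motion data."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def discriminator_step(w: Vector, b: float, reference: Sequence[Vector],
                       policy: Sequence[Vector], lr: float = 0.1) -> Tuple[List[float], float]:
    """One gradient-ascent step on the log-likelihood (label 1 = reference, 0 = policy)."""
    grad_w, grad_b = [0.0] * len(w), 0.0
    labelled = [(x, 1.0) for x in reference] + [(x, 0.0) for x in policy]
    for x, y in labelled:
        err = y - discriminator(w, b, x)  # derivative of the log-likelihood w.r.t. the logit
        for i, xi in enumerate(x):
            grad_w[i] += err * xi
        grad_b += err
    n = len(labelled)
    return [wi + lr * g / n for wi, g in zip(w, grad_w)], b + lr * grad_b / n


def style_reward(w: Vector, b: float, x: Vector, eps: float = 1e-6) -> float:
    """GAIL-style reward: large when the discriminator believes x looks like reference motion."""
    return -math.log(max(1.0 - discriminator(w, b, x), eps))


def combined_reward(task_r: float, style_r: float, w_task: float = 0.5, w_style: float = 0.5) -> float:
    """Hypothetical weighting of a task reward (e.g. reaching a chair) and the imitation reward."""
    return w_task * task_r + w_style * style_r
```

In a full RL loop the discriminator and policy are updated in alternation; as the discriminator sharpens, transitions that resemble the reference motion earn higher style reward.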
Triangular Character Animation Sampling with Motion, Emotion, and Relation [78.80083186208712]
We present a novel framework to sample and synthesize animations by associating the characters' body motions, facial expressions, and social relations. Our method can provide animators with an automatic way to generate 3D character animations, help synthesize interactions between Non-Player Characters (NPCs), and enhance machine emotion intelligence in virtual reality (VR).
arXiv (2022-03-09T18:19:03Z)
Motion Representations for Articulated Animation [34.54825980226596]
We propose novel motion representations for animating articulated objects consisting of distinct parts. In a completely unsupervised manner, our method identifies object parts, tracks them in a driving video, and infers their motions by considering their principal axes. Our model can animate a variety of objects, surpassing previous methods by a large margin on existing benchmarks.
arXiv (2021-04-22T18:53:56Z)
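The summary says parts are tracked by considering their principal axes. A minimal sketch, assuming each part is represented by a non-negative heatmap: the heatmap's weighted mean gives the part's position, and the eigen-decomposition of its weighted covariance gives the spread along the major and minor axes plus the major axis orientation. How the paper converts these statistics into motion between frames is not covered here, and the function name is hypothetical.

```python
import math
from typing import List, Tuple


def region_principal_axes(heatmap: List[List[float]]) -> Tuple[Tuple[float, float], Tuple[float, float], float]:
    """Return (mean, (major_var, minor_var), angle) of a non-negative part heatmap.

    Coordinates are (x, y) = (column, row); the angle of the major axis is in radians from the +x axis.
    """
    total = sum(sum(row) for row in heatmap)
    cells = [(x, y, v) for y, row in enumerate(heatmap) for x, v in enumerate(row)]
    mx = sum(v * x for x, _, v in cells) / total
    my = sum(v * y for _, y, v in cells) / total
    sxx = sum(v * (x - mx) ** 2 for x, _, v in cells) / total
    syy = sum(v * (y - my) ** 2 for _, y, v in cells) / total
    sxy = sum(v * (x - mx) * (y - my) for x, y, v in cells) / total
    # Closed-form eigenvalues of the symmetric 2x2 covariance [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    major, minor = half_trace + radius, half_trace - radius
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)  # orientation of the major eigenvector
    return (mx, my), (major, minor), angle
```

Comparing the mean and angle of the same part across two frames gives a translation and rotation for that part, which is the kind of per-part motion signal the summary describes.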
Going beyond Free Viewpoint: Creating Animatable Volumetric Video of Human Performances [7.7824496657259665]
We present an end-to-end pipeline for the creation of high-quality animatable volumetric video content of human performances. Semantic enrichment and geometric animation ability are achieved by establishing temporal consistency in the 3D data. For pose editing, we exploit the captured data as much as possible and kinematically deform the captured frames to fit a desired pose.
arXiv (2020-09-02T09:46:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.