Related papers: AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models

AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models

URL: http://arxiv.org/abs/2506.19851v1
Date: Tue, 24 Jun 2025 17:59:58 GMT
Title: AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models
Authors: Zehuan Huang, Haoran Feng, Yangtian Sun, Yuanchen Guo, Yanpei Cao, Lu Sheng,
Abstract summary: AnimaX is a feed-forward 3D animation framework that bridges the motion priors of video diffusion models with the controllable structure of skeleton-based animation.<n>Our method represents 3D motion as multi-view, multi-frame 2D pose maps, and enables joint video-pose diffusion conditioned on template renderings.
Score: 24.410731608387238
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present AnimaX, a feed-forward 3D animation framework that bridges the motion priors of video diffusion models with the controllable structure of skeleton-based animation. Traditional motion synthesis methods are either restricted to fixed skeletal topologies or require costly optimization in high-dimensional deformation spaces. In contrast, AnimaX effectively transfers video-based motion knowledge to the 3D domain, supporting diverse articulated meshes with arbitrary skeletons. Our method represents 3D motion as multi-view, multi-frame 2D pose maps, and enables joint video-pose diffusion conditioned on template renderings and a textual motion prompt. We introduce shared positional encodings and modality-aware embeddings to ensure spatial-temporal alignment between video and pose sequences, effectively transferring video priors to motion generation task. The resulting multi-view pose sequences are triangulated into 3D joint positions and converted into mesh animation via inverse kinematics. Trained on a newly curated dataset of 160,000 rigged sequences, AnimaX achieves state-of-the-art results on VBench in generalization, motion fidelity, and efficiency, offering a scalable solution for category-agnostic 3D animation. Project page: \href{https://anima-x.github.io/}{https://anima-x.github.io/}.

Related papers

Sketch2Anim: Towards Transferring Sketch Storyboards into 3D Animation [22.325990468075368]
Animators use the 2D sketches in storyboards as references to craft the desired 3D animations through a trial-and-error process.<n>There is a high demand for automated methods that can directly translate 2D storyboard sketches into 3D animations.<n>We present Sketch2Anim, composed of two key modules for sketch constraint understanding and motion generation.
arXiv Detail & Related papers (2025-04-27T10:38:17Z)
In-2-4D: Inbetweening from Two Single-View Images to 4D Generation [54.62824686338408]
We propose a new problem, In-between2-4D, for generative 4D (i.e., 3D + motion) in Splating from a minimalistic input setting.<n>Given two images representing the start and end states of an object in motion, our goal is to generate and reconstruct the motion in 4D.
arXiv Detail & Related papers (2025-04-11T09:01:09Z)
Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models [71.78723353724493]
Animation of humanoid characters is essential in various graphics applications.<n>We propose an approach to synthesize 4D animated sequences of input static 3D humanoid meshes.
arXiv Detail & Related papers (2025-03-20T10:00:22Z)
Gaussians-to-Life: Text-Driven Animation of 3D Gaussian Splatting Scenes [49.26872036160368]
We propose a method for animating parts of high-quality 3D scenes in a Gaussian Splatting representation.<n>We find that, in contrast to prior work, this enables realistic animations of complex, pre-existing 3D scenes.
arXiv Detail & Related papers (2024-11-28T16:01:58Z)
MotionDreamer: Exploring Semantic Video Diffusion features for Zero-Shot 3D Mesh Animation [10.263762787854862]
We propose a technique for automatic re-animation of various 3D shapes based on a motion prior extracted from a video diffusion model. We leverage an explicit mesh-based representation compatible with existing computer-graphics pipelines. Our time-efficient zero-shot method achieves a superior performance re-animating a diverse set of 3D shapes.
arXiv Detail & Related papers (2024-05-30T15:30:38Z)
AnimateZoo: Zero-shot Video Generation of Cross-Species Animation via Subject Alignment [64.02822911038848]
We present AnimateZoo, a zero-shot diffusion-based video generator to produce animal animations. Key technique used in our AnimateZoo is subject alignment, which includes two steps. Our model is capable of generating videos characterized by accurate movements, consistent appearance, and high-fidelity frames.
arXiv Detail & Related papers (2024-04-07T12:57:41Z)
AnimeRun: 2D Animation Visual Correspondence from Open Source 3D Movies [98.65469430034246]
Existing datasets for two-dimensional (2D) cartoon suffer from simple frame composition and monotonic movements. We present a new 2D animation visual correspondence dataset, AnimeRun, by converting open source 3D movies to full scenes in 2D style. Our analyses show that the proposed dataset not only resembles real anime more in image composition, but also possesses richer and more complex motion patterns compared to existing datasets.
arXiv Detail & Related papers (2022-11-10T17:26:21Z)
Deep Animation Video Interpolation in the Wild [115.24454577119432]
In this work, we formally define and study the animation video code problem for the first time. We propose an effective framework, AnimeInterp, with two dedicated modules in a coarse-to-fine manner. Notably, AnimeInterp shows favorable perceptual quality and robustness for animation scenarios in the wild.
arXiv Detail & Related papers (2021-04-06T13:26:49Z)
Going beyond Free Viewpoint: Creating Animatable Volumetric Video of Human Performances [7.7824496657259665]
We present an end-to-end pipeline for the creation of high-quality animatable volumetric video content of human performances. Semantic enrichment and geometric animation ability are achieved by establishing temporal consistency in the 3D data. For pose editing, we exploit the captured data as much as possible and kinematically deform the captured frames to fit a desired pose.
arXiv Detail & Related papers (2020-09-02T09:46:12Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.