CharacterShot: Controllable and Consistent 4D Character Animation
- URL: http://arxiv.org/abs/2508.07409v1
- Date: Sun, 10 Aug 2025 16:15:04 GMT
- Title: CharacterShot: Controllable and Consistent 4D Character Animation
- Authors: Junyao Gao, Jiaxing Li, Wenran Liu, Yanhong Zeng, Fei Shen, Kai Chen, Yanan Sun, Cairong Zhao,
- Abstract summary: We propose textbfCharacterShot, a controllable and consistent 4D character animation framework.<n>It enables any individual designer to create dynamic 3D characters (i.e., 4D character animation) from a single reference character image and a 2D pose sequence.
- Score: 20.631610991598297
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose \textbf{CharacterShot}, a controllable and consistent 4D character animation framework that enables any individual designer to create dynamic 3D characters (i.e., 4D character animation) from a single reference character image and a 2D pose sequence. We begin by pretraining a powerful 2D character animation model based on a cutting-edge DiT-based image-to-video model, which allows for any 2D pose sequnce as controllable signal. We then lift the animation model from 2D to 3D through introducing dual-attention module together with camera prior to generate multi-view videos with spatial-temporal and spatial-view consistency. Finally, we employ a novel neighbor-constrained 4D gaussian splatting optimization on these multi-view videos, resulting in continuous and stable 4D character representations. Moreover, to improve character-centric performance, we construct a large-scale dataset Character4D, containing 13,115 unique characters with diverse appearances and motions, rendered from multiple viewpoints. Extensive experiments on our newly constructed benchmark, CharacterBench, demonstrate that our approach outperforms current state-of-the-art methods. Code, models, and datasets will be publicly available at https://github.com/Jeoyal/CharacterShot.
Related papers
- VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control [83.92729346325163]
VerseCrafter is a 4D-aware video world model that enables explicit and coherent control over both camera and object dynamics.<n>Our approach is centered on a novel 4D Geometric Control representation, which encodes the world state through a static background point cloud.<n>These 4D controls are rendered into conditioning signals for a pretrained video diffusion model, enabling the generation of high-fidelity, view-consistent videos.
arXiv Detail & Related papers (2026-01-08T17:28:52Z) - Tracking-Guided 4D Generation: Foundation-Tracker Motion Priors for 3D Model Animation [21.075786141331974]
We present emphTrack4DGen, a framework for generating dynamic 4D objects from sparse inputs.<n>In Stage One, we enforce dense, feature-level point correspondences inside the diffusion generator.<n>In Stage Two, we reconstruct a dynamic 4D-GS using a hybrid motion encoding.
arXiv Detail & Related papers (2025-12-05T21:13:04Z) - Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models [79.06910348413861]
We introduce Diff4Splat, a feed-forward method that synthesizes controllable and explicit 4D scenes from a single image.<n>Given a single input image, a camera trajectory, and an optional text prompt, Diff4Splat directly predicts a deformable 3D Gaussian field that encodes appearance, geometry, and motion.
arXiv Detail & Related papers (2025-11-01T11:16:25Z) - 4-Doodle: Text to 3D Sketches that Move! [60.89021458068987]
4-Doodle is the first training-free framework for generating dynamic 3D sketches from text.<n>Our method produces temporally realistic and structurally stable 3D sketch animations, outperforming existing baselines in both fidelity and controllability.
arXiv Detail & Related papers (2025-10-29T09:33:29Z) - FaceCraft4D: Animated 3D Facial Avatar Generation from a Single Image [41.598551483524666]
We present a novel framework for generating high-quality, animatable 4D avatar from a single image.<n>Our method achieves superior quality compared to the prior art, while maintaining consistency across different viewpoints and expressions.
arXiv Detail & Related papers (2025-04-21T15:40:14Z) - Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization [31.956858341885436]
Video4DGen is a novel framework that excels in generating 4D representations from single or multiple generated videos.<n>Video4DGen offers a powerful tool for applications in virtual reality, animation, and beyond.
arXiv Detail & Related papers (2025-04-05T12:13:05Z) - DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses [57.17501809717155]
We present DreamDance, a novel method for animating human images using only skeleton pose sequences as conditional inputs.<n>Our key insight is that human images naturally exhibit multiple levels of correlation.<n>We construct the TikTok-Dance5K dataset, comprising 5K high-quality dance videos with detailed frame annotations.
arXiv Detail & Related papers (2024-11-30T08:42:13Z) - Animate3D: Animating Any 3D Model with Multi-view Video Diffusion [47.05131487114018]
Animate3D is a novel framework for animating any static 3D model.
We introduce a framework combining reconstruction and 4D Score Distillation Sampling (4D-SDS) to leverage the multi-view video diffusion priors for animating 3D objects.
arXiv Detail & Related papers (2024-07-16T05:35:57Z) - 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency [118.15258850780417]
We present textbf4DGen, a novel framework for grounded 4D content creation.<n>Our pipeline facilitates controllable 4D generation, enabling users to specify the motion via monocular video or adopt image-to-video generations.<n>Compared to existing video-to-4D baselines, our approach yields superior results in faithfully reconstructing input signals.
arXiv Detail & Related papers (2023-12-28T18:53:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.