ActionMesh: Animated 3D Mesh Generation with Temporal 3D Diffusion
- URL: http://arxiv.org/abs/2601.16148v1
- Date: Thu, 22 Jan 2026 17:41:13 GMT
- Title: ActionMesh: Animated 3D Mesh Generation with Temporal 3D Diffusion
- Authors: Remy Sabathier, David Novotny, Niloy J. Mitra, Tom Monnier
- Abstract summary: ActionMesh is a feed-forward generative model that produces production-ready animated 3D meshes. A temporal 3D diffusion stage generates a sequence of synchronized shape latents, and a temporal 3D autoencoder converts the resulting independent shapes into deformations of a reference shape. The model accepts a monocular video, a text description, or a 3D mesh paired with an animation prompt.
- Score: 32.32525061239629
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generating animated 3D objects is at the heart of many applications, yet most advanced works are typically difficult to apply in practice because of their restrictive setup, their long runtime, or their limited quality. We introduce ActionMesh, a generative model that predicts production-ready 3D meshes "in action" in a feed-forward manner. Drawing inspiration from early video models, our key insight is to modify existing 3D diffusion models to include a temporal axis, resulting in a framework we dub "temporal 3D diffusion". Specifically, we first adapt the 3D diffusion stage to generate a sequence of synchronized latents representing time-varying and independent 3D shapes. Second, we design a temporal 3D autoencoder that translates a sequence of independent shapes into the corresponding deformations of a pre-defined reference shape, allowing us to build an animation. Combining these two components, ActionMesh generates animated 3D meshes from different inputs such as a monocular video, a text description, or even a 3D mesh with a text prompt describing its animation. Moreover, compared to previous approaches, our method is fast and produces results that are rig-free and topology-consistent, enabling rapid iteration and seamless applications like texturing and retargeting. We evaluate our model on standard video-to-4D benchmarks (Consistent4D, Objaverse) and report state-of-the-art performance on both geometric accuracy and temporal consistency, demonstrating that our model can deliver animated 3D meshes with unprecedented speed and quality.
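The abstract describes a two-stage architecture: a diffusion model that denoises a whole sequence of per-frame shape latents jointly, followed by an autoencoder that re-expresses those per-frame shapes as deformations of one reference mesh. Below is a minimal, hypothetical sketch of how the two stages could compose; all module names, shapes, and the toy denoising loop are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a two-stage "temporal 3D diffusion" pipeline as
# described in the abstract. Shapes and module names are assumptions.
import torch
import torch.nn as nn

T, D = 16, 64  # number of frames, latent size (assumed)

class Temporal3DDiffusion(nn.Module):
    """Denoises a sequence of per-frame 3D shape latents jointly, so the
    frames stay synchronized (the 'temporal axis' added to 3D diffusion)."""
    def __init__(self, d=D):
        super().__init__()
        self.denoise = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d, nhead=4, batch_first=True), 2)

    def forward(self, cond, steps=4):
        z = torch.randn(1, T, D)               # one latent per frame
        for _ in range(steps):                 # toy denoising loop
            z = z - 0.1 * self.denoise(z + cond)
        return z                               # (1, T, D) synchronized latents

class Temporal3DAutoencoder(nn.Module):
    """Maps a sequence of independent shape latents to deformations of a
    single reference shape, yielding a topology-consistent animation."""
    def __init__(self, n_verts=1024, d=D):
        super().__init__()
        self.to_ref = nn.Linear(d, n_verts * 3)    # reference geometry head
        self.to_delta = nn.Linear(d, n_verts * 3)  # per-frame offset head

    def forward(self, z):
        ref = self.to_ref(z.mean(1)).view(1, -1, 3)    # reference mesh verts
        deltas = self.to_delta(z).view(1, T, -1, 3)    # per-frame deformation
        return ref, ref.unsqueeze(1) + deltas          # animated vertices

cond = torch.randn(1, T, D)                # e.g. video or text conditioning
latents = Temporal3DDiffusion()(cond)
ref_verts, animated = Temporal3DAutoencoder()(latents)
print(animated.shape)                      # (1, T, 1024, 3)
```

The property the sketch mirrors is that every output frame shares the reference mesh's vertex set and connectivity, which is what makes the result rig-free and topology-consistent.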
Related papers
- Instant Expressive Gaussian Head Avatar via 3D-Aware Expression Distillation [46.27695095774081]
2D diffusion-based methods often compromise 3D consistency and speed. 3D-aware facial animation feedforward methods ensure 3D consistency and achieve faster inference speed. Our method runs at 107.31 FPS for animation and pose control while achieving comparable animation quality to the state-of-the-art.
arXiv Detail & Related papers (2025-12-18T18:53:28Z)
- 4-Doodle: Text to 3D Sketches that Move! [60.89021458068987]
4-Doodle is the first training-free framework for generating dynamic 3D sketches from text. Our method produces temporally realistic and structurally stable 3D sketch animations, outperforming existing baselines in both fidelity and controllability.
arXiv Detail & Related papers (2025-10-29T09:33:29Z)
- PromptVFX: Text-Driven Fields for Open-World 3D Gaussian Animation [49.91188543847175]
We reformulate 3D animation as a field prediction task and introduce a text-driven framework that infers a time-varying 4D flow field acting on 3D Gaussians. By leveraging large language models (LLMs) and vision-language models (VLMs) for function generation, our approach interprets arbitrary prompts and instantly updates the color, opacity, and positions of 3D Gaussians in real time.
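As a rough illustration of the field-prediction formulation, here is a hand-written time-varying field updating the positions, colors, and opacity of a point set standing in for 3D Gaussians. In PromptVFX such field functions would be produced by an LLM from the text prompt, so everything below is an assumed stand-in.

```python
# Minimal sketch of a time-varying field acting on 3D Gaussian attributes.
# The field function is hand-written for illustration only.
import numpy as np

N = 2048
positions = np.random.randn(N, 3).astype(np.float32)   # Gaussian centers
colors = np.random.rand(N, 3).astype(np.float32)
opacity = np.ones((N, 1), dtype=np.float32)

def flow_field(p, t):
    """Example 4D field: a swirl around the y-axis plus gentle upward drift."""
    vel = np.stack([-p[:, 2], 0.2 * np.ones(len(p)), p[:, 0]], axis=1)
    return 0.5 * np.sin(t) * vel

def step(p, c, o, t, dt=1 / 30):
    p = p + dt * flow_field(p, t)                 # advect positions
    c = np.clip(c + dt * 0.1 * np.sin(t), 0, 1)   # time-varying tint
    o = np.clip(o * (1.0 - 0.05 * dt), 0, 1)      # slow fade-out
    return p, c, o

t = 0.0
for frame in range(90):                           # 3 seconds at 30 fps
    positions, colors, opacity = step(positions, colors, opacity, t)
    t += 1 / 30
print(positions.mean(axis=0))                     # drifted Gaussian cloud
```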
arXiv Detail & Related papers (2025-06-01T17:22:59Z)
- Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models [71.78723353724493]
Animation of humanoid characters is essential in various graphics applications. We propose an approach to synthesize 4D animated sequences of input static 3D humanoid meshes.
arXiv Detail & Related papers (2025-03-20T10:00:22Z)
- Gaussians-to-Life: Text-Driven Animation of 3D Gaussian Splatting Scenes [49.26872036160368]
We propose a method for animating parts of high-quality 3D scenes in a Gaussian Splatting representation. We find that, in contrast to prior work, this enables realistic animations of complex, pre-existing 3D scenes.
arXiv Detail & Related papers (2024-11-28T16:01:58Z)
- Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters [86.13319549186959]
We present Make-It-Animatable, a novel data-driven method to make any 3D humanoid model ready for character animation in less than one second. Our framework generates high-quality blend weights, bones, and pose transformations. Compared to existing methods, our approach demonstrates significant improvements in both quality and speed.
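The three predicted quantities (blend weights, bones, pose transformations) plug into standard linear blend skinning, v' = sum_j w_j * T_j * v. A small sketch with random stand-in values for the quantities the method predicts:

```python
# Linear blend skinning (LBS) with stand-in predicted quantities: each vertex
# is a blend-weighted mix of its per-bone transformed positions.
import numpy as np

n_verts, n_bones = 512, 4
verts = np.random.randn(n_verts, 3)                   # rest-pose vertices
weights = np.random.rand(n_verts, n_bones)            # blend weights
weights /= weights.sum(axis=1, keepdims=True)         # rows sum to 1

transforms = np.tile(np.eye(4), (n_bones, 1, 1))      # per-bone 4x4 poses
transforms[0, :3, 3] = [0.0, 0.1, 0.0]                # move bone 0 up a bit

verts_h = np.concatenate([verts, np.ones((n_verts, 1))], axis=1)  # homogeneous
per_bone = np.einsum('bij,vj->vbi', transforms, verts_h)   # (V, B, 4)
posed = np.einsum('vb,vbi->vi', weights, per_bone)[:, :3]  # weighted blend
print(posed.shape)                                         # (512, 3)
```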
arXiv Detail & Related papers (2024-11-27T10:18:06Z)
- Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models [54.35214051961381]
3D meshes are widely used in computer vision and graphics for their efficiency in animation and minimal memory use in movies, games, AR, and VR. However, creating temporally consistent and realistic textures for meshes remains labor-intensive for professional artists. We present Tex4D, a zero-shot approach that integrates the inherent geometry of mesh sequences with video diffusion models to produce consistent textures.
arXiv Detail & Related papers (2024-10-14T17:59:59Z)
- Animate3D: Animating Any 3D Model with Multi-view Video Diffusion [47.05131487114018]
Animate3D is a novel framework for animating any static 3D model.
We introduce a framework combining reconstruction and 4D Score Distillation Sampling (4D-SDS) to leverage multi-view video diffusion priors for animating 3D objects.
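For context, score distillation sampling (the family 4D-SDS extends) updates 3D parameters by pushing a frozen diffusion prior's noise-prediction residual back through a differentiable renderer. A toy single-step sketch, with stand-in renderer and prior rather than the paper's multi-view video models:

```python
# Toy score-distillation step: nudge 3D params so renders match a frozen
# diffusion prior. Renderer, prior, and schedule are illustrative stand-ins.
import torch

def sds_step(params, render, diffusion_eps, lr=1e-2):
    img = render(params)                             # differentiable render
    t = torch.randint(1, 1000, (1,))
    noise = torch.randn_like(img)
    alpha = 1.0 - t.float() / 1000.0                 # toy noise schedule
    noisy = alpha.sqrt() * img + (1 - alpha).sqrt() * noise
    with torch.no_grad():
        eps_pred = diffusion_eps(noisy, t)           # frozen prior's prediction
    grad = eps_pred - noise                          # SDS gradient (w(t) = 1)
    img.backward(gradient=grad)                      # push grad through render
    with torch.no_grad():
        params -= lr * params.grad
        params.grad = None
    return params

params = torch.randn(4, 8, 8, requires_grad=True)    # stand-in 3D/4D params
render = lambda p: p.sigmoid().mean(0, keepdim=True) # toy "renderer"
diffusion_eps = lambda x, t: torch.randn_like(x)     # stand-in prior
params = sds_step(params, render, diffusion_eps)
```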
arXiv Detail & Related papers (2024-07-16T05:35:57Z)
- iHuman: Instant Animatable Digital Humans From Monocular Videos [16.98924995658091]
We present a fast, simple, yet effective method for creating animatable 3D digital humans from monocular videos.
This work achieves, and illustrates the need for, accurate 3D mesh-based modelling of the human body.
Our method is faster by an order of magnitude (in terms of training time) than its closest competitor.
arXiv Detail & Related papers (2024-07-15T18:51:51Z)
- Vid3D: Synthesis of Dynamic 3D Scenes using 2D Video Diffusion [3.545941891218148]
We investigate whether it is necessary to explicitly enforce multiview consistency over time, as current approaches do, or if it is sufficient for a model to generate 3D representations of each timestep independently.
We propose a model, Vid3D, that leverages 2D video diffusion to generate 3D videos by first generating a 2D "seed" of the video's temporal dynamics and then independently generating a 3D representation for each timestep in the seed video.
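The two-stage recipe is simple enough to state as pseudocode: one call to a video model for the 2D seed, then an independent image-to-3D call per frame, with no cross-frame 3D consistency term. Both generators below are placeholders, not the paper's models:

```python
# Schematic of the Vid3D recipe: 2D seed video first, then per-frame 3D lifting.
import numpy as np

def video_diffusion(prompt, frames=8, hw=32):
    """Stand-in for a 2D video model: returns the 'seed' clip."""
    rng = np.random.default_rng(abs(hash(prompt)) % 2**32)
    return rng.random((frames, hw, hw, 3))

def image_to_3d(frame):
    """Stand-in for a per-frame image-to-3D generator (e.g. Gaussians/NeRF)."""
    return {"points": np.random.rand(256, 3), "colors": frame.mean((0, 1))}

seed = video_diffusion("a fox jumping")        # stage 1: temporal dynamics in 2D
scene_per_t = [image_to_3d(f) for f in seed]   # stage 2: independent 3D per frame
print(len(scene_per_t), scene_per_t[0]["points"].shape)
```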
arXiv Detail & Related papers (2024-06-17T04:09:04Z)
- MotionDreamer: Exploring Semantic Video Diffusion features for Zero-Shot 3D Mesh Animation [10.263762787854862]
We propose a technique for automatic re-animation of various 3D shapes based on a motion prior extracted from a video diffusion model.
We leverage an explicit mesh-based representation compatible with existing computer-graphics pipelines.
Our time-efficient zero-shot method achieves superior performance in re-animating a diverse set of 3D shapes.
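Schematically, the output is a sequence of per-frame vertex offsets applied to an unmodified input mesh, with the motion signal distilled from video-diffusion features. The signal below is a hand-written stand-in for those features, shown only to make the data flow concrete:

```python
# Rough sketch: drive per-frame vertex offsets of an explicit mesh with a
# motion signal (here a hand-written stand-in for video-diffusion features).
import numpy as np

def diffusion_motion_signal(t, n_verts):
    """Stand-in for semantic video-diffusion features mapped onto the mesh."""
    phase = np.linspace(0, 2 * np.pi, n_verts)
    return np.stack([np.zeros(n_verts),
                     0.1 * np.sin(phase + t),   # a wave travelling over the mesh
                     np.zeros(n_verts)], axis=1)

rest_verts = np.random.randn(512, 3)            # any input 3D shape
# Faces/topology are never edited; only vertex positions move per frame.
animation = [rest_verts + diffusion_motion_signal(t * 0.3, len(rest_verts))
             for t in range(24)]
print(len(animation), animation[0].shape)       # 24 frames of (512, 3) verts
```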
arXiv Detail & Related papers (2024-05-30T15:30:38Z)