AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation
- URL: http://arxiv.org/abs/2506.09982v1
- Date: Wed, 11 Jun 2025 17:55:16 GMT
- Title: AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation
- Authors: Zijie Wu, Chaohui Yu, Fan Wang, Xiang Bai
- Abstract summary: In this paper, we present AnimateAnyMesh, the first feed-forward framework that enables efficient text-driven animation of arbitrary 3D meshes. Our approach leverages a novel DyMeshVAE architecture that effectively compresses and reconstructs dynamic mesh sequences. We also contribute the DyMesh Dataset, containing over 4M diverse dynamic mesh sequences with text annotations.
- Score: 57.199352741915625
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in 4D content generation have attracted increasing attention, yet creating high-quality animated 3D models remains challenging due to the complexity of modeling spatio-temporal distributions and the scarcity of 4D training data. In this paper, we present AnimateAnyMesh, the first feed-forward framework that enables efficient text-driven animation of arbitrary 3D meshes. Our approach leverages a novel DyMeshVAE architecture that effectively compresses and reconstructs dynamic mesh sequences by disentangling spatial and temporal features while preserving local topological structures. To enable high-quality text-conditional generation, we employ a Rectified Flow-based training strategy in the compressed latent space. Additionally, we contribute the DyMesh Dataset, containing over 4M diverse dynamic mesh sequences with text annotations. Experimental results demonstrate that our method generates semantically accurate and temporally coherent mesh animations in a few seconds, significantly outperforming existing approaches in both quality and efficiency. Our work marks a substantial step forward in making 4D content creation more accessible and practical. All the data, code, and models will be open-released.
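The abstract states that text-conditional generation is trained with a Rectified Flow objective in the compressed latent space of DyMeshVAE. The sketch below is a minimal, illustrative training step under that assumption only; `vae`, `velocity_net`, and `text_encoder` are hypothetical stand-ins, not components released by the authors.

```python
import torch

# Minimal rectified-flow training step on VAE latents (illustrative sketch only;
# `vae`, `velocity_net`, and `text_encoder` are hypothetical stand-ins).
def rectified_flow_step(vae, velocity_net, text_encoder, mesh_seq, prompt, optimizer):
    with torch.no_grad():
        z1 = vae.encode(mesh_seq)        # target latent for the dynamic mesh sequence
        cond = text_encoder(prompt)      # text-conditioning embedding
    z0 = torch.randn_like(z1)            # Gaussian noise sample (the t = 0 endpoint)
    t = torch.rand(z1.shape[0], device=z1.device)   # random time in [0, 1]
    t_ = t.view(-1, *([1] * (z1.dim() - 1)))        # broadcast over latent dimensions
    zt = (1.0 - t_) * z0 + t_ * z1       # point on the straight path from noise to data
    v_target = z1 - z0                   # constant velocity of that straight path
    v_pred = velocity_net(zt, t, cond)   # predict velocity given latent, time, and text
    loss = torch.nn.functional.mse_loss(v_pred, v_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At inference, a few Euler steps integrate the learned velocity field from noise to a latent, which the VAE decoder maps back to a mesh trajectory; this is consistent with the abstract's claim of generation in a few seconds, though the paper's exact sampler is not specified here.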
Related papers
- Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis [31.632778145139074]
Direct 4D diffusion modeling is extremely challenging due to costly data construction and the high-dimensional nature of jointly representing 3D shape, appearance, and motion. We introduce a Direct 4DMesh-to-GS Variation Field VAE that directly encodes canonical Gaussians and their temporal variations from 3D animation data. We train a temporal-aware Diffusion Transformer conditioned on input videos and canonical GS.
arXiv Detail & Related papers (2025-07-31T17:59:51Z)
- TextMesh4D: High-Quality Text-to-4D Mesh Generation [13.069414103080447]
We introduce TextMesh4D, a novel framework for high-quality text-to-4D generation. Our approach leverages per-face Jacobians as a differentiable mesh representation and decomposes 4D generation into two stages: static object creation and dynamic motion synthesis. Experiments demonstrate that TextMesh4D achieves state-of-the-art results in terms of temporal consistency, structural fidelity, and visual realism.
arXiv Detail & Related papers (2025-06-30T17:58:34Z)
- Bringing Objects to Life: training-free 4D generation from 3D objects through view consistent noise [31.533802484121182]
We introduce a training-free method for animating 3D objects by conditioning on textual prompts to guide 4D generation. We first convert a 3D mesh into a 4D Neural Radiance Field (NeRF) that preserves the object's visual attributes. Then, we animate the object using an Image-to-Video diffusion model driven by text.
arXiv Detail & Related papers (2024-12-29T10:12:01Z)
- Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models [54.35214051961381]
3D meshes are widely used in computer vision and graphics for their efficiency in animation and minimal memory use in movies, games, AR, and VR. However, creating temporally consistent and realistic textures for mesh sequences remains labor-intensive for professional artists. We present Tex4D, which integrates the inherent geometry of mesh sequences with video diffusion models to produce consistent textures.
arXiv Detail & Related papers (2024-10-14T17:59:59Z)
- CT4D: Consistent Text-to-4D Generation with Animatable Meshes [53.897244823604346]
We present a novel framework, coined CT4D, which directly operates on animatable meshes for generating consistent 4D content from arbitrary user-supplied prompts.
Our framework incorporates a unique Generate-Refine-Animate (GRA) algorithm to enhance the creation of text-aligned meshes.
Our experimental results, both qualitative and quantitative, demonstrate that our CT4D framework surpasses existing text-to-4D techniques in maintaining interframe consistency and preserving global geometry.
arXiv Detail & Related papers (2024-08-15T14:41:34Z)
- Comp4D: LLM-Guided Compositional 4D Scene Generation [65.5810466788355]
We present Comp4D, a novel framework for Compositional 4D Generation.
Unlike conventional methods that generate a singular 4D representation of the entire scene, Comp4D innovatively constructs each 4D object within the scene separately.
Our method employs a compositional score distillation technique guided by the pre-defined trajectories.
arXiv Detail & Related papers (2024-03-25T17:55:52Z)
- 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency [118.15258850780417]
We present 4DGen, a novel framework for grounded 4D content creation. Our pipeline facilitates controllable 4D generation, enabling users to specify the motion via monocular video or adopt image-to-video generation. Compared to existing video-to-4D baselines, our approach yields superior results in faithfully reconstructing input signals.
arXiv Detail & Related papers (2023-12-28T18:53:39Z)
- Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models [94.07744207257653]
We focus on the underexplored text-to-4D setting and synthesize dynamic, animated 3D objects.
We combine text-to-image, text-to-video, and 3D-aware multiview diffusion models to provide feedback during 4D object optimization.
arXiv Detail & Related papers (2023-12-21T11:41:02Z)