CT4D: Consistent Text-to-4D Generation with Animatable Meshes
- URL: http://arxiv.org/abs/2408.08342v1
- Date: Thu, 15 Aug 2024 14:41:34 GMT
- Title: CT4D: Consistent Text-to-4D Generation with Animatable Meshes
- Authors: Ce Chen, Shaoli Huang, Xuelin Chen, Guangyi Chen, Xiaoguang Han, Kun Zhang, Mingming Gong
- Abstract summary: We present a novel framework, coined CT4D, which directly operates on animatable meshes for generating consistent 4D content from arbitrary user-supplied prompts.
Our framework incorporates a unique Generate-Refine-Animate (GRA) algorithm to enhance the creation of text-aligned meshes.
Our experimental results, both qualitative and quantitative, demonstrate that our CT4D framework surpasses existing text-to-4D techniques in maintaining interframe consistency and preserving global geometry.
- Score: 53.897244823604346
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-4D generation has recently been demonstrated viable by integrating a 2D image diffusion model with a video diffusion model. However, existing models tend to produce results with inconsistent motions and geometric structures over time. To this end, we present a novel framework, coined CT4D, which directly operates on animatable meshes to generate consistent 4D content from arbitrary user-supplied prompts. The primary challenges of our mesh-based framework lie in stably generating a mesh whose details align with the text prompt, and in directly driving that mesh while maintaining surface continuity. Our CT4D framework incorporates a unique Generate-Refine-Animate (GRA) algorithm to enhance the creation of text-aligned meshes. To improve surface continuity, we divide a mesh into several smaller regions and implement a uniform driving function within each region. Additionally, we constrain the animating stage with a rigidity regularization to ensure cross-region continuity. Our experimental results, both qualitative and quantitative, demonstrate that our CT4D framework surpasses existing text-to-4D techniques in maintaining interframe consistency and preserving global geometry. Furthermore, we showcase that this enhanced representation inherently possesses the capability for combinational 4D generation and texture editing.
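As a concrete illustration of the region-wise driving and rigidity constraint described in the abstract, the following is a minimal PyTorch sketch. The function names, the per-region rigid parameterization, and the edge-length penalty are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: drive each mesh region with its own rigid transform, then
# penalize edge-length changes so neighboring regions stay attached.
import torch

def drive_regions(verts, region_ids, rotations, translations):
    """Apply one rigid transform per region to the mesh vertices.

    verts:        (V, 3) rest-pose vertex positions
    region_ids:   (V,) integer region label per vertex
    rotations:    (R, 3, 3) one rotation matrix per region
    translations: (R, 3) one translation per region
    """
    R_per_vert = rotations[region_ids]      # (V, 3, 3)
    t_per_vert = translations[region_ids]   # (V, 3)
    return torch.einsum('vij,vj->vi', R_per_vert, verts) + t_per_vert

def rigidity_loss(verts_rest, verts_driven, edges):
    """Penalize edge-length changes; edges crossing region boundaries are the
    ones that enforce cross-region continuity (an assumed stand-in for the
    paper's rigidity regularization)."""
    d_rest = (verts_rest[edges[:, 0]] - verts_rest[edges[:, 1]]).norm(dim=-1)
    d_new = (verts_driven[edges[:, 0]] - verts_driven[edges[:, 1]]).norm(dim=-1)
    return ((d_new - d_rest) ** 2).mean()

# Toy usage: two regions on a 4-vertex strip; edge (1, 2) crosses regions.
verts = torch.tensor([[0., 0., 0.], [1., 0., 0.], [2., 0., 0.], [3., 0., 0.]])
region_ids = torch.tensor([0, 0, 1, 1])
edges = torch.tensor([[0, 1], [1, 2], [2, 3]])
rotations = torch.eye(3).repeat(2, 1, 1)
translations = torch.tensor([[0., 0., 0.], [0., 0.5, 0.]])  # shift region 1

driven = drive_regions(verts, region_ids, rotations, translations)
print(rigidity_loss(verts, driven, edges))  # nonzero: edge (1, 2) stretched
```

Minimizing this loss while optimizing the per-region transforms would discourage tearing at region boundaries, which is the continuity effect the abstract describes.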
Related papers
- Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models [54.35214051961381]
3D meshes are widely used in computer vision and graphics for their efficiency in animation and minimal memory use in movies, games, AR, and VR.
However, creating temporally consistent and realistic textures for meshes remains labor-intensive for professional artists.
We present Tex4D, which integrates the inherent geometry of mesh sequences with video diffusion models to produce consistent textures.
arXiv Detail & Related papers (2024-10-14T17:59:59Z)
- Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis [60.853577108780414]
Existing 4D generation methods can generate high-quality 4D objects or scenes from user-friendly conditions, but they struggle to synthesize realistic transitions between scene states.
We propose Trans4D, a novel text-to-4D synthesis framework that enables realistic complex scene transitions.
In experiments, Trans4D consistently outperforms existing state-of-the-art methods in generating 4D scenes with accurate and high-quality transitions.
arXiv Detail & Related papers (2024-10-09T17:56:03Z)
- PLA4D: Pixel-Level Alignments for Text-to-4D Gaussian Splatting [9.517058280333806]
Previous text-to-4D methods have leveraged multiple Score Distillation Sampling (SDS) techniques.
We introduce Pixel-Level Alignment for text-driven 4D Gaussian splatting (PLA4D).
PLA4D provides an anchor reference, i.e., a text-generated video, to align the rendering process conditioned by different diffusion models (DMs) in pixel space.
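For context, the SDS technique mentioned above optimizes a differentiable rendering against a frozen diffusion model. The sketch below shows the standard SDS gradient in PyTorch under simplifying assumptions; `diffusion_eps` is a hypothetical stand-in for a pretrained noise predictor, and PLA4D's pixel-space alignment to a reference video is not reproduced here.

```python
# Minimal sketch of Score Distillation Sampling (SDS): noise the rendering,
# ask the frozen diffusion model to denoise it, and use the residual between
# predicted and true noise as the gradient for the underlying 3D/4D params.
import torch

def sds_grad(render, diffusion_eps, alphas_cumprod):
    t = torch.randint(1, len(alphas_cumprod), (1,)).item()  # random timestep
    a_t = alphas_cumprod[t]
    noise = torch.randn_like(render)
    noisy = a_t.sqrt() * render + (1 - a_t).sqrt() * noise
    eps_pred = diffusion_eps(noisy, t)   # frozen model; no grad through it
    w_t = 1 - a_t                        # a common timestep weighting choice
    return w_t * (eps_pred - noise)      # gradient w.r.t. the rendering

# Toy usage with a dummy noise predictor standing in for a real diffusion model.
ac = torch.linspace(0.999, 0.01, 1000)
dummy_eps = lambda x, t: torch.zeros_like(x)
g = sds_grad(torch.rand(1, 3, 64, 64), dummy_eps, ac)
print(g.shape)  # torch.Size([1, 3, 64, 64])
```

In practice the returned tensor is injected via `render.backward(gradient=g)` so that the diffusion prior steers the scene parameters without ever training the diffusion model itself.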
arXiv Detail & Related papers (2024-05-30T11:23:01Z)
- Comp4D: LLM-Guided Compositional 4D Scene Generation [65.5810466788355]
We present Comp4D, a novel framework for Compositional 4D Generation.
Unlike conventional methods that generate a singular 4D representation of the entire scene, Comp4D innovatively constructs each 4D object within the scene separately.
Our method employs a compositional score distillation technique guided by the pre-defined trajectories.
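A minimal sketch of the compositional placement idea follows, assuming each separately generated object is a point set moved along a simple pre-defined trajectory; Comp4D's LLM-derived trajectories and score-distillation details are beyond this illustration.

```python
# Minimal sketch: compose a scene at time t by placing each independently
# generated object along its own pre-defined trajectory.
import torch

def compose_scene(object_points, trajectories, t):
    """Place each object's point set at its trajectory position at time t."""
    placed = [pts + traj(t) for pts, traj in zip(object_points, trajectories)]
    return torch.cat(placed, dim=0)

# Two toy objects moving on straight lines (hypothetical trajectories).
objs = [torch.rand(100, 3), torch.rand(80, 3)]
trajs = [lambda t: torch.tensor([t, 0.0, 0.0]),
         lambda t: torch.tensor([0.0, t, 0.0])]
scene_t = compose_scene(objs, trajs, t=0.5)
print(scene_t.shape)  # torch.Size([180, 3])
```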
arXiv Detail & Related papers (2024-03-25T17:55:52Z)
- Beyond Skeletons: Integrative Latent Mapping for Coherent 4D Sequence Generation [48.671462912294594]
We propose a novel framework that generates coherent 4D sequences by animating 3D shapes under given conditions.
We first employ an integrative latent unified representation to encode shape and color information of each detailed 3D geometry frame.
The proposed skeleton-free latent 4D sequence joint representation allows us to leverage diffusion models in a low-dimensional space to control the generation of 4D sequences.
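To make the low-dimensional idea concrete, here is a hedged PyTorch sketch of denoising a stack of per-frame latents; the tiny MLP denoiser, the shapes, and the update rule are illustrative assumptions rather than the paper's architecture.

```python
# Minimal sketch: represent a 4D sequence as one low-dimensional latent per
# frame and refine the whole stack jointly with an iterative denoiser.
import torch
import torch.nn as nn

T, D = 16, 64                      # frames, latent dim per frame
denoiser = nn.Sequential(nn.Linear(D + 1, 128), nn.SiLU(), nn.Linear(128, D))

def denoise_sequence(z, steps=50):
    """Iteratively refine a (T, D) stack of frame latents; conditioning on a
    normalized step index stands in for a proper noise schedule."""
    for s in range(steps, 0, -1):
        s_embed = torch.full((T, 1), s / steps)
        z = z - 0.1 * denoiser(torch.cat([z, s_embed], dim=-1))
    return z

z0 = torch.randn(T, D)             # one latent per frame of the sequence
latents = denoise_sequence(z0)     # each row would feed a shape/color decoder
print(latents.shape)               # torch.Size([16, 64])
```

Working in this small latent space is what keeps the diffusion step cheap compared with denoising full 3D geometry frame by frame.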
arXiv Detail & Related papers (2024-03-20T01:59:43Z)
- 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency [118.15258850780417]
This work introduces 4DGen, a novel framework for grounded 4D content creation.
We identify static 3D assets and monocular video sequences as key components in constructing the 4D content.
Our pipeline facilitates conditional 4D generation, enabling users to specify geometry (3D assets) and motion (monocular videos).
arXiv Detail & Related papers (2023-12-28T18:53:39Z)