Beyond Skeletons: Integrative Latent Mapping for Coherent 4D Sequence Generation
- URL: http://arxiv.org/abs/2403.13238v1
- Date: Wed, 20 Mar 2024 01:59:43 GMT
- Title: Beyond Skeletons: Integrative Latent Mapping for Coherent 4D Sequence Generation
- Authors: Qitong Yang, Mingtao Feng, Zijie Wu, Shijie Sun, Weisheng Dong, Yaonan Wang, Ajmal Mian
- Abstract summary: We propose a novel framework that generates coherent 4D sequences, animating 3D shapes under given conditions.
We first employ an integrative latent unified representation to encode shape and color information of each detailed 3D geometry frame.
The proposed skeleton-free latent 4D sequence joint representation allows us to leverage diffusion models in a low-dimensional space to control the generation of 4D sequences.
- Score: 48.671462912294594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Directly learning to model 4D content, including shape, color and motion, is challenging. Existing methods depend on skeleton-based motion control and offer limited continuity in detail. To address this, we propose a novel framework that, through integrative latent mapping, generates coherent 4D sequences: animations of 3D shapes under given conditions, with dynamic evolution of shape and color over time. We first employ an integrative latent unified representation to encode the shape and color information of each detailed 3D geometry frame. The proposed skeleton-free latent 4D sequence joint representation allows us to leverage diffusion models in a low-dimensional space to control the generation of 4D sequences. Finally, temporally coherent 4D sequences are generated that conform well to the input images and text prompts. Extensive experiments on the ShapeNet, 3DBiCar and DeformingThings4D datasets across several tasks demonstrate that our method effectively learns to generate quality 3D shapes with color and 4D mesh animations, improving over the current state of the art. Source code will be released.
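The abstract outlines a two-stage design: per-frame shape/color latents, followed by a diffusion model over the resulting low-dimensional latent sequence. Purely as an illustration of that second stage, the sketch below trains a DDPM-style noise predictor on a sequence of frame latents. Every shape, name, and architectural choice here is a hypothetical stand-in, not the authors' implementation; the per-frame shape/color encoder is assumed to exist and is not shown.

```python
# Minimal, illustrative sketch of diffusion over a sequence of per-frame
# latents (T frames x D dims). Hypothetical shapes and architecture; NOT
# the paper's actual model.
import torch
import torch.nn as nn

T_FRAMES, LATENT_DIM, STEPS = 16, 64, 1000  # hypothetical sizes

# Linear beta schedule and the cumulative alpha products used by DDPM.
betas = torch.linspace(1e-4, 0.02, STEPS)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

class SeqDenoiser(nn.Module):
    """Tiny temporal denoiser: predicts the noise added to a latent sequence."""
    def __init__(self, dim=LATENT_DIM):
        super().__init__()
        self.t_embed = nn.Embedding(STEPS, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(dim, dim)

    def forward(self, z_t, t):
        # z_t: (B, T_FRAMES, dim); t: (B,) diffusion timestep indices
        h = z_t + self.t_embed(t)[:, None, :]  # broadcast timestep over frames
        return self.out(self.temporal(h))      # predicted noise, same shape

def diffusion_loss(model, z0):
    """Standard epsilon-prediction objective on a clean latent sequence z0."""
    b = z0.shape[0]
    t = torch.randint(0, STEPS, (b,))
    eps = torch.randn_like(z0)
    a = alphas_bar[t].view(b, 1, 1)
    z_t = a.sqrt() * z0 + (1 - a).sqrt() * eps  # forward noising q(z_t | z0)
    return nn.functional.mse_loss(model(z_t, t), eps)

# Usage: z0 would come from the (not shown) per-frame shape/color encoder.
model = SeqDenoiser()
z0 = torch.randn(2, T_FRAMES, LATENT_DIM)  # stand-in latent sequences
loss = diffusion_loss(model, z0)
loss.backward()
```

Conditioning on input images or text prompts, as the paper describes, would enter through cross-attention or concatenated condition embeddings; sampling would run the usual reverse DDPM loop over the whole sequence at once, which is what makes the generated frames temporally coherent.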
Related papers
- CT4D: Consistent Text-to-4D Generation with Animatable Meshes [53.897244823604346]
We present a novel framework, coined CT4D, which directly operates on animatable meshes for generating consistent 4D content from arbitrary user-supplied prompts.
Our framework incorporates a unique Generate-Refine-Animate (GRA) algorithm to enhance the creation of text-aligned meshes.
Our experimental results, both qualitative and quantitative, demonstrate that our CT4D framework surpasses existing text-to-4D techniques in maintaining interframe consistency and preserving global geometry.
arXiv Detail & Related papers (2024-08-15T14:41:34Z)
- Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models [116.31344506738816]
We present a novel framework, Diffusion4D, for efficient and scalable 4D content generation.
We develop a 4D-aware video diffusion model capable of synthesizing orbital views of dynamic 3D assets.
Our method surpasses prior state-of-the-art techniques in terms of generation efficiency and 4D geometry consistency.
arXiv Detail & Related papers (2024-05-26T17:47:34Z)
- Comp4D: LLM-Guided Compositional 4D Scene Generation [65.5810466788355]
We present Comp4D, a novel framework for Compositional 4D Generation.
Unlike conventional methods that generate a singular 4D representation of the entire scene, Comp4D innovatively constructs each 4D object within the scene separately.
Our method employs a compositional score distillation technique guided by pre-defined trajectories (a generic sketch of score distillation appears after this list).
arXiv Detail & Related papers (2024-03-25T17:55:52Z)
- 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency [118.15258850780417]
This work introduces 4DGen, a novel framework for grounded 4D content creation.
We identify static 3D assets and monocular video sequences as key components in constructing the 4D content.
Our pipeline facilitates conditional 4D generation, enabling users to specify geometry (3D assets) and motion (monocular videos).
arXiv Detail & Related papers (2023-12-28T18:53:39Z)
- Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models [94.07744207257653]
We focus on the underexplored text-to-4D setting and synthesize dynamic, animated 3D objects.
We combine text-to-image, text-to-video, and 3D-aware multiview diffusion models to provide feedback during 4D object optimization.
arXiv Detail & Related papers (2023-12-21T11:41:02Z)
- Consistent4D: Consistent 360° Dynamic Object Generation from Monocular Video [15.621374353364468]
Consistent4D is a novel approach for generating 4D dynamic objects from uncalibrated monocular videos.
We cast 360-degree dynamic object reconstruction as a 4D generation problem, eliminating the need for tedious multi-view data collection and camera calibration.
arXiv Detail & Related papers (2023-11-06T03:26:43Z)
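Two entries above (Comp4D and Align Your Gaussians) rely on score distillation, in which a frozen diffusion prior supplies gradient feedback to differentiable scene parameters. The sketch below shows one generic DreamFusion-style SDS step under stated assumptions: the "prior" is a toy frozen network standing in for a pretrained text-/video-conditioned model, the timestep weighting w(t) is omitted for brevity, and nothing here reproduces the cited papers' actual pipelines.

```python
# One generic score-distillation (SDS) update step; toy stand-in prior,
# NOT any cited paper's model. Timestep weighting w(t) omitted.
import torch
import torch.nn as nn

STEPS = 1000
betas = torch.linspace(1e-4, 0.02, STEPS)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

# Frozen "diffusion prior": in the cited works this is a large pretrained
# model; a random frozen conv net keeps this sketch self-contained.
prior = nn.Conv2d(3, 3, kernel_size=3, padding=1).requires_grad_(False)

def sds_step(render_fn, params, lr=0.01):
    """Apply one SDS gradient update to differentiable scene parameters."""
    x = render_fn(params)                      # (B, 3, H, W) rendering
    t = torch.randint(0, STEPS, (1,)).item()   # random diffusion timestep
    a = alphas_bar[t]
    eps = torch.randn_like(x)
    x_t = a.sqrt() * x + (1 - a).sqrt() * eps  # noise the rendering
    with torch.no_grad():
        eps_hat = prior(x_t)                   # prior's noise estimate
    # Key SDS trick: treat (eps_hat - eps) as the gradient at x, and
    # backpropagate it through the renderer only, never through the prior.
    x.backward(gradient=eps_hat - eps)
    with torch.no_grad():
        params -= lr * params.grad
        params.grad = None

# Trivial "scene" whose parameters are the image pixels themselves.
params = torch.zeros(1, 3, 32, 32, requires_grad=True)
sds_step(lambda p: p.clone(), params)
```

In the cited papers, the prior is a combination of text-to-image, text-to-video, and 3D-aware multiview diffusion models, and the scene parameters are dynamic 3D Gaussians or composed 4D objects rather than raw pixels.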
This list is automatically generated from the titles and abstracts of the papers on this site.