Animate3D: Animating Any 3D Model with Multi-view Video Diffusion
- URL: http://arxiv.org/abs/2407.11398v2
- Date: Mon, 9 Sep 2024 06:21:21 GMT
- Title: Animate3D: Animating Any 3D Model with Multi-view Video Diffusion
- Authors: Yanqin Jiang, Chaohui Yu, Chenjie Cao, Fan Wang, Weiming Hu, Jin Gao
- Abstract summary: Animate3D is a novel framework for animating any static 3D model.
We introduce a framework combining reconstruction and 4D Score Distillation Sampling (4D-SDS) to leverage the multi-view video diffusion priors for animating 3D objects.
- Score: 47.05131487114018
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in 4D generation mainly focus on generating 4D content by distilling pre-trained text or single-view image-conditioned models. It is inconvenient for them to take advantage of various off-the-shelf 3D assets with multi-view attributes, and their results suffer from spatiotemporal inconsistency owing to the inherent ambiguity in the supervision signals. In this work, we present Animate3D, a novel framework for animating any static 3D model. The core idea is two-fold: 1) We propose a novel multi-view video diffusion model (MV-VDM) conditioned on multi-view renderings of the static 3D object, which is trained on our presented large-scale multi-view video dataset (MV-Video). 2) Based on MV-VDM, we introduce a framework combining reconstruction and 4D Score Distillation Sampling (4D-SDS) to leverage the multi-view video diffusion priors for animating 3D objects. Specifically, for MV-VDM, we design a new spatiotemporal attention module to enhance spatial and temporal consistency by integrating 3D and video diffusion models. Additionally, we leverage the static 3D model's multi-view renderings as conditions to preserve its identity. For animating 3D models, an effective two-stage pipeline is proposed: we first reconstruct motions directly from generated multi-view videos, followed by the introduced 4D-SDS to refine both appearance and motion. Benefiting from accurate motion learning, we can achieve straightforward mesh animation. Qualitative and quantitative experiments demonstrate that Animate3D significantly outperforms previous approaches. Data, code, and models will be open-released.
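The abstract's most concrete architectural detail is the new spatiotemporal attention module that integrates 3D (multi-view) and video diffusion. The module itself is not spelled out here, so the following is a minimal sketch of one plausible factorized design, assuming spatial attention across views within each frame followed by temporal attention across frames; the class name, the (B, V, T, N, C) tensor layout, and the head count are all illustrative assumptions, not Animate3D's implementation.

```python
import torch
import torch.nn as nn


class SpatioTemporalAttention(nn.Module):
    """Factorized attention over views (spatial) and frames (temporal)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm_s = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)
        self.attn_s = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_t = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, V, T, N, C) = batch, views, frames, tokens per frame, channels
        B, V, T, N, C = x.shape

        # Multi-view ("spatial") attention: within each frame, the tokens of
        # all V views attend to each other, tying the views together.
        s = x.permute(0, 2, 1, 3, 4).reshape(B * T, V * N, C)
        sn = self.norm_s(s)
        s = s + self.attn_s(sn, sn, sn)[0]                   # residual
        x = s.reshape(B, T, V, N, C).permute(0, 2, 1, 3, 4)

        # Temporal attention: each (view, token) location attends across the
        # T frames, tying the frames together.
        t = x.permute(0, 1, 3, 2, 4).reshape(B * V * N, T, C)
        tn = self.norm_t(t)
        t = t + self.attn_t(tn, tn, tn)[0]                   # residual
        x = t.reshape(B, V, N, T, C).permute(0, 1, 3, 2, 4)
        return x


# Toy usage: 1 sample, 4 views, 8 frames, 16 tokens per frame, 64 channels.
block = SpatioTemporalAttention(dim=64)
print(block(torch.randn(1, 4, 8, 16, 64)).shape)  # torch.Size([1, 4, 8, 16, 64])
```

In the paper's pipeline a block of this kind would sit inside MV-VDM, whose outputs then drive the two-stage animation described in the abstract: motion is first reconstructed directly from the generated multi-view videos, and 4D-SDS then refines both appearance and motion.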
Related papers
- Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models [112.2625368640425]
The High-resolution Image-to-3D model (Hi3D) is a new video-diffusion-based paradigm that recasts turning a single image into multi-view images as 3D-aware sequential image generation.
Hi3D first empowers the pre-trained video diffusion model with a 3D-aware prior, yielding multi-view images with low-resolution texture details.
arXiv Detail & Related papers (2024-09-11T17:58:57Z)
- SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency [37.96042037188354]
We present Stable Video 4D (SV4D), a latent video diffusion model for multi-frame and multi-view consistent dynamic 3D content generation.
arXiv Detail & Related papers (2024-07-24T17:59:43Z)
- MotionDreamer: Exploring Semantic Video Diffusion features for Zero-Shot 3D Mesh Animation [10.263762787854862]
We propose a technique for automatic re-animation of various 3D shapes based on a motion prior extracted from a video diffusion model.
We leverage an explicit mesh-based representation compatible with existing computer-graphics pipelines.
Our time-efficient zero-shot method achieves superior performance in re-animating a diverse set of 3D shapes.
arXiv Detail & Related papers (2024-05-30T15:30:38Z)
- Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models [116.31344506738816]
We present a novel framework, Diffusion4D, for efficient and scalable 4D content generation.
We develop a 4D-aware video diffusion model capable of synthesizing orbital views of dynamic 3D assets.
Our method surpasses prior state-of-the-art techniques in terms of generation efficiency and 4D geometry consistency.
arXiv Detail & Related papers (2024-05-26T17:47:34Z)
- Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models [6.738732514502613]
Diffusion$^2$ is a novel framework for dynamic 3D content creation.
It reconciles geometric consistency from multi-view 3D diffusion models with temporal smoothness from video diffusion models to directly sample dense multi-view images (one reading of this score composition is sketched after this entry).
Experiments demonstrate the efficacy of our proposed framework in generating highly seamless and consistent 4D assets.
arXiv Detail & Related papers (2024-04-02T17:58:03Z)
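The title's "score composition" suggests that, at each denoising step, Diffusion$^2$ blends the noise (score) predictions of a multi-view diffusion model and a video diffusion model to sample a multi-view, multi-frame image grid that is consistent along both axes. A hedged sketch, assuming a simple convex combination (the weight $w$ and the exact rule are illustrative, not taken from the summary above):

```latex
% X_tau: the noisy multi-view x multi-frame image grid at step tau.
% eps_mv ties the views together within each frame; eps_video ties the
% frames together within each view. The convex weight w is an assumption.
\[
  \hat{\epsilon}(X_\tau) \;=\; w\,\epsilon_{\mathrm{mv}}(X_\tau)
  \;+\; (1 - w)\,\epsilon_{\mathrm{video}}(X_\tau), \qquad w \in [0, 1]
\]
```

The blended prediction would then drive a standard ancestral sampling update in place of a single model's output.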
- Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior [57.986512832738704]
We present a new framework, Sculpt3D, that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects without re-training the 2D diffusion model.
Specifically, we demonstrate that high-quality and diverse 3D geometry can be guaranteed by keypoint supervision through a sparse ray sampling approach.
These two decoupled designs effectively harness 3D information from reference objects to generate 3D objects while preserving the generation quality of the 2D diffusion model.
arXiv Detail & Related papers (2024-03-14T07:39:59Z)
- Animate124: Animating One Image to 4D Dynamic Scene [108.17635645216214]
Animate124 is the first work to animate a single in-the-wild image into 3D video through textual motion descriptions.
Our method demonstrates significant advancements over existing baselines.
arXiv Detail & Related papers (2023-11-24T16:47:05Z)
- MAS: Multi-view Ancestral Sampling for 3D motion generation using 2D diffusion [57.90404618420159]
We introduce Multi-view Ancestral Sampling (MAS), a method for 3D motion generation.
MAS works by simultaneously denoising multiple 2D motion sequences representing different views of the same 3D motion (a toy sketch of this sampling loop follows this entry).
We demonstrate MAS on 2D pose data acquired from videos depicting professional basketball maneuvers.
arXiv Detail & Related papers (2023-10-23T09:05:18Z)
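Because MAS is described as ancestral sampling that denoises several 2D views of one motion in parallel, a toy loop can make that structure concrete. This is a stand-in sketch, not the paper's method: the "denoiser" is a dummy shrink-and-noise step, the cameras are random orthographic projections, and the per-step triangulate-and-reproject consistency pass is an assumption suggested by the multi-view setup.

```python
import numpy as np

rng = np.random.default_rng(0)

T, J = 8, 4                 # frames, joints
num_views, steps = 3, 50    # cameras, denoising steps

# Random orthographic cameras: each projects 3D -> 2D with a 2x3 matrix
# (the top two rows of a random rotation).
cams = [np.linalg.qr(rng.normal(size=(3, 3)))[0][:2] for _ in range(num_views)]


def denoise_step(x2d, step):
    """Stand-in for one ancestral denoising step of a 2D motion diffusion
    model: shrink toward zero and add decaying noise (a dummy, not a DDPM)."""
    noise_scale = 0.5 * (1.0 - step / steps)
    return 0.9 * x2d + noise_scale * rng.normal(size=x2d.shape)


# Start from pure noise in every view: one (T, J, 2) sequence per camera.
views = [rng.normal(size=(T, J, 2)) for _ in range(num_views)]

for step in range(steps):
    # 1) Denoise each 2D motion sequence independently.
    views = [denoise_step(v, step) for v in views]

    # 2) Triangulate: for every (frame, joint) point, find the least-squares
    #    3D position whose projections match all views (A @ p3 ~= observations).
    A = np.vstack(cams)                                     # (2V, 3)
    B = np.vstack([v.reshape(T * J, 2).T for v in views])   # (2V, T*J)
    X3 = np.linalg.lstsq(A, B, rcond=None)[0]               # (3, T*J)

    # 3) Project the consistent 3D motion back into every view, so the next
    #    denoising step starts from mutually consistent 2D sequences.
    views = [(c @ X3).T.reshape(T, J, 2) for c in cams]

motion3d = X3.T.reshape(T, J, 3)   # final multi-view-consistent 3D motion
print(motion3d.shape)              # (8, 4, 3)
```

The key structural point is steps 2-3: after every denoising step the views are replaced by projections of a single 3D motion, so multi-view consistency is enforced throughout sampling rather than only at the end.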