Fugu-MT paper translation (abstract): A Unified Approach for Text- and Image-guided 4D Scene Generation

Paper overview: A Unified Approach for Text- and Image-guided 4D Scene Generation

arxiv url: http://arxiv.org/abs/2311.16854v2
Date: Wed, 29 Nov 2023 15:56:38 GMT
Status: translation complete
Last updated in system: 2023-11-30 12:20:13.166829
Title: A Unified Approach for Text- and Image-guided 4D Scene Generation
Title (reference translation): A Unified Approach to 4D Scene Generation Using Text and Images
Authors: Yufeng Zheng, Xueting Li, Koki Nagano, Sifei Liu, Karsten Kreis, Otmar Hilliges, Shalini De Mello
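Abstract summary: This work proposes Dream-in-4D, a novel two-stage approach for text-to-4D synthesis. The method is shown to significantly improve image and motion quality, 3D consistency, and text fidelity in text-to-4D generation. It offers, for the first time, a unified approach for text-to-4D, image-to-4D, and personalized 4D generation tasks.
Reference score (attention score computed independently by this site): 61.60025506794648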
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large-scale diffusion generative models are greatly simplifying image, video and 3D asset creation from user-provided text prompts and images. However, the challenging problem of text-to-4D dynamic 3D scene generation with diffusion guidance remains largely unexplored. We propose Dream-in-4D, which features a novel two-stage approach for text-to-4D synthesis, leveraging (1) 3D and 2D diffusion guidance to effectively learn a high-quality static 3D asset in the first stage; (2) a deformable neural radiance field that explicitly disentangles the learned static asset from its deformation, preserving quality during motion learning; and (3) a multi-resolution feature grid for the deformation field with a displacement total variation loss to effectively learn motion with video diffusion guidance in the second stage. Through a user preference study, we demonstrate that our approach significantly advances image and motion quality, 3D consistency and text fidelity for text-to-4D generation compared to baseline approaches. Thanks to its motion-disentangled representation, Dream-in-4D can also be easily adapted for controllable generation where appearance is defined by one or multiple images, without the need to modify the motion learning stage. Thus, our method offers, for the first time, a unified approach for text-to-4D, image-to-4D and personalized 4D generation tasks.
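Abstract (reference translation): Large-scale diffusion generative models greatly simplify the creation of images, videos, and 3D assets from user-provided text prompts and images. However, the challenging problem of text-to-4D dynamic 3D scene generation with diffusion guidance remains largely unexplored. We propose Dream-in-4D, which features a novel two-stage approach for text-to-4D synthesis, leveraging (1) 3D and 2D diffusion guidance to effectively learn a high-quality static 3D asset in the first stage; (2) a deformable neural radiance field that explicitly disentangles the learned static asset from its deformation, preserving quality during motion learning; and (3) a multi-resolution feature grid for the deformation field with a displacement total variation loss to effectively learn motion with video diffusion guidance in the second stage. Through a user preference study, we show that our method significantly improves image and motion quality, 3D consistency, and text fidelity compared with baseline approaches. Thanks to its motion-disentangled representation, Dream-in-4D can also be easily adapted to controllable generation in which appearance is defined by one or more images, without modifying the motion learning stage. Our method thus offers, for the first time, a unified approach for text-to-4D, image-to-4D, and personalized 4D generation tasks.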
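
Illustrative note: the abstract names a "displacement total variation loss" on the deformation field but does not give its form. The sketch below shows one common way such a regularizer can be written, assuming displacement vectors sampled on a dense 3D lattice and a squared-difference penalty between axis-aligned neighbours. The function names, the lattice layout, and the choice of a squared penalty are all assumptions made for illustration, not the authors' implementation.

```python
from itertools import product


def constant_grid(n, vec):
    """Hypothetical helper: an n x n x n lattice with the same displacement everywhere."""
    return [[[vec for _ in range(n)] for _ in range(n)] for _ in range(n)]


def displacement_tv_loss(grid):
    """Mean squared difference between neighbouring displacement vectors.

    grid[i][j][k] is a 3-tuple (dx, dy, dz) on a dense X x Y x Z lattice.
    This is an assumed formulation; the paper may use a different norm,
    weighting, or sampling scheme.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    total, count = 0.0, 0
    for i, j, k in product(range(nx), range(ny), range(nz)):
        here = grid[i][j][k]
        # Compare with the next cell along each axis, when that cell exists.
        for di, dj, dk in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            ni, nj, nk = i + di, j + dj, k + dk
            if ni < nx and nj < ny and nk < nz:
                there = grid[ni][nj][nk]
                total += sum((a - b) ** 2 for a, b in zip(here, there))
                count += 1
    # A lattice with a single cell has no neighbour pairs.
    return total / count if count else 0.0
```

Under these assumptions, a spatially constant displacement field (a rigid translation) incurs zero penalty, while abrupt changes between neighbouring cells are penalized, which is the smoothing effect a total variation term is generally meant to provide.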

Related papers