Fugu-MT Paper Translation (Summary): Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Orthogonal Diffusion Models

Paper overview: Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Orthogonal Diffusion Models

arxiv url: http://arxiv.org/abs/2404.02148v2
Date: Sat, 20 Apr 2024 14:45:54 GMT
Status: Translation complete
Last updated in system: 2024-04-23 22:45:14.685812
Title: Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Orthogonal Diffusion Models
Title (reference translation): Diffusion$^2$: dynamic 3D content generation via score composition of orthogonal diffusion models
Authors: Zeyu Yang, Zijie Pan, Chun Gu, Li Zhang
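Abstract summary: We present Diffusion$^2$, a novel framework for dynamic 3D content creation. Our framework can generate 4D content within a few minutes.
Reference score (attention score computed by the site): 6.738732514502613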
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advancements in 3D generation are predominantly propelled by improvements in 3D-aware image diffusion models that are pretrained on Internet-scale image data and fine-tuned on massive 3D data, offering the capability of producing highly consistent multi-view images. However, due to the scarcity of synchronized multi-view video data, it is impractical to adapt this paradigm to 4D generation directly. Despite that, the available video and 3D data are adequate for training video and multi-view diffusion models that can provide satisfactory dynamic and geometric priors respectively. In this paper, we present Diffusion$^2$, a novel framework for dynamic 3D content creation that leverages the knowledge about geometric consistency and temporal smoothness from these models to directly sample dense multi-view and multi-frame images which can be employed to optimize a continuous 4D representation. Specifically, we design a simple yet effective denoising strategy via score composition of video and multi-view diffusion models based on the probability structure of the images to be generated. Owing to the high parallelism of the image generation and the efficiency of the modern 4D reconstruction pipeline, our framework can generate 4D content within a few minutes. Furthermore, our method circumvents the reliance on 4D data, thereby having the potential to benefit from the scalability of the foundation video and multi-view diffusion models. Extensive experiments demonstrate the efficacy of our proposed framework and its capability to flexibly adapt to various types of prompts.
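Abstract (reference translation): Recent progress in 3D generation has been driven largely by improved 3D-aware image diffusion models, pretrained on Internet-scale image data and fine-tuned on large amounts of 3D data, which can generate highly consistent multi-view images. However, because synchronized multi-view video data is scarce, this paradigm cannot be applied directly to 4D generation. Nevertheless, the available video and 3D data are sufficient for training video and multi-view diffusion models, which can provide satisfactory dynamic and geometric priors, respectively. In this paper, we propose Diffusion$^2$, a novel framework for dynamic 3D content creation that leverages these models' knowledge of geometric consistency and temporal smoothness to directly sample dense multi-view, multi-frame images that can be used to optimize a continuous 4D representation. Specifically, we design a simple yet effective denoising strategy based on score composition of video and multi-view diffusion models, grounded in the probability structure of the images to be generated. Thanks to the high parallelism of image generation and the efficiency of modern 4D reconstruction pipelines, our framework can generate 4D content within a few minutes. Furthermore, our method avoids reliance on 4D data and can therefore potentially benefit from the scalability of foundation video and multi-view diffusion models. Extensive experiments demonstrate the effectiveness of the proposed method and its ability to adapt flexibly to various types of prompts.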
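The abstract describes denoising a grid of images (views by frames) by composing the scores of a video diffusion model and a multi-view diffusion model. The sketch below illustrates one way such a composition could be wired up, under stated assumptions: `toy_multiview_eps` and `toy_video_eps` are hypothetical toy stand-ins, not real pretrained models, and the convex weighting $\epsilon = s\,\epsilon_{mv} + (1-s)\,\epsilon_{vid}$ with a scalar `s` is an assumed scheme, not necessarily the exact rule used by Diffusion$^2$. What it does show is the axis routing: the multi-view model sees all views of one frame, and the video model sees all frames of one view. Because a score at a fixed timestep is a scaled noise prediction, a weighted sum of noise predictions is a weighted sum of scores. The update step is standard deterministic DDIM (eta = 0), chosen here only for illustration.

```python
import numpy as np

# Hypothetical stand-in for a multi-view diffusion model's noise prediction.
# Input: all V views of one frame, shape (V, H, W, C). A real model would be a
# pretrained network conditioned on t; this toy ignores t and returns each
# view's deviation from the mean over views.
def toy_multiview_eps(x_views: np.ndarray, t: int) -> np.ndarray:
    return x_views - x_views.mean(axis=0, keepdims=True)

# Hypothetical stand-in for a video diffusion model's noise prediction.
# Input: all T frames of one view, shape (T, H, W, C). Also ignores t.
def toy_video_eps(x_frames: np.ndarray, t: int) -> np.ndarray:
    return x_frames - x_frames.mean(axis=0, keepdims=True)

def composed_eps(x: np.ndarray, t: int, s: float) -> np.ndarray:
    """Combine per-frame multi-view and per-view video noise predictions.

    x: noisy image grid of shape (V, T, H, W, C), V views by T frames.
    s: assumed mixing weight in [0, 1]; s=1 uses only the multi-view model.
    """
    V, T = x.shape[:2]
    # Multi-view model: one call per frame j, operating over the view axis.
    eps_mv = np.stack([toy_multiview_eps(x[:, j], t) for j in range(T)], axis=1)
    # Video model: one call per view i, operating over the frame axis.
    eps_vid = np.stack([toy_video_eps(x[i], t) for i in range(V)], axis=0)
    # Weighted combination of the two noise estimates.
    return s * eps_mv + (1.0 - s) * eps_vid

def ddim_step(x: np.ndarray, eps: np.ndarray,
              alpha_bar_t: float, alpha_bar_prev: float) -> np.ndarray:
    """Deterministic DDIM update (eta = 0) from noise level t to t_prev."""
    # Estimate the clean grid x0 from the composed noise prediction.
    x0 = (x - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)
    # Re-noise x0 to the previous (less noisy) level with the same eps.
    return np.sqrt(alpha_bar_prev) * x0 + np.sqrt(1.0 - alpha_bar_prev) * eps

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4, 8, 8, 3))  # 3 views, 4 frames, 8x8 RGB
eps = composed_eps(x, t=500, s=0.5)
x_prev = ddim_step(x, eps, alpha_bar_t=0.5, alpha_bar_prev=0.6)
print(eps.shape)  # (3, 4, 8, 8, 3)
```

Since each per-frame and per-view model call is independent of the others within a step, those calls could in principle be batched or run in parallel, which is consistent with the parallelism the abstract credits for fast generation.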


Related papers