Fugu-MT Paper Translation (Abstract): X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation

Paper overview: X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation

arxiv url: http://arxiv.org/abs/2312.00085v2
Date: Mon, 25 Dec 2023 05:46:18 GMT
Status: Translation complete
Last updated in system: 2023-12-29 21:33:24.276262
Title: X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation
Title (reference translation): X-Dreamer: Creating high-quality 3D content by bridging the domain gap between text-to-2D and text-to-3D
Authors: Yiwei Ma, Yijun Fan, Jiayi Ji, Haowei Wang, Xiaoshuai Sun, Guannan Jiang, Annan Shu, Rongrong Ji
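Abstract summary: X-Dreamer is a novel approach for creating high-quality 3D content from text. It bridges the gap between text-to-2D and text-to-3D synthesis.
Reference score (independently computed attention score): 64.12848271290119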
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In recent times, automatic text-to-3D content creation has made significant progress, driven by the development of pretrained 2D diffusion models. Existing text-to-3D methods typically optimize the 3D representation to ensure that the rendered image aligns well with the given text, as evaluated by the pretrained 2D diffusion model. Nevertheless, a substantial domain gap exists between 2D images and 3D assets, primarily attributed to variations in camera-related attributes and the exclusive presence of foreground objects. Consequently, employing 2D diffusion models directly for optimizing 3D representations may lead to suboptimal outcomes. To address this issue, we present X-Dreamer, a novel approach for high-quality text-to-3D content creation that effectively bridges the gap between text-to-2D and text-to-3D synthesis. The key components of X-Dreamer are two innovative designs: Camera-Guided Low-Rank Adaptation (CG-LoRA) and Attention-Mask Alignment (AMA) Loss. CG-LoRA dynamically incorporates camera information into the pretrained diffusion models by employing camera-dependent generation for trainable parameters. This integration enhances the alignment between the generated 3D assets and the camera's perspective. AMA loss guides the attention map of the pretrained diffusion model using the binary mask of the 3D object, prioritizing the creation of the foreground object. This module ensures that the model focuses on generating accurate and detailed foreground objects. Extensive evaluations demonstrate the effectiveness of our proposed method compared to existing text-to-3D approaches. Our project webpage: https://xmu-xiaoma666.github.io/Projects/X-Dreamer/ .
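Abstract (reference translation): In recent years, automatic generation of 3D content from text has made great progress thanks to the development of pretrained 2D diffusion models. Existing text-to-3D methods optimize the 3D representation so that the rendered image matches the given text well, as evaluated by a pretrained 2D diffusion model. Nevertheless, there is a substantial domain gap between 2D images and 3D assets, which stems mainly from variations in camera-related attributes and the exclusive presence of foreground objects. Consequently, using 2D diffusion models directly to optimize 3D representations can lead to suboptimal results. This paper proposes X-Dreamer, a novel approach for high-quality text-to-3D content creation that effectively bridges the gap between text-to-2D and text-to-3D synthesis. The main components of X-Dreamer are two innovative designs: Camera-Guided Low-Rank Adaptation (CG-LoRA) and Attention-Mask Alignment (AMA) Loss. CG-LoRA dynamically incorporates camera information into the pretrained diffusion model by using camera-dependent generation for the trainable parameters. This integration strengthens the alignment between the generated 3D assets and the camera's perspective. The AMA loss guides the attention map of the pretrained diffusion model using the binary mask of the 3D object, prioritizing the creation of the foreground object. This module ensures that the model concentrates on generating accurate and detailed foreground objects. The effectiveness of the proposed method is evaluated in comparison with existing text-to-3D methods. Project webpage: https://xmu-xiaoma666.github.io/Projects/X-Dreamer/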

Related papers