Fugu-MT Paper Translation (Abstract): SeMv-3D: Towards Semantic and Mutil-view Consistency simultaneously for General Text-to-3D Generation with Triplane Priors

Paper overview: SeMv-3D: Towards Semantic and Mutil-view Consistency simultaneously for General Text-to-3D Generation with Triplane Priors

arxiv url: http://arxiv.org/abs/2410.07658v1
Date: Thu, 10 Oct 2024 07:02:06 GMT
Status: Translation complete
Last updated in system: 2024-10-31 15:46:26.765821
Title: SeMv-3D: Towards Semantic and Mutil-view Consistency simultaneously for General Text-to-3D Generation with Triplane Priors
Title (reference translation): SeMv-3D: Toward achieving semantic and multi-view consistency for general text-to-3D generation using triplane priors
Authors: Xiao Cai, Pengpeng Zeng, Lianli Gao, Junchen Zhu, Jiaxin Zhang, Sitong Su, Heng Tao Shen, Jingkuan Song
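Abstract summary: We propose SeMv-3D, a novel framework for general text-to-3D generation. We propose a Triplane Prior Learner that learns triplane priors with 3D spatial features to maintain consistency among different views at the 3D level. We also design a Semantic-aligned View Synthesizer that preserves the alignment between 3D spatial features and textual semantics.
Reference score (independently computed attention score): 115.66850201977887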
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent progress in generic 3D content generation from text prompts has been remarkable, achieved either by fine-tuning text-to-image (T2I) diffusion models or by employing these T2I models as priors to learn a general text-to-3D model. While fine-tuning-based methods ensure strong alignment between text and generated views, i.e., semantic consistency, their ability to achieve multi-view consistency is hampered by the absence of 3D constraints, even in limited views. In contrast, prior-based methods focus on regressing 3D shapes that maintain uniformity and coherence across arbitrary views, i.e., multi-view consistency, but such approaches inevitably compromise visual-textual alignment, leading to a loss of semantic details in the generated objects. To achieve semantic and multi-view consistency simultaneously, we propose SeMv-3D, a novel framework for general text-to-3D generation. Specifically, we propose a Triplane Prior Learner (TPL) that learns triplane priors with 3D spatial features to maintain consistency among different views at the 3D level, e.g., in geometry and texture. Moreover, we design a Semantic-aligned View Synthesizer (SVS) that preserves the alignment between 3D spatial features and textual semantics in latent space. In SVS, we devise a simple yet effective batch sampling and rendering strategy that can generate arbitrary views in a single feed-forward inference. Extensive experiments demonstrate SeMv-3D's superiority over state-of-the-art methods, with semantic and multi-view consistency in any view. Our code and more visual results are available at https://anonymous.4open.science/r/SeMv-3D-6425.
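Abstract (reference translation): Progress in generic 3D content generation from text prompts has been notable, driven by fine-tuned text-to-image (T2I) diffusion models and by the use of these T2I models as priors for learning a general text-to-3D model. Fine-tuning-based methods ensure alignment between the text and the generated views, i.e., semantic consistency, but their ability to achieve multi-view consistency is hindered by the lack of 3D constraints, even for a limited set of views. Prior-based methods, by contrast, focus on regressing 3D shapes that stay uniform and coherent across any view, i.e., multi-view consistency, but such approaches inevitably compromise visual-textual alignment, so semantic details of the generated objects are lost. To achieve semantic and multi-view consistency at the same time, we propose SeMv-3D, a novel framework for general text-to-3D generation. Specifically, we propose a Triplane Prior Learner (TPL) that learns triplane priors with 3D spatial features and maintains consistency among different views at the 3D level, e.g., in geometry and texture. In addition, we design a Semantic-aligned View Synthesizer (SVS) that preserves the alignment between 3D spatial features and textual semantics in latent space. In SVS, we devise a simple yet effective batch sampling and rendering strategy that can generate arbitrary views in a single feed-forward inference. Extensive experiments show that SeMv-3D outperforms the state of the art, with semantic and multi-view consistency in any view. Our code and more visual results are available at https://anonymous.4open.science/r/SeMv-3D-6425.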
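
Illustrative note: the abstract does not say how features are read from a triplane, so the following is only a minimal sketch of the commonly used triplane lookup, not the SeMv-3D implementation. It assumes three axis-aligned feature planes (XY, XZ, YZ), points normalized to [-1, 1]^3, bilinear interpolation on each plane, and aggregation by summation. The names `bilinear_sample` and `query_triplane` are hypothetical.

```python
import numpy as np


def bilinear_sample(plane, u, v):
    """Bilinearly sample an (H, W, C) feature plane at coords u, v in [-1, 1].

    u indexes the width axis, v the height axis. Requires H >= 2 and W >= 2.
    """
    h, w, _ = plane.shape
    # Clamp to the valid range so out-of-bounds points do not extrapolate.
    u = np.clip(u, -1.0, 1.0)
    v = np.clip(v, -1.0, 1.0)
    # Map [-1, 1] to continuous pixel coordinates.
    x = (u + 1.0) * 0.5 * (w - 1)
    y = (v + 1.0) * 0.5 * (h - 1)
    # Top-left corner of the interpolation cell, kept inside the grid.
    x0 = np.clip(np.floor(x).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 2)
    fx = (x - x0)[:, None]
    fy = (y - y0)[:, None]
    top = plane[y0, x0] * (1 - fx) + plane[y0, x0 + 1] * fx
    bottom = plane[y0 + 1, x0] * (1 - fx) + plane[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bottom * fy


def query_triplane(planes, points):
    """Return per-point features of shape (N, C).

    planes: dict with keys 'xy', 'xz', 'yz', each an (H, W, C) array.
    points: (N, 3) array of coordinates in [-1, 1]^3.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    f_xy = bilinear_sample(planes["xy"], x, y)
    f_xz = bilinear_sample(planes["xz"], x, z)
    f_yz = bilinear_sample(planes["yz"], y, z)
    # Summation is one common aggregation choice (an assumption here);
    # concatenation is another.
    return f_xy + f_xz + f_yz
```

Because every 3D point reads from the same three planes regardless of the camera, features rendered from different viewpoints share one underlying representation, which is the intuition behind using a triplane for consistency across views.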
