Fugu-MT paper translation (abstract): Native3D: End-to-End 3D Scene Generation via Unified Mesh-Texture Modeling and Semantic Alignment

Paper overview: Native3D: End-to-End 3D Scene Generation via Unified Mesh-Texture Modeling and Semantic Alignment

arxiv url: http://arxiv.org/abs/2606.07117v1
Date: Fri, 05 Jun 2026 10:13:17 GMT
Status: Translation complete
Last updated in system: 2026-06-08 14:33:29.686868
Title: Native3D: End-to-End 3D Scene Generation via Unified Mesh-Texture Modeling and Semantic Alignment
Title (reference translation): Native3D: End-to-End 3D Scene Generation via Unified Mesh-Texture Modeling and Semantic Alignment
Authors: Yibo Liu, Ziwei Zhang, Haozhou Pang, Menghao Li, Lanshan He, Gan Qi,
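Abstract summary: Native3D is the first end-to-end 3D scene generation framework that completely bypasses 2D intermediate representations. The paper proposes a joint mesh-texture representation that models geometric structure and texture features simultaneously using a Transformer-based scene encoder.
Reference score (independently computed attention score): 8.98466538269363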
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper presents Native3D, the first end-to-end 3D scene generation framework that completely bypasses 2D intermediate representations. Traditional approaches typically require adapting 3D representations to the 2D domain to leverage pre-trained diffusion models, which inevitably introduces domain adaptation issues including geometric structural distortion and texture detail degradation. To address these limitations, we design a unified mesh-texture joint representation that simultaneously models both geometric structures and texture features through a Transformer-based scene encoder, effectively maintaining spatial relationships and visual consistency among objects within scenes. We further propose the 3D Representation Alignment Loss (3D REPA Loss), which employs an improved contrastive learning mechanism to align multi-level semantic representations in the latent space, significantly enhancing geometric and textural fidelity. Experimental results demonstrate that Native3D outperforms existing methods in both generation quality and editing flexibility, providing a novel solution for 3D scene editing.
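Abstract (reference translation): This paper presents Native3D, the first end-to-end 3D scene generation framework that completely bypasses 2D intermediate representations. Conventional approaches must adapt 3D representations to the 2D domain in order to leverage pre-trained diffusion models, which inevitably introduces domain adaptation problems, including geometric structural distortion and degradation of texture detail. To address these limitations, we design a unified mesh-texture joint representation that uses a Transformer-based scene encoder to model geometric structure and texture features simultaneously, effectively maintaining spatial relationships and visual consistency among the objects in a scene. We further propose the 3D Representation Alignment Loss (3D REPA Loss), which uses an improved contrastive learning mechanism to align multi-level semantic representations in the latent space and significantly improves geometric and textural fidelity. Experimental results show that Native3D outperforms existing methods in both generation quality and editing flexibility, providing a new solution for 3D scene editing.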
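Note: The abstract describes the 3D REPA Loss only as an improved contrastive learning mechanism that aligns semantic representations in the latent space; it does not give the formula. As a rough illustration of contrastive alignment in general, the sketch below implements a standard InfoNCE-style loss in plain Python. It is not the paper's loss. The function names, the use of cosine similarity, the temperature value, and the toy vectors are all assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_alignment_loss(latents, targets, temperature=0.1):
    """Mean InfoNCE loss over a batch (hypothetical helper, not from the paper).

    latents[i] is treated as the positive match for targets[i]; every other
    targets[j] in the batch acts as a negative.
    """
    n = len(latents)
    total = 0.0
    for i in range(n):
        logits = [
            cosine_similarity(latents[i], targets[j]) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp over all candidates.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits)
        )
        # Negative log-probability of picking the positive pair.
        total += log_sum_exp - logits[i]
    return total / n


# Toy data: two latent vectors and two reference embeddings (assumed values).
latents = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce_alignment_loss(latents, [[1.0, 0.0], [0.0, 1.0]])
misaligned = info_nce_alignment_loss(latents, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy setup the loss is close to zero when each latent vector points in the same direction as its paired reference embedding, and it grows large when the pairs are swapped. That is the behaviour an alignment objective of this kind relies on.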
