Fugu-MT Paper Translation (Abstract): WorldMesh: Generating Navigable Multi-Room 3D Scenes via Mesh-Conditioned Image Diffusion

Paper overview: WorldMesh: Generating Navigable Multi-Room 3D Scenes via Mesh-Conditioned Image Diffusion

arxiv url: http://arxiv.org/abs/2603.22972v1
Date: Tue, 24 Mar 2026 09:10:08 GMT
Status: Translation complete
Last updated in system: 2026-03-25 19:53:37.396127
Title: WorldMesh: Generating Navigable Multi-Room 3D Scenes via Mesh-Conditioned Image Diffusion
Title (reference translation): WorldMesh: Generation of Navigable Multi-Room 3D Scenes by Mesh-Conditioned Image Diffusion
Authors: Manuel-Andreas Schneider, Angela Dai
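Abstract summary: Text-to-image and text-to-video approaches struggle to maintain scene-level and object-level consistency beyond a limited environment scale because they lack explicit geometry. This paper proposes a geometry-first approach that decomposes the complex problem of large-scale 3D scene synthesis into structural composition, represented as a mesh scaffold, and appearance synthesis conditioned on that scaffold. This yields scalable, arbitrarily sized 3D scenes with rich and diverse objects, combining robust 3D consistency with photorealistic detail.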
Reference score (attention score computed independently by this site): 39.78606573330677
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent progress in image and video synthesis has inspired their use in advancing 3D scene generation. However, we observe that text-to-image and -video approaches struggle to maintain scene- and object-level consistency beyond a limited environment scale due to the absence of explicit geometry. We thus present a geometry-first approach that decouples this complex problem of large-scale 3D scene synthesis into its structural composition, represented as a mesh scaffold, and realistic appearance synthesis, which leverages powerful image synthesis models conditioned on the mesh scaffold. From an input text description, we first construct a mesh capturing the environment's geometry (walls, floors, etc.), and then use image synthesis, segmentation and object reconstruction to populate the mesh structure with objects in realistic layouts. This mesh scaffold is then rendered to condition image synthesis, providing a structural backbone for consistent appearance generation. This enables scalable, arbitrarily-sized 3D scenes of high object richness and diversity, combining robust 3D consistency with photorealistic detail. We believe this marks a significant step toward generating truly environment-scale, immersive 3D worlds.
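Abstract (reference translation): Recent advances in image and video synthesis have inspired progress in 3D scene generation. However, text-to-image and text-to-video approaches struggle to maintain scene-level and object-level consistency beyond a limited environment scale, because they lack explicit geometry. We therefore propose a geometry-first approach that decomposes the complex problem of large-scale 3D scene synthesis into structural composition, represented as a mesh scaffold, and realistic appearance synthesis, which uses powerful image synthesis models conditioned on that scaffold. Starting from an input text description, we first build a mesh that captures the environment's geometry (walls, floors, etc.), then use image synthesis, segmentation, and object reconstruction to populate the mesh structure with objects in realistic layouts. The mesh scaffold is then rendered to condition image synthesis, providing a structural backbone for consistent appearance generation. This yields scalable, arbitrarily sized 3D scenes with rich and diverse objects, combining robust 3D consistency with photorealistic detail. We consider this an important step toward generating truly environment-scale, immersive 3D worlds.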
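Illustrative sketch (not from the paper): the abstract describes building a mesh of the environment's geometry (walls, floors, etc.) before any appearance synthesis. To make the idea of a multi-room "mesh scaffold" concrete, the following minimal Python example builds the floor and four walls of an axis-aligned rectangular room and merges several rooms into one mesh. The names `Mesh`, `room_scaffold`, and `merge` are hypothetical, and the paper's actual mesh construction, object placement, and rendering steps are not reproduced here; doorways and ceilings are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

Vec3 = Tuple[float, float, float]
Tri = Tuple[int, int, int]


@dataclass
class Mesh:
    """Hypothetical minimal triangle mesh: vertex positions plus index triples."""
    vertices: List[Vec3] = field(default_factory=list)
    faces: List[Tri] = field(default_factory=list)

    def add_quad(self, a: Vec3, b: Vec3, c: Vec3, d: Vec3) -> None:
        """Append the quad a-b-c-d (corners in order) as two triangles."""
        base = len(self.vertices)
        self.vertices.extend([a, b, c, d])
        self.faces.append((base, base + 1, base + 2))
        self.faces.append((base, base + 2, base + 3))


def room_scaffold(width: float, depth: float, height: float,
                  origin: Tuple[float, float] = (0.0, 0.0)) -> Mesh:
    """Floor plus four walls of a rectangular room (z is up). Assumed layout, not the paper's."""
    x0, y0 = origin
    x1, y1 = x0 + width, y0 + depth
    mesh = Mesh()
    # Floor quad at z = 0.
    mesh.add_quad((x0, y0, 0.0), (x1, y0, 0.0), (x1, y1, 0.0), (x0, y1, 0.0))
    # One wall quad per floor edge, extruded up to the room height.
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    for i in range(4):
        (ax, ay), (bx, by) = corners[i], corners[(i + 1) % 4]
        mesh.add_quad((ax, ay, 0.0), (bx, by, 0.0),
                      (bx, by, height), (ax, ay, height))
    return mesh


def merge(meshes: Iterable[Mesh]) -> Mesh:
    """Concatenate meshes into one, shifting face indices by the running vertex count."""
    out = Mesh()
    for m in meshes:
        offset = len(out.vertices)
        out.vertices.extend(m.vertices)
        out.faces.extend((a + offset, b + offset, c + offset) for a, b, c in m.faces)
    return out


# Two adjacent rooms sharing the plane x = 4.0.
scene = merge([
    room_scaffold(4.0, 3.0, 2.5),
    room_scaffold(3.0, 3.0, 2.5, origin=(4.0, 0.0)),
])
```

In a pipeline of the kind the abstract outlines, a mesh like `scene` would presumably be populated with reconstructed objects and then rendered to produce conditioning images for an image synthesis model; those stages depend on models and renderers that this sketch deliberately leaves out.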


Related papers