Fugu-MT Paper Translation (Summary): SDesc3D: Towards Layout-Aware 3D Indoor Scene Generation from Short Descriptions

Paper summary: SDesc3D: Towards Layout-Aware 3D Indoor Scene Generation from Short Descriptions

arxiv url: http://arxiv.org/abs/2604.01972v3
Date: Wed, 08 Apr 2026 02:52:09 GMT
Status: Translation complete
Last updated in system: 2026-04-09 14:06:04.974344
Title: SDesc3D: Towards Layout-Aware 3D Indoor Scene Generation from Short Descriptions
Authors: Jie Feng, Jiawei Shen, Junjia Huang, Junpeng Zhang, Mingtao Feng, Weisheng Dong, Guanbin Li,
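Abstract summary: Indoor 3D scene generation conditioned on short text descriptions offers a promising avenue for building interactive 3D environments. In such semantically condensed cases, existing methods suffer from reduced physical plausibility and insufficient detail. SDesc3D is a short-text-conditioned indoor scene generation framework that takes multi-view structure and regional functionality into account.
Reference score (attention score computed in-house): 71.54559024212976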
License: http://creativecommons.org/licenses/by/4.0/
Abstract: 3D indoor scene generation conditioned on short textual descriptions provides a promising avenue for interactive 3D environment construction without the need for labor-intensive layout specification. Despite recent progress in text-conditioned 3D scene generation, existing works suffer from poor physical plausibility and insufficient detail richness in such semantic condensation cases, largely due to their reliance on explicit semantic cues about compositional objects and their spatial relationships. This limitation highlights the need for enhanced 3D reasoning capabilities, particularly in terms of prior integration and spatial anchoring. Motivated by this, we propose SDesc3D, a short-text conditioned 3D indoor scene generation framework that leverages multi-view structural priors and regional functionality implications to enable 3D layout reasoning under sparse textual guidance. Specifically, we introduce a Multi-view scene prior augmentation that enriches underspecified textual inputs with aggregated multi-view structural knowledge, shifting from inaccessible semantic relation cues to multi-view relational prior aggregation. Building on this, we design a Functionality-aware layout grounding, employing regional functionality grounding for implicit spatial anchors and conducting hierarchical layout reasoning to enhance scene organization and semantic plausibility. Furthermore, an Iterative reflection-rectification scheme is employed for progressive structural plausibility refinement via self-rectification. Extensive experiments show that our method outperforms existing approaches on short-text conditioned 3D indoor scene generation. Code will be publicly available.
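The abstract describes the pipeline only at a high level, and the authors' code is not yet released. As a rough illustration of the general idea behind functionality-aware grounding followed by iterative reflection and rectification, here is a minimal toy sketch, not the authors' method. It assumes 2D axis-aligned object footprints, hand-specified rectangular functional regions, and a rule-based fixer; the names `Obj`, `ground`, `reflect`, `rectify`, and `refine`, the room dimensions, and the region boxes are all hypothetical. Objects are first anchored to their region (coarse level). A loop then lists violations (collisions, objects outside their region) and applies one local fix at a time until the layout is clean.

```python
from dataclasses import dataclass

EPS = 1e-9
# Hypothetical functional region: (x0, y0, x1, y1) in metres.
Region = tuple[float, float, float, float]


@dataclass
class Obj:
    """Axis-aligned floor footprint of one object (an assumed, simplified representation)."""
    name: str
    region: str  # functional region the object is grounded to
    w: float     # extent along x (m)
    d: float     # extent along y (m)
    x: float = 0.0  # min-corner position, set by ground()
    y: float = 0.0


def ground(objs: list[Obj], regions: dict[str, Region]) -> None:
    """Coarse level: anchor every object at the min corner of its functional region."""
    for o in objs:
        o.x, o.y = regions[o.region][0], regions[o.region][1]


def inside(o: Obj, x: float, y: float, r: Region) -> bool:
    """True if o placed at (x, y) lies entirely within region r."""
    return (x >= r[0] - EPS and y >= r[1] - EPS
            and x + o.w <= r[2] + EPS and y + o.d <= r[3] + EPS)


def collides(a: Obj, b: Obj) -> bool:
    """True if two footprints overlap with positive area (touching edges is allowed)."""
    ox = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    oy = min(a.y + a.d, b.y + b.d) - max(a.y, b.y)
    return ox > EPS and oy > EPS


def reflect(objs: list[Obj], regions: dict[str, Region]) -> list[tuple]:
    """'Reflection' step (toy version): list plausibility violations."""
    issues = [("outside", o) for o in objs
              if not inside(o, o.x, o.y, regions[o.region])]
    issues += [("collision", a, b) for i, a in enumerate(objs)
               for b in objs[i + 1:] if collides(a, b)]
    return issues


def rectify(issue: tuple, regions: dict[str, Region]) -> None:
    """'Rectification' step (toy version): apply one local fix for one violation."""
    if issue[0] == "outside":
        o = issue[1]
        x0, y0, x1, y1 = regions[o.region]
        o.x = min(max(o.x, x0), x1 - o.w)  # clamp back into the region
        o.y = min(max(o.y, y0), y1 - o.d)
        return
    a, b = issue[1], issue[2]
    # Four shifts that move b just clear of a: right, left, up, down.
    moves = [(a.x + a.w - b.x, 0.0), (a.x - (b.x + b.w), 0.0),
             (0.0, a.y + a.d - b.y), (0.0, a.y - (b.y + b.d))]
    # Prefer the smallest shift that keeps b inside its own functional region.
    valid = [m for m in moves
             if inside(b, b.x + m[0], b.y + m[1], regions[b.region])]
    dx, dy = min(valid or moves, key=lambda m: abs(m[0]) + abs(m[1]))
    b.x += dx
    b.y += dy


def refine(objs: list[Obj], regions: dict[str, Region], max_iters: int = 50) -> int:
    """Reflect, fix one issue, re-check; return the number of fixes applied."""
    for step in range(max_iters):
        issues = reflect(objs, regions)
        if not issues:
            return step
        rectify(issues[0], regions)
    return max_iters


# Hypothetical 4 m x 3 m room split into a sleeping zone and a working zone.
regions = {"sleeping": (0.0, 0.0, 2.5, 3.0), "working": (2.5, 0.0, 4.0, 3.0)}
objs = [Obj("bed", "sleeping", 2.0, 1.6), Obj("nightstand", "sleeping", 0.5, 0.5),
        Obj("desk", "working", 1.2, 0.6), Obj("chair", "working", 0.5, 0.5)]
ground(objs, regions)  # every object starts at its region's anchor, so some collide
fixes = refine(objs, regions)
for o in objs:
    print(o.name, round(o.x, 2), round(o.y, 2))
```

In this toy run, grounding stacks the nightstand on the bed and the chair on the desk. Two rectification steps then move each one along the shortest in-region axis until it no longer overlaps. Fixing only the first reported issue per iteration and re-checking is a deliberate choice: one fix can create or remove other violations, so the loop re-reflects on the whole layout each time rather than applying a batch of possibly conflicting fixes.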

Related papers