Fugu-MT paper translation (abstract): Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis

Paper overview: Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis

arxiv url: http://arxiv.org/abs/2605.18451v1
Date: Mon, 18 May 2026 14:18:36 GMT
Status: Translation complete
Last updated in system: 2026-05-19 17:57:49.71189
Title: Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis
Title (reference translation): Code-as-Room: Generating 3D Rooms from Top-Down Images via Agentic Code Synthesis
Authors: Yixuan Yang, Zhen Luo, Wanshui Gan, Jinkun Hao, Junru Lu, Jinghao Yan, Zhaoyang Lyu, Xudong Xu
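Abstract summary: We propose Code-as-Room, an MLLM-based agentic framework equipped with a structured execution harness. Given a top-down room image, the framework parses the reference image to extract scene elements and their spatial relationships. A cross-stage memory module is maintained throughout to mitigate the context forgetting inherent to existing agent-based frameworks.
Reference score (independently computed attention score): 12.68633443613779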
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Designing realistic and functional 3D indoor rooms is essential for a wide range of applications, including interior design, virtual reality, gaming, and embodied AI. While recent MLLM-based approaches have shown great potential for 3D room synthesis from textual descriptions or reference images, text-based methods struggle to capture precise spatial information, and existing image-conditioned agents suffer from instability and infinite looping when tasked with holistic room generation from top-down views. To address these limitations, we propose Code-as-Room, an MLLM-based agentic framework equipped with a structured execution harness, which represents 3D rooms with Blender codes. Given a top-down room image, the framework parses the reference image to extract scene elements and their spatial relationships, and synthesizes executable Blender code for geometry, materials, and lighting in a principled, multi-stage pipeline. A cross-stage memory module is maintained throughout to mitigate context forgetting inherent to existing agent-based frameworks. We further introduce a dedicated benchmark for code-based 3D room synthesis, encompassing various evaluation protocols. Based on our benchmark, comprehensive comparisons against existing agent-based methods are conducted to validate the effectiveness of our proposed execution harness.
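Abstract (reference translation): Designing realistic and functional 3D indoor rooms is essential for a wide range of applications, including interior design, virtual reality, gaming, and embodied AI. Recent MLLM-based approaches show great potential for synthesizing 3D rooms from text descriptions or reference images, but text-based methods struggle to capture precise spatial information, and existing image-conditioned agents suffer from instability and infinite loops when tasked with generating a whole room from a top-down view. To address these limitations, we propose Code-as-Room, an MLLM-based agentic framework equipped with a structured execution harness, which represents 3D rooms as Blender code. Given a top-down room image, the framework parses the reference image to extract scene elements and their spatial relationships, then synthesizes executable Blender code for geometry, materials, and lighting in a principled multi-stage pipeline. A cross-stage memory module is maintained throughout to mitigate the context forgetting inherent to existing agent-based frameworks. We further introduce a dedicated benchmark for code-based 3D room synthesis that covers various evaluation protocols. Based on this benchmark, we run comprehensive comparisons against existing agent-based methods to validate the effectiveness of the proposed execution harness.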
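
Illustrative sketch (not from the paper): the abstract describes a multi-stage pipeline that emits Blender code for geometry, materials, and lighting while a cross-stage memory carries earlier results forward. The Python below is a minimal sketch of that idea under loud assumptions. `SceneElement`, `StageMemory`, the stage functions, and `build_room_script` are hypothetical names, the MLLM parsing step is replaced by hand-written input, and every object is a placeholder cube. Only the emitted `bpy` calls are real Blender API; the generated script needs Blender to run, so the sketch only checks its syntax.

```python
from dataclasses import dataclass, field


@dataclass
class SceneElement:
    """One parsed scene element (hypothetical schema, not from the paper)."""
    name: str
    location: tuple  # (x, y, z) position
    size: float      # edge length of the placeholder cube
    color: tuple     # RGBA base color


@dataclass
class StageMemory:
    """Assumed cross-stage memory: what earlier stages parsed and emitted."""
    elements: list = field(default_factory=list)
    code: dict = field(default_factory=dict)  # stage name -> emitted code


def parse_stage(memory, parsed_elements):
    # Stand-in for the MLLM image-parsing step: elements are given directly.
    memory.elements = list(parsed_elements)


def geometry_stage(memory):
    lines = ["import bpy"]
    for e in memory.elements:
        lines.append(
            f"bpy.ops.mesh.primitive_cube_add(size={e.size!r}, location={e.location!r})"
        )
        # The newly added object becomes the active object, so name it here.
        lines.append(f"bpy.context.active_object.name = {e.name!r}")
    memory.code["geometry"] = "\n".join(lines)


def material_stage(memory):
    # Reads element names from memory instead of re-parsing the image.
    lines = []
    for e in memory.elements:
        mat_name = e.name + "_mat"
        lines.append(f"mat = bpy.data.materials.new(name={mat_name!r})")
        lines.append(f"mat.diffuse_color = {e.color!r}")
        lines.append(f"bpy.data.objects[{e.name!r}].data.materials.append(mat)")
    memory.code["materials"] = "\n".join(lines)


def lighting_stage(memory):
    # Place one point light 2 units above the highest element recorded in memory.
    top = max((e.location[2] + e.size for e in memory.elements), default=0.0)
    height = top + 2.0
    memory.code["lighting"] = (
        f"bpy.ops.object.light_add(type='POINT', location=(0.0, 0.0, {height!r}))"
    )


def build_room_script(parsed_elements):
    memory = StageMemory()
    parse_stage(memory, parsed_elements)
    for stage in (geometry_stage, material_stage, lighting_stage):
        stage(memory)
    return "\n".join(memory.code[k] for k in ("geometry", "materials", "lighting"))


script = build_room_script(
    [SceneElement("bed", (1.0, 2.0, 0.5), 1.0, (0.8, 0.7, 0.6, 1.0))]
)
compile(script, "<room>", "exec")  # syntax check only; execution requires Blender
print(script)
```

Keeping each stage's output in one shared memory object means later stages never have to rediscover what earlier stages decided, which is the role the abstract assigns to the cross-stage memory module. How the paper actually implements it is not stated in the abstract.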

Related papers