Fugu-MT Paper Translation (Abstract): Edit-As-Act: Goal-Regressive Planning for Open-Vocabulary 3D Indoor Scene Editing

Paper summary: Edit-As-Act: Goal-Regressive Planning for Open-Vocabulary 3D Indoor Scene Editing

arxiv url: http://arxiv.org/abs/2603.17583v1
Date: Wed, 18 Mar 2026 10:46:42 GMT
Status: Translation complete
Last updated in system: 2026-03-19 18:32:57.654727
Title: Edit-As-Act: Goal-Regressive Planning for Open-Vocabulary 3D Indoor Scene Editing
Title (reference translation): Edit-As-Act: Goal-Regressive Planning for Open-Vocabulary 3D Indoor Scene Editing
Authors: Seongrae Noh, SeungWon Seo, Gyeong-Moon Park, HyeongYeop Kang
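Abstract summary: Edit-As-Act is a framework that performs open-vocabulary scene editing as goal-regressive planning in 3D space. A language-driven planner proposes actions, and a validator enforces goal-directedness, monotonicity, and physical feasibility. On E2A-Bench, a benchmark of 63 editing tasks across 9 indoor environments, Edit-As-Act significantly outperforms prior approaches.
Reference score (independently computed attention score): 20.022591860394012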
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Editing a 3D indoor scene from natural language is conceptually straightforward but technically challenging. Existing open-vocabulary systems often regenerate large portions of a scene or rely on image-space edits that disrupt spatial structure, resulting in unintended global changes or physically inconsistent layouts. These limitations stem from treating editing primarily as a generative task. We take a different view. A user instruction defines a desired world state, and editing should be the minimal sequence of actions that makes this state true while preserving everything else. This perspective motivates Edit-As-Act, a framework that performs open-vocabulary scene editing as goal-regressive planning in 3D space. Given a source scene and free-form instruction, Edit-As-Act predicts symbolic goal predicates and plans in EditLang, a PDDL-inspired action language that we design with explicit preconditions and effects encoding support, contact, collision, and other geometric relations. A language-driven planner proposes actions, and a validator enforces goal-directedness, monotonicity, and physical feasibility, producing interpretable and physically coherent transformations. By separating reasoning from low-level generation, Edit-As-Act achieves instruction fidelity, semantic consistency, and physical plausibility - three criteria that existing paradigms cannot satisfy together. On E2A-Bench, our benchmark of 63 editing tasks across 9 indoor environments, Edit-As-Act significantly outperforms prior approaches across all edit types and scene categories.
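Abstract (reference translation): Editing a 3D indoor scene from natural language is conceptually simple but technically difficult. Existing open-vocabulary systems often regenerate large parts of the scene or depend on image-space edits that disturb spatial structure, which leads to unintended global changes or physically inconsistent layouts. These limitations come from treating editing mainly as a generative task. We take a different view: a user instruction defines a desired world state, and editing should be the minimal sequence of actions that makes this state true while preserving everything else. This view motivates Edit-As-Act, a framework that performs open-vocabulary scene editing as goal-regressive planning in 3D space. Given a source scene and a free-form instruction, Edit-As-Act predicts symbolic goal predicates and plans in EditLang, a PDDL-inspired action language with explicit preconditions and effects that encode support, contact, collision, and other geometric relations. A language-driven planner proposes actions, and a validator enforces goal-directedness, monotonicity, and physical feasibility, producing interpretable and physically coherent transformations. By separating reasoning from low-level generation, Edit-As-Act achieves instruction fidelity, semantic consistency, and physical plausibility, three criteria that existing paradigms cannot satisfy together. On E2A-Bench, a benchmark of 63 editing tasks across 9 indoor environments, Edit-As-Act significantly outperforms prior approaches across all edit types and scene categories.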
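
The abstract describes planning backward from goal predicates over actions with explicit preconditions and effects. The following is a minimal Python sketch of that general idea, classical STRIPS-style goal regression, and not the paper's EditLang or its planner. The predicate strings, the `Action` class, and the functions `regress` and `plan_backward` are all hypothetical names chosen for illustration. The sketch substitutes a depth-limited exhaustive search for the language-driven planner, and it checks only goal-directedness and monotonicity; physical feasibility (collision, support geometry) is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A STRIPS-style action over string predicates (illustrative, not EditLang)."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset = frozenset()


def regress(goal, action):
    """Return the subgoal that must hold before `action`, or None if rejected."""
    if not (action.add_effects & goal):
        return None  # not goal-directed: achieves no open goal predicate
    if action.del_effects & goal:
        return None  # not monotone: would undo a goal predicate
    return (goal - action.add_effects) | action.preconditions


def plan_backward(state, goal, actions, max_depth=5):
    """Depth-limited regression search; returns action names in execution order."""
    if goal <= state:
        return []  # every goal predicate already holds in the source scene
    if max_depth == 0:
        return None
    for action in actions:
        subgoal = regress(goal, action)
        if subgoal is None:
            continue
        prefix = plan_backward(state, subgoal, actions, max_depth - 1)
        if prefix is not None:
            return prefix + [action.name]
    return None


# Hypothetical scene: a book occupies the table, the lamp is on the floor.
state = frozenset({"on(book,table)", "on(lamp,floor)"})
goal = frozenset({"on(lamp,table)"})
actions = [
    Action(
        name="remove_book_from_table",
        preconditions=frozenset({"on(book,table)"}),
        add_effects=frozenset({"clear(table)"}),
        del_effects=frozenset({"on(book,table)"}),
    ),
    Action(
        name="place_lamp_on_table",
        preconditions=frozenset({"clear(table)", "on(lamp,floor)"}),
        add_effects=frozenset({"on(lamp,table)"}),
        del_effects=frozenset({"on(lamp,floor)", "clear(table)"}),
    ),
]

plan = plan_backward(state, goal, actions)
print(plan)
```

In this toy scene the search first regresses the goal through the lamp placement, which exposes `clear(table)` as a new subgoal, and then regresses that subgoal through the book removal. Everything not named in a precondition or effect is left untouched, which mirrors the abstract's emphasis on minimal edits.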


Related papers