Fugu-MT Paper Translation (Abstract): SceneOrchestra: Efficient Agentic 3D Scene Synthesis via Full Tool-Call Trajectory Generation

Paper abstract: SceneOrchestra: Efficient Agentic 3D Scene Synthesis via Full Tool-Call Trajectory Generation

arxiv url: http://arxiv.org/abs/2604.19907v1
Date: Tue, 21 Apr 2026 18:33:15 GMT
Status: Translation complete
Last updated in system: 2026-04-23 15:36:10.748279
Title: SceneOrchestra: Efficient Agentic 3D Scene Synthesis via Full Tool-Call Trajectory Generation
Title (reference translation): SceneOrchestra: Efficient agentic 3D scene synthesis via full tool-call trajectory generation
Authors: Yun He, Kelin Yu, Matthias Zwicker
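Abstract summary: We propose SceneOrchestra, a trainable orchestration framework for 3D scene creation. SceneOrchestra consists of an orchestrator and a discriminator, which are fine-tuned with a two-phase training strategy. The method achieves state-of-the-art scene quality while reducing runtime compared to previous work.
Reference score (independently computed attention score): 13.62882510697547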
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent agentic frameworks for 3D scene synthesis have advanced realism and diversity by integrating heterogeneous generation and editing tools. These tools are organized into workflows orchestrated by an off-the-shelf LLM. Current approaches typically adopt an execute-review-reflect loop: at each step, the orchestrator executes a tool, renders intermediate results for review, and then decides on the tool and its parameters for the next step. However, this design has two key limitations. First, next-step tool selection and parameter configuration are driven by heuristic rules, which can lead to suboptimal execution flows, unnecessary tool invocations, degraded output quality, and increased runtime. Second, rendering and reviewing intermediate results after each step introduces additional latency. To address these issues, we propose SceneOrchestra, a trainable orchestration framework that optimizes the tool-call execution flow and eliminates the step-by-step review loop, improving both efficiency and output quality. SceneOrchestra consists of an orchestrator and a discriminator, which we fine-tune with a two-phase training strategy. In the first phase, the orchestrator learns context-aware tool selection and complete tool-call trajectory generation, while the discriminator is trained to assess the quality of full trajectories, enabling it to select the best trajectory from multiple candidates. In the second phase, we perform interleaved training, where the discriminator adapts to the orchestrator's evolving trajectory distribution and distills its discriminative capability back into the orchestrator. At inference, we only use the orchestrator to generate and execute full tool-call trajectories from instructions, without requiring the discriminator. Extensive experiments show that our method achieves state-of-the-art scene quality while reducing runtime compared to previous work.
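Abstract (reference translation): Recent agentic frameworks for 3D scene synthesis have achieved greater realism and diversity by integrating heterogeneous generation and editing tools. These tools are organized into workflows orchestrated by an off-the-shelf LLM. Current approaches typically follow an execute-review-reflect loop: at each step, the orchestrator executes a tool, renders intermediate results for review, and then decides on the tool and its parameters for the next step. This design, however, has two key limitations. First, next-step tool selection and parameter configuration are driven by heuristic rules, which can lead to suboptimal execution flows, unnecessary tool invocations, degraded output quality, and increased runtime. Second, rendering and reviewing intermediate results after every step introduces additional latency. To address these issues, the authors propose SceneOrchestra, a trainable orchestration framework that optimizes the tool-call execution flow and eliminates the step-by-step review loop, improving both efficiency and output quality. SceneOrchestra consists of an orchestrator and a discriminator, which are fine-tuned with a two-phase training strategy. In the first phase, the orchestrator learns context-aware tool selection and complete tool-call trajectory generation, while the discriminator is trained to assess the quality of full trajectories so that it can select the best trajectory from multiple candidates. In the second phase, interleaved training is performed, in which the discriminator adapts to the orchestrator's evolving trajectory distribution and distills its discriminative capability back into the orchestrator. At inference, only the orchestrator is used to generate and execute full tool-call trajectories from instructions, without requiring the discriminator. Extensive experiments show that the method achieves state-of-the-art scene quality while reducing runtime compared to previous work.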
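
The idea of generating a full tool-call trajectory, scoring several candidates, and executing the chosen one without per-step review can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `ToolCall` type, the two tools in `TOOLS`, the list-of-strings scene, and `toy_score`, which merely stands in for a learned discriminator. Per the abstract, the discriminator is only needed during training; the selection step below mimics how it would pick the best trajectory from multiple candidates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Scene = List[str]  # toy scene: just a list of object names (assumption)


@dataclass
class ToolCall:
    tool: str     # hypothetical tool name
    params: dict  # hypothetical tool parameters


# Hypothetical tool registry; real systems would call generation/editing tools.
TOOLS: Dict[str, Callable[[Scene, dict], Scene]] = {
    "add_object": lambda scene, p: scene + [p["name"]],
    "remove_object": lambda scene, p: [o for o in scene if o != p["name"]],
}


def execute_trajectory(trajectory: List[ToolCall]) -> Scene:
    """Run every tool call in order, with no intermediate render/review step."""
    scene: Scene = []
    for call in trajectory:
        scene = TOOLS[call.tool](scene, call.params)
    return scene


def toy_score(trajectory: List[ToolCall]) -> float:
    """Stand-in for a learned discriminator (assumption): reward distinct
    objects in the final scene and penalize every tool invocation."""
    final = execute_trajectory(trajectory)
    return len(set(final)) - 0.1 * len(trajectory)


def select_best(candidates: List[List[ToolCall]],
                score: Callable[[List[ToolCall]], float]) -> List[ToolCall]:
    """Best-of-N selection: return the highest-scoring full trajectory."""
    return max(candidates, key=score)


# Two candidate trajectories reaching the same scene; one wastes two calls.
short = [ToolCall("add_object", {"name": "bed"}),
         ToolCall("add_object", {"name": "lamp"})]
wasteful = [ToolCall("add_object", {"name": "bed"}),
            ToolCall("add_object", {"name": "chair"}),
            ToolCall("remove_object", {"name": "chair"}),
            ToolCall("add_object", {"name": "lamp"})]

best = select_best([wasteful, short], toy_score)
final_scene = execute_trajectory(best)
```

In this toy setting both candidates produce the same scene, so the length penalty makes the shorter trajectory win, which loosely mirrors the abstract's point about avoiding unnecessary tool invocations.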

Related papers