Fugu-MT Paper Translation (Abstract): TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning

Paper abstract: TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning

arxiv url: http://arxiv.org/abs/2605.18109v1
Date: Mon, 18 May 2026 09:19:20 GMT
Status: Translation complete
System update date: 2026-05-19 17:57:49.222095
Title: TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning
Title (reference translation): TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning
Authors: ZhiYuan Feng, Yu Deng, Ruichuan An, Zhenhua Liu, Qixiu Li, Keming Wu, Zhiying Du, Weijie Wang, Haoxiao Wang, Shuang Chen, Sicheng Xu, Yaobo Liang, Jiaolong Yang, Baining Guo
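Abstract summary: In real home deployments, household agents must often operate from a complete household scene and a situated household request. We formalize this as full-scene household reasoning. We propose TaskGround, a training-free and model-agnostic Ground-Infer-Execute framework.
Reference score (independently computed attention metric): 39.5104374425805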
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In real home deployments, household agents must often operate from a complete household scene and a situated household request, rather than from a clean task specification. Such requests require agents to identify task-relevant entities, recover intended task conditions, and resolve ordering constraints from the surrounding scene context. We formalize this capability as full-scene household reasoning: given a complete household scene and a situated household request, an agent must infer executable task structure before producing a grounded skill-level action sequence. This setting is challenging because complete household scenes contain substantial task-irrelevant information, making direct complete-scene prompting inefficient and error-prone. In practical deployment, this challenge is further amplified by privacy and local compute constraints, which favor compact open-weight models with limited long-context reasoning ability. We propose TaskGround, a training-free and model-agnostic Ground-Infer-Execute framework that grounds complete scenes into compact task-relevant scene slices, infers executable task structure, and compiles it into grounded skill-level action sequences. To evaluate this setting, we introduce FullHome, a human-validated evaluation suite of 400 household tasks spanning diverse home-scale environments and both goal-oriented and process-constrained requirements. On FullHome, TaskGround improves task success rates by large margins across both proprietary and open-weight models. Notably, it makes Qwen3.5-9B competitive with GPT-5 under direct complete-scene prompting while reducing total input-token cost by up to 18x. Our results identify executable task-structure inference as a central bottleneck in full-scene household reasoning and show that structured grounding can make compact local models substantially more effective for practical household deployment.
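Abstract (reference translation): In real home deployments, household agents must often operate from a complete household scene and a situated household request rather than from a clean task specification. Such requests require the agent to identify task-relevant entities, recover the intended task conditions, and resolve ordering constraints from the surrounding scene context. We formalize this capability as full-scene household reasoning: given a complete household scene and a situated household request, the agent must infer an executable task structure before generating a grounded skill-level action sequence. The setting is difficult because a complete household scene contains a large amount of task-irrelevant information, which makes direct complete-scene prompting inefficient and error-prone. In practical deployment, the difficulty is further amplified by privacy and local compute constraints, which favor compact open-weight models with limited long-context reasoning ability. We propose TaskGround, a training-free and model-agnostic Ground-Infer-Execute framework that grounds the complete scene into compact task-relevant scene slices, infers the executable task structure, and compiles it into grounded skill-level action sequences. To evaluate this setting, we introduce FullHome, a human-validated evaluation suite of 400 household tasks spanning diverse home-scale environments and both goal-oriented and process-constrained requirements. On FullHome, TaskGround substantially improves task success rates for both proprietary and open-weight models. Notably, it makes Qwen3.5-9B competitive with GPT-5 under direct complete-scene prompting while reducing total input-token cost by up to 18x. The results identify executable task-structure inference as a central bottleneck in full-scene household reasoning and show that structured grounding can make compact local models substantially more effective for practical household deployment.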
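The three stages named in the abstract (ground, infer, execute) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no algorithmic detail, so every name below (`SceneObject`, `ground`, `infer`, `compile_actions`, `SCENE`) is hypothetical, the keyword filter stands in for whatever grounding the paper actually uses, and the `proposed_goals` argument stands in for a language-model output.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical scene entry: an object, its room, and its current state."""
    name: str
    room: str
    state: str


# Assumed toy scene; a real household scene would be far larger.
SCENE = [
    SceneObject("mug", "kitchen", "dirty"),
    SceneObject("faucet", "kitchen", "off"),
    SceneObject("tv", "living_room", "on"),
    SceneObject("sofa", "living_room", "empty"),
]


def ground(scene: list[SceneObject], request: str) -> list[SceneObject]:
    """Ground: keep only objects named in the request (toy keyword filter)."""
    words = {w.strip(".,!?").lower() for w in request.split()}
    return [obj for obj in scene if obj.name in words]


def infer(scene_slice: list[SceneObject],
          proposed_goals: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Infer: keep goals that refer to grounded objects and are not yet satisfied.

    proposed_goals is an ordered list of (object, desired_state) pairs and is
    an assumption standing in for a model's output.
    """
    current = {obj.name: obj.state for obj in scene_slice}
    return [(name, state) for name, state in proposed_goals
            if name in current and current[name] != state]


def compile_actions(scene_slice: list[SceneObject],
                    goals: list[tuple[str, str]]) -> list[str]:
    """Execute: turn ordered goals into skill-level action strings."""
    room_of = {obj.name: obj.room for obj in scene_slice}
    actions = []
    for name, state in goals:
        actions.append(f"walk_to({room_of[name]})")
        actions.append(f"set_state({name}, {state})")
    return actions
```

For the request "Please wash the mug with the faucet.", `ground` keeps only the mug and the faucet, so the later stages never see the television or the sofa. That mirrors, in miniature, the abstract's point that a compact scene slice reduces the input a model has to reason over.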
