Fugu-MT Paper Translation (Abstract): SMH-Bench: Benchmarking LLM Agents for Environment-Grounded Reasoning and Action in Smart Homes

Paper overview: SMH-Bench: Benchmarking LLM Agents for Environment-Grounded Reasoning and Action in Smart Homes

arxiv url: http://arxiv.org/abs/2606.01912v1
Date: Mon, 01 Jun 2026 08:48:15 GMT
Status: Translation complete
Last updated in system: 2026-06-02 21:34:31.629697
Title: SMH-Bench: Benchmarking LLM Agents for Environment-Grounded Reasoning and Action in Smart Homes
Title (reference translation): SMH-Bench: A Benchmark of LLM Agents for Environment-Grounded Reasoning and Action in Smart Homes
Authors: Kuan Li, Shuo Zhang, Huacan Wang, Fangzhou Yu, Zecheng Sheng, Yi Gu, Weipeng Ming, Lei Xue, Chen Liu, Sen Hu, Ronghao Chen, Siyue Lin, Yuqing Hou, Xiaofeng Mou, Yi Xu
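Abstract summary: We introduce SMH-Bench, a comprehensive benchmark for evaluating Large Language Models (LLMs) in smart-home environments. Built on HomeEnv, an executable and verifiable smart-home simulator, SMH-Bench contains 1,100 high-quality tasks spanning 7 categories and 22 fine-grained subcategories.
Reference score (independently computed attention score): 21.9224048962238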
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Smart homes are evolving toward complex state-dependent living environments, requiring Large Language Models (LLMs) to reason over user intent, preferences, and multi-device interactions. However, existing smart-home benchmarks often focus on static instruction-to-API mapping or limited simulations, failing to evaluate whether LLMs can reason, interact, and act reliably in realistic household scenarios. To address these limitations, we introduce SMH-Bench, a comprehensive benchmark for evaluating LLMs in smart-home environments. Built upon HomeEnv, an executable and verifiable smart-home simulator, SMH-Bench contains 1,100 high-quality tasks spanning 7 categories and 22 fine-grained subcategories. It further stratifies tasks across simple, medium and complex homes, ranging from small apartments to dense multi-room environments with 135 devices. Experiments show that although frontier LLMs achieve strong performance on explicit control and query tasks, they still exhibit significant weaknesses in automation task scheduling, ambiguity handling and personalized reasoning, especially as home complexity increases. We hope SMH-Bench will facilitate the development of more reliable, context-aware, and practically deployable smart-home agents.
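Abstract (reference translation): Smart homes are evolving into complex, state-dependent living environments, which requires Large Language Models (LLMs) to reason over user intent, preferences, and multi-device interactions. However, existing smart-home benchmarks tend to focus on static instruction-to-API mapping or limited simulations, and often cannot evaluate whether LLMs can reason, interact, and act reliably in realistic household scenarios. To address these limitations, we introduce SMH-Bench, a comprehensive benchmark for evaluating LLMs in smart-home environments. Built on HomeEnv, an executable and verifiable smart-home simulator, SMH-Bench contains 1,100 high-quality tasks spanning 7 categories and 22 fine-grained subcategories. It also stratifies tasks across simple, medium, and complex homes, from small apartments to dense multi-room environments with 135 devices. Experiments show that frontier LLMs perform strongly on explicit control and query tasks but still have significant weaknesses in automation task scheduling, ambiguity handling, and personalized reasoning, especially as home complexity increases. We hope SMH-Bench will facilitate the development of more reliable, context-aware, and practically deployable smart-home agents.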


Related papers