Fugu-MT Paper Translation (Summary): LifeAgentBench: A Multi-dimensional Benchmark and Agent for Personal Health Assistants in Digital Health

Paper overview: LifeAgentBench: A Multi-dimensional Benchmark and Agent for Personal Health Assistants in Digital Health

arxiv url: http://arxiv.org/abs/2601.13880v1
Date: Tue, 20 Jan 2026 11:51:58 GMT
Status: Translation complete
Last updated in system: 2026-01-21 22:47:23.292671
Title: LifeAgentBench: A Multi-dimensional Benchmark and Agent for Personal Health Assistants in Digital Health
Authors: Ye Tian, Zihao Wang, Onat Gungor, Xiaoran Fan, Tajana Rosing
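Abstract summary: LifeAgentBench is a large-scale QA benchmark for long-horizon, cross-dimensional, multi-user lifestyle reasoning. The paper proposes LifeAgent, a strong baseline agent for health assistants that integrates multi-step evidence retrieval with deterministic aggregation.
Reference score (site-computed attention score): 27.162708805711706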
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Personalized digital health support requires long-horizon, cross-dimensional reasoning over heterogeneous lifestyle signals, and recent advances in mobile sensing and large language models (LLMs) make such support increasingly feasible. However, the capabilities of current LLMs in this setting remain unclear due to the lack of systematic benchmarks. In this paper, we introduce LifeAgentBench, a large-scale QA benchmark for long-horizon, cross-dimensional, and multi-user lifestyle health reasoning, containing 22,573 questions spanning from basic retrieval to complex reasoning. We release an extensible benchmark construction pipeline and a standardized evaluation protocol to enable reliable and scalable assessment of LLM-based health assistants. We then systematically evaluate 11 leading LLMs on LifeAgentBench and identify key bottlenecks in long-horizon aggregation and cross-dimensional reasoning. Motivated by these findings, we propose LifeAgent as a strong baseline agent for health assistants that integrates multi-step evidence retrieval with deterministic aggregation, achieving significant improvements compared with two widely used baselines. Case studies further demonstrate its potential in realistic daily-life scenarios. The benchmark is publicly available at https://anonymous.4open.science/r/LifeAgentBench-CE7B.
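The abstract does not describe LifeAgent's internals. The following is only a minimal Python sketch, under assumed names, of the general idea that "multi-step evidence retrieval with deterministic aggregation" suggests: first select the records relevant to a question, then compute the numeric answer with ordinary code instead of asking the LLM to do arithmetic over a long context. The `Record` schema and the `retrieve` and `aggregate` functions are hypothetical; they are not the authors' implementation or the benchmark's data format.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass(frozen=True)
class Record:
    """One daily lifestyle measurement (hypothetical schema, not LifeAgentBench's)."""
    user_id: str
    day: date
    dimension: str  # e.g. "sleep_hours" or "steps" (illustrative names)
    value: float


def retrieve(records: list[Record], user_id: str, dimension: str,
             start: date, end: date) -> list[Record]:
    """Evidence-retrieval step: keep only records for one user, one dimension, one date range."""
    return [
        r for r in records
        if r.user_id == user_id and r.dimension == dimension and start <= r.day <= end
    ]


def aggregate(evidence: list[Record], op: str):
    """Deterministic aggregation: the answer is computed in code, not generated as text."""
    values = [r.value for r in evidence]
    if not values:
        return None  # no evidence found: report that instead of guessing
    ops = {"mean": mean, "sum": sum, "max": max, "min": min, "count": len}
    return ops[op](values)


if __name__ == "__main__":
    data = [
        Record("u1", date(2025, 3, 1), "sleep_hours", 7.0),
        Record("u1", date(2025, 3, 2), "sleep_hours", 6.0),
        Record("u1", date(2025, 3, 3), "steps", 8000.0),
        Record("u2", date(2025, 3, 1), "sleep_hours", 9.0),
    ]
    # Step 1: retrieve user u1's sleep records for March; step 2: average them.
    evidence = retrieve(data, "u1", "sleep_hours", date(2025, 3, 1), date(2025, 3, 31))
    print(aggregate(evidence, "mean"))  # 6.5
```

In an agent built this way, the LLM would presumably plan which retrieval and aggregation calls to issue, possibly several in sequence for a cross-dimensional question, while keeping the arithmetic in code makes long-horizon totals and averages reproducible.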

Related papers