Fugu-MT Paper Translation (Abstract): Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces

Paper summary: Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces

arxiv url: http://arxiv.org/abs/2604.08362v1
Date: Thu, 09 Apr 2026 15:26:21 GMT
Status: Translation complete
Last updated in system: 2026-04-10 18:34:05.994136
Title: Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces
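Title (reference translation): Towards real-world human behavior simulation: benchmarking large language models on long-horizon, cross-scenario, heterogeneous behavior traces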
Authors: Jiawei Chen, Ruoxi Xu, Boxi Cao, Ruotong Pan, Yunfei Zhang, Yifei Hu, Yong Du, Tingting Gao, Yaojie Lu, Yingfei Sun, Xianpei Han, Le Sun, Xiangyu Wu, Hongyu Lin
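Abstract summary: We introduce OmniBehavior, the first user simulation benchmark constructed from real-world data. We show that current models struggle to accurately simulate complex behaviors, even as context windows expand. Simulated behavior tends to converge toward a positive average person, so individual differences and long-tail behaviors are lost, which highlights important directions for future high-fidelity simulation research.
Reference score (independently computed attention score): 81.41397370235102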
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The emergence of Large Language Models (LLMs) has illuminated the potential for a general-purpose user simulator. However, existing benchmarks remain constrained to isolated scenarios, narrow action spaces, or synthetic data, failing to capture the holistic nature of authentic human behavior. To bridge this gap, we introduce OmniBehavior, the first user simulation benchmark constructed entirely from real-world data, integrating long-horizon, cross-scenario, and heterogeneous behavioral patterns into a unified framework. Based on this benchmark, we first provide empirical evidence that previous datasets with isolated scenarios suffer from tunnel vision, whereas real-world decision-making relies on long-term, cross-scenario causal chains. Extensive evaluations of state-of-the-art LLMs reveal that current models struggle to accurately simulate these complex behaviors, with performance plateauing even as context windows expand. Crucially, a systematic comparison between simulated and authentic behaviors uncovers a fundamental structural bias: LLMs tend to converge toward a positive average person, exhibiting hyper-activity, persona homogenization, and a Utopian bias. This results in the loss of individual differences and long-tail behaviors, highlighting critical directions for future high-fidelity simulation research.
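Abstract (reference translation): The emergence of large language models (LLMs) has highlighted the potential for a general-purpose user simulator. However, existing benchmarks remain restricted to isolated scenarios, narrow action spaces, or synthetic data, and do not capture the holistic nature of real human behavior. To close this gap, we introduce OmniBehavior, the first user simulation benchmark built entirely from real-world data, which integrates long-horizon, cross-scenario, and heterogeneous behavior patterns into a unified framework. Based on this benchmark, we first present empirical evidence that earlier datasets with isolated scenarios suffer from tunnel vision. Extensive evaluation of state-of-the-art LLMs shows that current models struggle to simulate these complex behaviors accurately, with performance plateauing even as context windows expand. LLMs tend to converge toward a positive average person, showing hyper-activity, persona homogenization, and a Utopian bias. As a result, individual differences and long-tail behaviors are lost, which highlights important directions for future high-fidelity simulation research.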


Related papers