Fugu-MT Paper Translation (Abstract): HumanLM: Simulating Users with State Alignment Beats Response Imitation

Paper overview: HumanLM: Simulating Users with State Alignment Beats Response Imitation

arxiv url: http://arxiv.org/abs/2603.03303v1
Date: Sat, 07 Feb 2026 20:26:28 GMT
Status: Translation complete
Last updated in system: 2026-03-09 01:20:08.129397
Title: HumanLM: Simulating Users with State Alignment Beats Response Imitation
Title (reference translation): HumanLM: Simulating Users with State Alignment
Authors: Shirley Wu, Evelyn Choi, Arpandeep Khatua, Zhanghan Wang, Joy He-Yueya, Tharindu Cyril Weerasooriya, Wei Wei, Diyi Yang, Jure Leskovec, James Zou
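Abstract summary: This paper proposes HumanLM, a novel training framework that builds user simulators that accurately reflect real users. Through reinforcement learning, HumanLM generates natural-language latent states that align with ground-truth responses. For evaluation, the authors develop Humanual, a comprehensive benchmark for simulating real users based on public data.
Reference score (independently computed attention score): 84.89761487596844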
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) are increasingly used to simulate how specific users respond to a given context, enabling more user-centric applications that rely on user feedback. However, existing user simulators mostly imitate surface-level patterns and language styles, which fail to reflect the underlying states of real users (e.g., beliefs and emotions). To address these limitations, we propose a novel training framework, HumanLM, which builds user simulators that accurately reflect real users. Our key insight is that, in addition to generating responses, the model should generate natural-language latent states that align with ground-truth responses through reinforcement learning. These latent states correspond to a set of psychologically grounded state dimensions that drive how real users respond. HumanLM further synthesizes these aligned latent states into responses that accurately represent real users. For extensive evaluation, we develop Humanual, a comprehensive benchmark for simulating real users based on public data. Humanual consists of six large-scale datasets with 26k users and 216k responses in total, spanning diverse tasks such as generating user responses to daily life issues, political blogs, and chat sessions with LLM assistants. Across datasets, HumanLM significantly outperforms alternative approaches, achieving an average relative improvement of 16.3% in alignment scores from an LLM judge. In a real-time simulation study with 111 participants, HumanLM achieves the highest similarity to real user responses and competitive human-likeness scores.
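Abstract (reference translation): Large language models (LLMs) are increasingly used to simulate how a specific user responds to a given context. However, existing user simulators mainly imitate surface-level patterns and language styles, and fail to reflect the underlying states of real users (e.g., beliefs and emotions). To address these limitations, we propose HumanLM, a novel training framework that builds user simulators that accurately reflect real users. Our key insight is that, in addition to generating responses, the model should generate, through reinforcement learning, natural-language latent states that align with ground-truth responses. These latent states correspond to a set of psychologically grounded state dimensions that drive how real users respond. HumanLM further synthesizes these aligned latent states into responses that accurately represent real users. For evaluation, we develop Humanual, a comprehensive benchmark for simulating real users based on public data. Humanual consists of six large-scale datasets with 26k users and 216k responses in total, spanning diverse tasks such as generating user responses to daily life issues, political blogs, and chat sessions with LLM assistants. Across datasets, HumanLM significantly outperforms alternative approaches, with an average relative improvement of 16.3% in alignment scores from an LLM judge. In a real-time simulation study with 111 participants, HumanLM achieves the highest similarity to real user responses and competitive human-likeness scores.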
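The abstract reports an "average relative improvement" in alignment scores but does not spell out how it is computed. The following is a minimal Python sketch of one plausible reading: the mean of per-dataset relative improvements over a baseline. The function names and the score pairs are hypothetical and for illustration only; they are not taken from the paper.

```python
from statistics import mean


def relative_improvement(baseline: float, candidate: float) -> float:
    """Relative improvement of candidate over baseline, in percent."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (candidate - baseline) / baseline * 100.0


def average_relative_improvement(pairs):
    """Mean of per-dataset relative improvements (an assumed aggregation)."""
    return mean(relative_improvement(b, c) for b, c in pairs)


# Hypothetical (baseline, candidate) alignment scores for three datasets.
# These numbers are made up; they are NOT results from the paper.
scores = [(40.0, 46.0), (50.0, 60.0), (25.0, 28.0)]

if __name__ == "__main__":
    print(f"{average_relative_improvement(scores):.1f}%")
```

Averaging per-dataset ratios weights every dataset equally regardless of its score scale; computing the relative improvement of the averaged scores would be a different, equally plausible aggregation.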

Related papers