Fugu-MT Paper Translation (Abstract): HEART-Bench: Do LLM Agents Exhibit Human-like Psychology?

Paper overview: HEART-Bench: Do LLM Agents Exhibit Human-like Psychology?

arxiv url: http://arxiv.org/abs/2605.30058v1
Date: Thu, 28 May 2026 15:08:03 GMT
Status: Translation complete
Last updated in system: 2026-05-30 02:45:56.416112
Title: HEART-Bench: Do LLM Agents Exhibit Human-like Psychology?
Title (reference translation): HEART-Bench: Do LLM Agents Exhibit Human-like Psychology?
Authors: Weihan Peng, Chenxu Zhang, Qianao Wang, Yuling Shi, Heng Lian, Qihong Mao, Jiahao Pang, Chunliang Feng, Bowen Li, Xiaodong Gu,
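Abstract summary: This paper proposes a new benchmark for evaluating whether LLM agents can simulate coherent, human-like psychology. The benchmark constructs 11 diverse human characters grounded in Big Five personality traits, and each profile is deeply integrated with 1,000 autobiographical-style episodic memories. By subjecting agents to varying scenarios, it evaluates whether they can consolidate their innate personality traits and autobiographical memories to make behavioral decisions that are consistent with their specific psychological profiles.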
Reference score (independently computed attention score): 25.237337617299946
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While LLM agents have demonstrated remarkable task-oriented abilities such as planning, reasoning, and action, few works have treated them as complete human personalities where emotional dimensions hold equal importance. In this paper, we introduce a novel benchmark to systematically assess whether LLM agents can simulate coherent, human-like psychology. Specifically, our benchmark constructs 11 diverse human characters grounded in orthogonal Big Five personality traits, with each profile deeply integrated with 1,000 structured autobiographical-style episodic memories distributed across theory-grounded developmental life stages. To rigorously evaluate the psychological manifestations of LLMs, we designed a curated suite of 64 decision-making scenarios, guided by the DIAMONDS taxonomy, a psychological framework that characterizes situations along eight dimensions: Duty, Intellect, Adversity, Mating, pOsitivity, Negativity, Deception, and Sociality. By subjecting agents to varying scenarios, the benchmark evaluates whether they can consolidate their innate personality traits and autobiographical memories to make behavioral decisions that are consistent with their specific psychological profiles. After systematic human validation and filtering, we obtained a benchmark consisting of 673 multiple-choice questions (MCQs). We believe this benchmark provides a principled and scalable testbed for studying human-like emotions, personality consistency, and value-consistent behavioural decision-making in LLM-based agents.
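Abstract (reference translation): LLM agents have shown remarkable task-oriented abilities such as planning, reasoning, and action, but few works have treated them as complete human personalities in which emotional dimensions carry equal importance. This paper proposes a new benchmark for systematically evaluating whether LLM agents can simulate coherent, human-like psychology. Specifically, the benchmark constructs 11 diverse human characters grounded in orthogonal Big Five personality traits, and each profile is deeply integrated with 1,000 structured autobiographical-style episodic memories distributed across theory-grounded developmental life stages. To rigorously evaluate the psychological manifestations of LLMs, we designed a curated suite of 64 decision-making scenarios guided by the DIAMONDS taxonomy, a psychological framework that characterizes situations along eight dimensions: Duty, Intellect, Adversity, Mating, pOsitivity, Negativity, Deception, and Sociality. By subjecting agents to varying scenarios, the benchmark evaluates whether they can consolidate their innate personality traits and autobiographical memories to make behavioral decisions that are consistent with their specific psychological profiles. After systematic human validation and filtering, we obtained a benchmark consisting of 673 multiple-choice questions (MCQs). We believe this benchmark provides a principled and scalable testbed for studying human-like emotions, personality consistency, and value-consistent behavioral decision-making in LLM-based agents.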

Related papers