Fugu-MT Paper Translation (Abstract): ChildEval: When large language models meet children's personalities

Paper summary: ChildEval: When large language models meet children's personalities

arxiv url: http://arxiv.org/abs/2605.27805v1
Date: Wed, 27 May 2026 00:53:42 GMT
Status: Translation complete
Last updated in system: 2026-05-28 17:38:55.65825
Title: ChildEval: When large language models meet children's personalities
Title (reference translation): ChildEval: When large language models meet children's personalities
Authors: Yanyan Luo, Xue Han, Chunxu Zhao, Ruiqiao Bai, Yaxing Zhang, Qian Hu, Lijun Mei, Junlan Feng
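Abstract summary: This work introduces ChildEval, a benchmark for evaluating the ability of LLMs to infer and follow child-centered preferences in long-context conversations. ChildEval contains 29K persona profiles of children aged 3-6, which provide relatively static background information. We propose fine-grained, child-centric evaluation protocols that systematically assess open-source LLMs.
Reference score (independently computed attention score): 27.061986552073012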
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: While LLMs enable personalized chatbots, their effectiveness in child-centered personalization remains unclear, as systematic evaluation of child-specific preferences is still lacking. To address this gap, we introduce ChildEval, a benchmark for evaluating LLMs' ability to infer and follow child-centered preferences in long-context conversations. ChildEval contains 29K synthesized persona profiles of children aged 3-6, providing relatively static background information. Each persona is associated with a child preference (which may align with, conflict with, or be independent of the persona), expressed either explicitly in a single sentence or implicitly through 6-10 turn dialogues. Explicit and implicit preferences are designed to reflect the same underlying preference but differ in expression, capturing dynamic aspects of preference expression rather than changes in the static persona. The benchmark spans five top-level and fourteen sub-level categories covering children's daily lives and development. We further propose fine-grained, child-centric evaluation protocols to systematically assess open-source LLMs. Experimental results demonstrate how different personalized representations affect LLM responses and suggest that fine-tuning on ChildEval can enhance child-centered performance. Our code and dataset are available at https://github.com/ziyanluo/ChildEval.
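Abstract (reference translation): LLMs make personalized chatbots possible, but because systematic evaluation of child-specific preferences is still lacking, their effectiveness for child-centered personalization remains unclear. To address this gap, we introduce ChildEval, a benchmark for evaluating the ability of LLMs to infer and follow child-centered preferences in long-context conversations. ChildEval contains 29K persona profiles of children aged 3-6, which provide relatively static background information. Each persona is associated with a child's preference that may align with, conflict with, or be independent of the persona, and that is expressed either explicitly in a single sentence or implicitly through a 6-10 turn dialogue. Explicit and implicit preferences are designed to reflect the same underlying preference but differ in how it is expressed, so they capture the dynamic aspects of preference expression rather than changes in the static persona. The benchmark spans five top-level and fourteen sub-level categories covering children's daily lives and development. We further propose fine-grained, child-centric evaluation protocols that systematically assess open-source LLMs. Experimental results show how personalized representations affect LLM responses and suggest that fine-tuning on ChildEval can improve child-centered performance. Our code and dataset are available at https://github.com/ziyanluo/ChildEval.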
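
The record structure described in the abstract (a static persona, a preference that aligns with, conflicts with, or is independent of it, and explicit versus implicit expression of that preference) might be sketched as below. This is only an illustrative sketch, not the authors' schema: the class names, field names, and the choice to count each list element as one dialogue turn are all assumptions, and the actual ChildEval dataset format may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class PersonaRelation(Enum):
    """How a preference relates to the static persona (three cases named in the abstract)."""
    ALIGNED = "aligned"
    CONFLICTING = "conflicting"
    INDEPENDENT = "independent"


@dataclass
class ChildRecord:
    # Hypothetical field names; the real dataset schema is not given in the abstract.
    persona: str                   # relatively static background profile
    age: int                       # the abstract states ages 3-6
    relation: PersonaRelation      # relation between preference and persona
    explicit_preference: str       # preference stated in a single sentence
    implicit_dialogue: List[str]   # same preference, expressed over 6-10 turns

    def validate(self) -> List[str]:
        """Return a list of constraint violations (empty if the record is consistent)."""
        problems = []
        if not 3 <= self.age <= 6:
            problems.append("age outside 3-6")
        # Assumption: one list element corresponds to one dialogue turn.
        if not 6 <= len(self.implicit_dialogue) <= 10:
            problems.append("implicit dialogue must have 6-10 turns")
        if not self.explicit_preference.strip():
            problems.append("explicit preference is empty")
        return problems


# Made-up example values, for illustration only.
record = ChildRecord(
    persona="A five-year-old who lives near a park",
    age=5,
    relation=PersonaRelation.INDEPENDENT,
    explicit_preference="I like stories about dinosaurs.",
    implicit_dialogue=[f"turn {i}" for i in range(7)],
)
print(record.validate())
```

Keeping the explicit sentence and the implicit dialogue on the same record reflects the abstract's statement that both forms express the same underlying preference and differ only in how it is expressed.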

Related papers