Fugu-MT paper translation (abstract): Apparent Psychological Profiles of Large Language Models are Largely a Measurement Artifact

Paper abstract: Apparent Psychological Profiles of Large Language Models are Largely a Measurement Artifact

arxiv url: http://arxiv.org/abs/2606.20205v1
Date: Thu, 18 Jun 2026 13:18:37 GMT
Status: Translation complete
Last updated in system: 2026-06-19 18:23:39.871556
Title: Apparent Psychological Profiles of Large Language Models are Largely a Measurement Artifact
Title (reference translation): The apparent psychological profiles of large language models are largely a measurement artifact
Authors: Jelena Meyer, David Garcia, Dirk U. Wulff
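Abstract summary: Psychological instruments designed for humans are increasingly used to assign stable psychological profiles to large language models (LLMs). Using a formal psychometric framework, we show that these profiles are largely a measurement artifact.
Reference score (attention level computed by this site): 1.3277433989254843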
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Psychological instruments designed for humans are increasingly used to assign large language models (LLMs) stable psychological profiles that affect their usability, safety assessment, and use as proxies for human participants in research. Using a formal psychometric framework, we show that these profiles are largely a measurement artifact. Administering a battery of personality and risk-preference instruments spanning self-reports and behavioral tasks to 56 instruction-tuned LLMs alongside large human reference samples, we report four findings. First, differences between models are driven not by the traits an instrument targets but by a directional response bias, a tendency to respond toward one end of the scale, or one labeled option, regardless of item content; a variance decomposition attributes 81-90% of between-model variation to this bias, against 9-16% in humans. Second, the bias declines with model capability but is not eliminated by it. Third, because bias rather than trait drives responding, an instrument's apparent reliability is almost entirely predicted by its response orthogonality, a term we coin for the proportion of items for which trait and bias point in opposite directions. Fourth, the profile a model appears to have shifts with the items used and can be manufactured through item selection. These results demonstrate that the apparent psychological profiles of LLMs are artifacts of the instrument used to measure them, not properties of the models themselves. As instruments borrowed from human psychology are rarely fully orthogonal and may inherently lack validity for LLMs, we call for dedicated assessments centered on response orthogonality.
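Abstract (reference translation): Psychological instruments designed for humans are increasingly used to assign large language models (LLMs) stable psychological profiles, which affect their usability, their safety assessment, and their use as proxies for human participants in research. Using a formal psychometric framework, we show that these profiles are largely a measurement artifact. We administer a battery of personality and risk-preference instruments, spanning self-reports and behavioral tasks, to 56 instruction-tuned LLMs alongside large human reference samples, and report four findings. First, differences between models are driven not by the traits an instrument targets but by a directional response bias: a tendency to respond toward one end of the scale, or toward one labeled option, regardless of item content. Second, the bias declines with model capability but is not eliminated by it. Third, because bias rather than trait drives responding, an instrument's apparent reliability is almost entirely predicted by its response orthogonality. Fourth, the profile a model appears to have shifts with the items used and can be manufactured through item selection. These results show that the apparent psychological profiles of LLMs are artifacts of the instruments used to measure them, not properties of the models themselves. Because instruments borrowed from human psychology are rarely fully orthogonal and may inherently lack validity for LLMs, dedicated assessments centered on response orthogonality are called for.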
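
The abstract defines response orthogonality as the proportion of items for which trait and bias point in opposite directions. The following is a minimal sketch of that one-sentence definition only, not the authors' implementation: the function name `response_orthogonality`, the +1/-1 encoding of directions, and the four-item example scale are all assumptions made for illustration.

```python
from typing import Sequence


def response_orthogonality(trait_dirs: Sequence[int], bias_dirs: Sequence[int]) -> float:
    """Share of items whose trait direction opposes the bias direction.

    Directions are encoded as +1 / -1 (an encoding assumed for this sketch).
    """
    if len(trait_dirs) != len(bias_dirs):
        raise ValueError("direction lists must have the same length")
    if not trait_dirs:
        raise ValueError("at least one item is required")
    for d in (*trait_dirs, *bias_dirs):
        if d not in (-1, 1):
            raise ValueError("directions must be +1 or -1")
    # Count items where the trait-keyed direction and the bias direction differ.
    opposed = sum(1 for t, b in zip(trait_dirs, bias_dirs) if t != b)
    return opposed / len(trait_dirs)


# Hypothetical 4-item scale: +1 means the high end of the response scale
# indicates more of the trait, -1 means it indicates less (reverse-keyed).
trait_dirs = [1, 1, 1, -1]
# Hypothetical bias toward the high end of the scale on every item.
bias_dirs = [1, 1, 1, 1]

print(response_orthogonality(trait_dirs, bias_dirs))  # 0.25
```

In this made-up example only the single reverse-keyed item opposes the bias, so one item in four counts toward the proportion.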

Related papers

Machine individuality: Separating genuine idiosyncrasy from response bias in large language models [1.4323566945483497]
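Large language models (LLMs) are increasingly integrated into daily life, from high-stakes decision support to cooperative relationships. Here, we apply crossed random-effects models to 74.9 million ratings that 10 open-weight LLMs provided for more than 100,000 words on 14 psycholinguistic norms. On average, 16.9% of the variance is attributable to stimulus-specific individuality, exceeding a statistical null model.
Paper reference translation (metadata) (2026-04-18T00:02:41Z)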
From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers [14.983442449498739]
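This study examines whether the correlational structure of human psychological traits can be modeled from minimal quantitative input. We prompted various LLMs with Big Five personality scale responses from 816 individuals to role-play those individuals' responses on nine other psychological scales. The LLMs showed remarkable accuracy in capturing human psychological structure.
Paper reference translation (metadata) (2025-11-05T06:51:13Z)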
The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs [60.15472325639723]
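Personality traits have long been studied as predictors of human behavior. Recent large language models (LLMs) suggest that similar patterns may emerge in artificial systems.
Paper reference translation (metadata) (2025-09-03T21:27:10Z)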
Surface Fairness, Deep Bias: A Comparative Study of Bias in Language Models [45.41676783204022]
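We examine various proxy measures of bias in large language models (LLMs). We find that evaluating models with personas on MMLU (a multi-subject benchmark) yields mostly random differences in scores. Recent trends toward memory and personalization in LLM assistants open these issues up from a different angle.
Paper reference translation (metadata) (2025-06-12T08:47:40Z)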
Evaluating Personality Traits in Large Language Models: Insights from Psychological Questionnaires [3.6001840369062386]
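This study applies psychological tools to large language models across a variety of scenarios to generate personality profiles. The results show that LLMs exhibit differing traits and personality characteristics, even within the same model family.
Paper reference translation (metadata) (2025-02-07T16:12:52Z)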
Trust Your Gut: Comparing Human and Machine Inference from Noisy Visualizations [7.305342793164905]
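We examine scenarios in which human intuition outperforms idealized statistical rationality. The results suggest that analysts' responses to visualizations may be advantageous even when they deviate from rationality.
Paper reference translation (metadata) (2024-07-23T22:39:57Z)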
Do LLMs exhibit human-like response biases? A case study in survey design [66.1850490474361]
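We investigate the extent to which large language models (LLMs) reflect human response biases. Using survey questionnaires, we design a dataset and framework to evaluate whether LLMs exhibit human-like response biases. A comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
Paper reference translation (metadata) (2023-11-07T15:40:43Z)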
Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias [33.99768156365231]
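We introduce a causal formulation for measuring bias in generative language models. We propose a benchmark called OccuGender, along with a bias-measurement procedure for investigating occupational gender bias. The results suggest that these models exhibit substantial occupational gender bias.
Paper reference translation (metadata) (2022-12-20T22:41:24Z)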

The related papers list is generated automatically from the titles and abstracts of papers on this site.

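This page shows information about the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).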