Fugu-MT Paper Translation (Abstract): MOSAIC: Unveiling the Moral, Social and Individual Dimensions of Large Language Models

Paper overview: MOSAIC: Unveiling the Moral, Social and Individual Dimensions of Large Language Models

arxiv url: http://arxiv.org/abs/2603.00048v1
Date: Mon, 09 Feb 2026 22:45:17 GMT
Status: Translation complete
Last updated in system: 2026-03-09 01:20:07.996119
Title: MOSAIC: Unveiling the Moral, Social and Individual Dimensions of Large Language Models
Title (reference translation): MOSAIC: Revealing the Moral, Social, and Individual Dimensions of Large Language Models
Authors: Erica Coppolillo, Emilio Ferrara
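Abstract summary: Large Language Models (LLMs) are increasingly deployed in sensitive applications including psychological support, healthcare, and high-stakes decision-making. We introduce MOSAIC, the first large-scale benchmark designed to jointly assess the moral, social, and individual characteristics of LLMs. The benchmark comprises nine validated questionnaires drawn from moral philosophy, psychology, and social theory, alongside four platform-based games designed to probe morally ambiguous scenarios.
Reference score (independently computed attention score): 9.025479777784675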
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large Language Models (LLMs) are increasingly deployed in sensitive applications including psychological support, healthcare, and high-stakes decision-making. This expansion has motivated growing research into the ethical and moral foundations underlying LLM behavior, raising critical questions about their reliability in ethical reasoning. However, existing studies and benchmarks rely almost exclusively on Moral Foundation Theory (MFT), largely neglecting other relevant dimensions such as social values, personality traits, and individual characteristics that shape human ethical reasoning. To address these limitations, we introduce MOSAIC, the first large-scale benchmark designed to jointly assess the moral, social, and individual characteristics of LLMs. The benchmark comprises nine validated questionnaires drawn from moral philosophy, psychology, and social theory, alongside four platform-based games designed to probe morally ambiguous scenarios. In total, MOSAIC includes over 600 curated questions and scenarios, released as a ready-to-use, extensible resource for evaluating the behavioral foundations of LLMs. We validate the benchmark across three models from different families, demonstrating its utility across all assessed dimensions and providing the first empirical evidence that MFT alone is insufficient to comprehensively evaluate complex AI systems' ethical behavior. We publicly release the dataset and our benchmark Python library.
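Abstract (reference translation): Large Language Models (LLMs) are increasingly deployed in sensitive applications including psychological support, healthcare, and high-stakes decision-making. This expansion has motivated research into the ethical and moral foundations underlying LLM behavior and raised critical questions about their reliability in ethical reasoning. However, existing studies and benchmarks rely almost exclusively on Moral Foundation Theory (MFT), largely neglecting other relevant dimensions such as social values, personality traits, and the individual characteristics that shape human ethical reasoning. To address these limitations, we introduce MOSAIC, the first large-scale benchmark for jointly assessing the moral, social, and individual characteristics of LLMs. The benchmark comprises nine validated questionnaires drawn from moral philosophy, psychology, and social theory, alongside four platform-based games designed to probe morally ambiguous scenarios. MOSAIC includes over 600 curated questions and scenarios and is released as a ready-to-use, extensible resource for evaluating the behavioral foundations of LLMs. We validate the benchmark across three models from different families, demonstrating its utility across all assessed dimensions and providing the first empirical evidence that MFT alone is insufficient to comprehensively evaluate the ethical behavior of complex AI systems. We publicly release the dataset and our benchmark Python library.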

Related papers