Fugu-MT Paper Translation (Summary): EICAP: Deep Dive in Assessment and Enhancement of Large Language Models in Emotional Intelligence through Multi-Turn Conversations

Paper summary: EICAP: Deep Dive in Assessment and Enhancement of Large Language Models in Emotional Intelligence through Multi-Turn Conversations

arxiv url: http://arxiv.org/abs/2508.06196v1
Date: Fri, 08 Aug 2025 10:22:19 GMT
Status: Translation complete
System update date: 2025-08-11 20:39:06.195257
Title: EICAP: Deep Dive in Assessment and Enhancement of Large Language Models in Emotional Intelligence through Multi-Turn Conversations
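Title (reference translation): EICAP: An In-Depth Study of Assessing and Enhancing Large Language Models in Emotional Intelligence through Multi-Turn Conversations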
Authors: Nizi Nazar, Ehsaneddin Asgari
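Abstract summary: We introduce a unified, psychologically grounded four-layer taxonomy of emotional intelligence (EI) tailored to large language models (LLMs). EICAP-Bench is a novel multi-turn benchmark designed to evaluate EI capabilities in open-source LLMs. Statistical analysis shows that, among the five EI layers, only the Appraisal layer improves significantly through UC-based fine-tuning.
Reference score (independently computed attention score): 0.9023847175654603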
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Emotional Intelligence (EI) is a critical yet underexplored dimension in the development of human-aligned LLMs. To address this gap, we introduce a unified, psychologically grounded four-layer taxonomy of EI tailored for large language models (LLMs), encompassing emotional tracking, cause inference, appraisal, and emotionally appropriate response generation. Building on this framework, we present EICAP-Bench, a novel MCQ style multi-turn benchmark designed to evaluate EI capabilities in open-source LLMs across diverse linguistic and cultural contexts. We evaluate six LLMs: LLaMA3 (8B), LLaMA3-Instruct, Gemma (9B), Gemma-Instruct, Qwen2.5 (7B), and Qwen2.5-Instruct on EmoCap-Bench, identifying Qwen2.5-Instruct as the strongest baseline. To assess the potential for enhancing EI capabilities, we fine-tune both Qwen2.5-Base and Qwen2.5-Instruct using LoRA adapters on UltraChat (UC), a large-scale, instruction-tuned dialogue dataset, in both English and Arabic. Our statistical analysis reveals that among the five EI layers, only the Appraisal layer shows significant improvement through UC-based fine-tuning. These findings highlight the limitations of existing pretraining and instruction-tuning paradigms in equipping LLMs with deeper emotional reasoning and underscore the need for targeted data and modeling strategies for comprehensive EI alignment.
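Abstract (reference translation): Emotional Intelligence (EI) is an important yet underexplored dimension in the development of human-aligned LLMs. To address this gap, we introduce a unified, psychologically grounded four-layer EI taxonomy suited to large language models (LLMs), covering emotion tracking, cause inference, appraisal, and emotionally appropriate response generation. Built on this framework, EICAP-Bench is an MCQ-style multi-turn benchmark designed to evaluate the EI capabilities of open-source LLMs across diverse linguistic and cultural contexts. We evaluate six LLMs, LLaMA3 (8B), LLaMA3-Instruct, Gemma (9B), Gemma-Instruct, Qwen2.5 (7B), and Qwen2.5-Instruct, on EmoCap-Bench and identify Qwen2.5-Instruct as the strongest baseline. We fine-tune both Qwen2.5-Base and Qwen2.5-Instruct with LoRA adapters on UltraChat (UC), a dialogue dataset, in English and Arabic. Statistical analysis shows that, among the five EI layers, only the Appraisal layer improves significantly through UC-based fine-tuning. These findings highlight the limits of existing pretraining and instruction-tuning paradigms in giving LLMs deeper emotional reasoning, and underscore the need for targeted data and modeling strategies for comprehensive EI alignment.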


Related papers