Fugu-MT Paper Translation (Abstract): Customizing Visual Emotion Evaluation for MLLMs: An Open-vocabulary, Multifaceted, and Scalable Approach

Paper overview: Customizing Visual Emotion Evaluation for MLLMs: An Open-vocabulary, Multifaceted, and Scalable Approach

arxiv url: http://arxiv.org/abs/2509.21950v1
Date: Fri, 26 Sep 2025 06:30:39 GMT
Status: Translation complete
Last updated in system: 2025-09-29 20:57:54.241539
Title: Customizing Visual Emotion Evaluation for MLLMs: An Open-vocabulary, Multifaceted, and Scalable Approach
Title (reference translation): Customizing visual emotion evaluation for MLLMs: an open-vocabulary, multifaceted, scalable approach
Authors: Daiqing Wu, Dongbao Yang, Sicheng Zhao, Can Ma, Yu Zhou
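Abstract summary: We argue that this inconsistency stems from constraints in existing evaluation methods. We propose an Emotion Statement Judgment task that overcomes these constraints. We devise an automated pipeline that efficiently constructs emotion-centric statements with minimal human effort.
Reference score (independently computed attention metric): 29.502292089901825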
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recently, Multimodal Large Language Models (MLLMs) have achieved exceptional performance across diverse tasks, continually surpassing previous expectations regarding their capabilities. Nevertheless, their proficiency in perceiving emotions from images remains debated, with studies yielding divergent results in zero-shot scenarios. We argue that this inconsistency stems partly from constraints in existing evaluation methods, including the oversight of plausible responses, limited emotional taxonomies, neglect of contextual factors, and labor-intensive annotations. To facilitate customized visual emotion evaluation for MLLMs, we propose an Emotion Statement Judgment task that overcomes these constraints. Complementing this task, we devise an automated pipeline that efficiently constructs emotion-centric statements with minimal human effort. Through systematically evaluating prevailing MLLMs, our study showcases their stronger performance in emotion interpretation and context-based emotion judgment, while revealing relative limitations in comprehending perception subjectivity. When compared to humans, even top-performing MLLMs like GPT4o demonstrate remarkable performance gaps, underscoring key areas for future improvement. By developing a fundamental evaluation framework and conducting a comprehensive MLLM assessment, we hope this work contributes to advancing emotional intelligence in MLLMs. Project page: https://github.com/wdqqdw/MVEI.
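Abstract (reference translation): In recent years, Multimodal Large Language Models (MLLMs) have achieved exceptional performance across a wide range of tasks, continually exceeding expectations about their capabilities. Even so, their ability to perceive emotions from images remains debated, with studies reporting divergent results in zero-shot scenarios. We argue that this inconsistency stems in part from constraints in existing evaluation methods, including the oversight of plausible responses, limited emotional taxonomies, neglect of contextual factors, and labor-intensive annotation. To facilitate customized visual emotion evaluation for MLLMs, we propose an Emotion Statement Judgment task that overcomes these constraints. Complementing this task, we devise an automated pipeline that efficiently constructs emotion-centric statements with minimal human effort. Through a systematic evaluation of prevailing MLLMs, we show that they perform more strongly in emotion interpretation and context-based emotion judgment, while revealing relative limitations in understanding the subjectivity of perception. Compared with humans, even top-performing MLLMs such as GPT4o show marked performance gaps, highlighting key areas for future improvement. By developing a fundamental evaluation framework and conducting a comprehensive MLLM assessment, we hope this work contributes to advancing emotional intelligence in MLLMs. Project page: https://github.com/wdqqdw/MVEI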

Related papers