Fugu-MT Paper Translation (Abstract): A Structured Framework for Evaluating and Enhancing Interpretive Capabilities of Multimodal LLMs in Culturally Situated Tasks

Paper summary: A Structured Framework for Evaluating and Enhancing Interpretive Capabilities of Multimodal LLMs in Culturally Situated Tasks

arxiv url: http://arxiv.org/abs/2509.23208v1
Date: Sat, 27 Sep 2025 09:41:51 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:19.105121
Title: A Structured Framework for Evaluating and Enhancing Interpretive Capabilities of Multimodal LLMs in Culturally Situated Tasks
Title (reference translation): A structural framework for evaluating and improving the interpretive capabilities of multimodal LLMs in cultural tasks
Authors: Haorui Yu, Ramon Ruiz-Dolz, Qiufeng Yi
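Abstract summary: This study aims to test and evaluate the capabilities and characteristics of current mainstream Visual Language Models (VLMs). We first developed a quantitative framework for Chinese painting critique. The framework was constructed by extracting multi-dimensional evaluative features, including evaluative stance, feature focus, and commentary quality, from human expert critiques. The experimental design involved persona-guided prompting to assess the ability of VLMs to generate critiques from diverse perspectives.
Reference score (independently computed attention score): 3.491999371287299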
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This study aims to test and evaluate the capabilities and characteristics of current mainstream Visual Language Models (VLMs) in generating critiques for traditional Chinese painting. To achieve this, we first developed a quantitative framework for Chinese painting critique. This framework was constructed by extracting multi-dimensional evaluative features covering evaluative stance, feature focus, and commentary quality from human expert critiques using a zero-shot classification model. Based on these features, several representative critic personas were defined and quantified. This framework was then employed to evaluate selected VLMs such as Llama, Qwen, or Gemini. The experimental design involved persona-guided prompting to assess the VLM's ability to generate critiques from diverse perspectives. Our findings reveal the current performance levels, strengths, and areas for improvement of VLMs in the domain of art critique, offering insights into their potential and limitations in complex semantic understanding and content generation tasks. The code used for our experiments can be publicly accessed at: https://github.com/yha9806/VULCA-EMNLP2025.
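Abstract (reference translation): The aim of this study is to test and evaluate the capabilities and characteristics of current mainstream Visual Language Models (VLMs) in generating critiques of traditional Chinese painting. To this end, we first devised a quantitative framework for Chinese painting critique. The framework was constructed by using a zero-shot classification model to extract multi-dimensional evaluative features, including evaluative stance, feature focus, and commentary quality, from human expert critiques. Based on these features, several representative critic personas were defined and quantified. The framework was then used to evaluate selected VLMs such as Llama, Qwen, and Gemini. The experimental design involved persona-guided prompting to assess the ability of VLMs to generate critiques from diverse perspectives. The study clarifies the performance levels, strengths, and areas for improvement of VLMs in the field of art critique, and discusses their potential and limitations in complex semantic understanding and content generation tasks. The code used in our experiments is publicly available at https://github.com/yha9806/VULCA-EMNLP2025.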
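
The abstract mentions quantified critic personas and persona-guided prompting without giving details. The following is a minimal sketch of how such a setup could look, not the authors' implementation: the persona names, descriptions, score values, prompt wording, and the use of cosine similarity are all assumptions made for illustration. Only the three feature dimensions are taken from the abstract.

```python
from dataclasses import dataclass
from math import sqrt

# Feature dimensions named after the three groups listed in the abstract.
# Treating each as a single score in [0, 1] is an assumption of this sketch.
DIMENSIONS = ("evaluative_stance", "feature_focus", "commentary_quality")


@dataclass(frozen=True)
class Persona:
    name: str          # hypothetical persona label
    description: str   # short text injected into the prompt
    profile: tuple     # one score per entry of DIMENSIONS (made-up values)


# Hypothetical personas; the paper's actual personas are not given in the abstract.
PERSONAS = [
    Persona("historian", "an art historian who emphasises historical context",
            (0.3, 0.9, 0.6)),
    Persona("enthusiast", "an enthusiastic admirer who praises freely",
            (0.9, 0.3, 0.4)),
]


def build_prompt(persona: Persona, painting_title: str) -> str:
    """Compose the text part of a persona-guided instruction for a VLM."""
    return (
        f"You are {persona.description}. "
        f"Write a critique of the painting '{painting_title}'."
    )


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def closest_persona(critique_profile, personas):
    """Return the persona whose profile is most similar to the critique's profile."""
    return max(personas, key=lambda p: cosine(critique_profile, p.profile))


if __name__ == "__main__":
    print(build_prompt(PERSONAS[0], "Untitled landscape"))
    # A critique profile would come from a feature extractor (for example a
    # zero-shot classifier); here it is a hand-written placeholder.
    print(closest_persona((0.8, 0.2, 0.5), PERSONAS).name)
```

Comparing the feature profile of a generated critique with the profile of the persona used in the prompt is one possible way to check whether a model followed the requested perspective; the paper may use a different measure.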

Related papers