Fugu-MT Paper Translation (Summary): On the Cultural Anachronism and Temporal Reasoning in Vision Language Models

Paper summary: On the Cultural Anachronism and Temporal Reasoning in Vision Language Models

arxiv url: http://arxiv.org/abs/2605.15071v1
Date: Thu, 14 May 2026 16:58:16 GMT
Status: Translation complete
Last updated in system: 2026-05-15 21:45:34.961841
Title: On the Cultural Anachronism and Temporal Reasoning in Vision Language Models
Authors: Mukul Ranjan, Prince Jha, Khushboo Kumari, Zhiqiang Shen
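Abstract summary: Vision-Language Models (VLMs) are increasingly applied to cultural heritage. This work identifies a fundamental issue in how these models interpret historical artifacts. We define this phenomenon as cultural anachronism, the tendency to misinterpret historical objects using temporally inappropriate concepts.
Reference score (attention score computed independently by this site): 35.132248635251266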
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Vision-Language Models (VLMs) are increasingly applied to cultural heritage materials, from digital archives to educational platforms. This work identifies a fundamental issue in how these models interpret historical artifacts. We define this phenomenon as cultural anachronism, the tendency to misinterpret historical objects using temporally inappropriate concepts, materials, or cultural frameworks. To quantify this phenomenon, we introduce the Temporal Anachronism Benchmark for Vision-Language Models (TAB-VLM), a dataset of 600 questions across six categories, designed to evaluate temporal reasoning on 1,600 Indian cultural artifacts spanning prehistoric to modern periods. Systematic evaluations of ten state-of-the-art models reveal significant deficiencies on our benchmark, and even the best model (GPT-5.2) achieves only 58.7% overall accuracy. The performance gap persists across varying architectures and scales, suggesting that cultural anachronism represents a significant limitation in visual AI systems, regardless of model size. These findings highlight the disparity between current VLM capabilities and the requirements for accurately interpreting cultural heritage materials, particularly for non-Western visual cultures underrepresented in training data. Our benchmark provides a foundation for enhancing temporal cognition in multimodal AI systems that interact with historical artifacts. The dataset and code are available on our project page.


Related papers