Fugu-MT Paper Translation (Abstract): DRISHTIKON: A Multimodal Multilingual Benchmark for Testing Language Models' Understanding on Indian Culture

Paper summary: DRISHTIKON: A Multimodal Multilingual Benchmark for Testing Language Models' Understanding on Indian Culture

arxiv url: http://arxiv.org/abs/2509.19274v1
Date: Tue, 23 Sep 2025 17:40:43 GMT
Status: Translation complete
Last updated in system: 2025-09-24 20:41:27.982048
Title: DRISHTIKON: A Multimodal Multilingual Benchmark for Testing Language Models' Understanding on Indian Culture
Title (reference translation): DRISHTIKON: A Multimodal Multilingual Benchmark for Language Models' Understanding of Indian Culture
Authors: Arijit Maji, Raghvendra Kumar, Akash Ghosh, Anushka, Nemil Shah, Abhilekh Borah, Vanshika Shah, Nishant Mishra, Sriparna Saha
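Abstract summary: DRISHTIKON is a multimodal and multilingual benchmark centered on Indian culture. The dataset captures rich cultural themes including festivals, attire, cuisines, art forms, and historical heritage. We evaluate a wide range of vision-language models (VLMs), including open-source small and large models, proprietary systems, reasoning-specialized VLMs, and Indic-focused models.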
Reference score (independently computed attention score): 14.681676046750342
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: We introduce DRISHTIKON, a first-of-its-kind multimodal and multilingual benchmark centered exclusively on Indian culture, designed to evaluate the cultural understanding of generative AI systems. Unlike existing benchmarks with a generic or global scope, DRISHTIKON offers deep, fine-grained coverage across India's diverse regions, spanning 15 languages, covering all states and union territories, and incorporating over 64,000 aligned text-image pairs. The dataset captures rich cultural themes including festivals, attire, cuisines, art forms, and historical heritage amongst many more. We evaluate a wide range of vision-language models (VLMs), including open-source small and large models, proprietary systems, reasoning-specialized VLMs, and Indic-focused models, across zero-shot and chain-of-thought settings. Our results expose key limitations in current models' ability to reason over culturally grounded, multimodal inputs, particularly for low-resource languages and less-documented traditions. DRISHTIKON fills a vital gap in inclusive AI research, offering a robust testbed to advance culturally aware, multimodally competent language technologies.
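Abstract (reference translation): DRISHTIKON is a multimodal and multilingual benchmark centered on Indian culture, intended to evaluate the cultural understanding of generative AI systems. Unlike existing benchmarks with a generic or global scope, DRISHTIKON spans 15 languages, covers all states and union territories, and incorporates over 64,000 text-image pairs. The dataset captures many cultural themes, including festivals, attire, cuisines, art forms, and historical heritage. We evaluate a wide range of vision-language models (VLMs), including open-source small and large models, proprietary systems, reasoning-specialized VLMs, and Indic-focused models, in zero-shot and chain-of-thought settings. Our results reveal key limitations in current models' ability to reason over culturally grounded, multimodal inputs, particularly for low-resource languages and less-documented traditions. DRISHTIKON fills an important gap in inclusive AI research and provides a robust testbed for advancing culturally aware, multimodally competent language technologies.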


Related papers