Fugu-MT Paper Translation (Summary): KRETA: A Benchmark for Korean Reading and Reasoning in Text-Rich VQA Attuned to Diverse Visual Contexts

Paper summary: KRETA: A Benchmark for Korean Reading and Reasoning in Text-Rich VQA Attuned to Diverse Visual Contexts

arxiv url: http://arxiv.org/abs/2508.19944v2
Date: Sun, 31 Aug 2025 10:33:09 GMT
Status: Translation complete
System update date: 2025-09-03 12:29:36.802456
Title: KRETA: A Benchmark for Korean Reading and Reasoning in Text-Rich VQA Attuned to Diverse Visual Contexts
Title (reference translation): KRETA: A benchmark for Korean reading comprehension and reasoning in text-rich VQA adapted to diverse visual environments
Authors: Taebaek Hwang, Minseo Kim, Gisang Lee, Seonuk Kim, Hyunjun Eun,
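Abstract summary: KRETA is a benchmark for Korean Reading and rEasoning in Text-rich VQA Attuned to diverse visual contexts. KRETA facilitates an in-depth evaluation of both visual text understanding and reasoning capabilities while supporting multifaceted assessment. The paper also introduces a semi-automated VQA generation pipeline optimized for text-rich settings.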
Reference score (independently computed attention level): 5.689962668710347
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Understanding and reasoning over text within visual contexts poses a significant challenge for Vision-Language Models (VLMs), given the complexity and diversity of real-world scenarios. To address this challenge, text-rich Visual Question Answering (VQA) datasets and benchmarks have emerged for high-resource languages like English. However, a critical gap persists for low-resource languages such as Korean, where the lack of comprehensive benchmarks hinders robust model evaluation and comparison. To bridge this gap, we introduce KRETA, a benchmark for Korean Reading and rEasoning in Text-rich VQA Attuned to diverse visual contexts. KRETA facilitates an in-depth evaluation of both visual text understanding and reasoning capabilities, while also supporting a multifaceted assessment across 15 domains and 26 image types. Additionally, we introduce a semi-automated VQA generation pipeline specifically optimized for text-rich settings, leveraging refined stepwise image decomposition and a rigorous seven-metric evaluation protocol to ensure data quality. While KRETA is tailored for Korean, we hope our adaptable and extensible pipeline will facilitate the development of similar benchmarks in other languages, thereby accelerating multilingual VLM research. The code and dataset for KRETA are available at https://github.com/tabtoyou/KRETA.
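Abstract (reference translation): Understanding and reasoning over text within visual contexts is a major challenge for Vision-Language Models (VLMs), given the complexity and diversity of real-world scenarios. To address this challenge, text-rich Visual Question Answering (VQA) datasets and benchmarks have emerged for high-resource languages such as English. For low-resource languages such as Korean, however, the lack of comprehensive benchmarks hinders robust model evaluation and comparison. To fill this gap, we introduce KRETA, a benchmark for Korean Reading and rEasoning in Text-rich VQA Attuned to diverse visual contexts. KRETA facilitates detailed evaluation of both visual text understanding and reasoning capabilities while supporting multifaceted assessment across 15 domains and 26 image types. In addition, we introduce a semi-automated VQA generation pipeline optimized specifically for text-rich settings, which uses refined stepwise image decomposition and a rigorous seven-metric evaluation protocol to ensure data quality. Although KRETA is tailored to Korean, we hope that our adaptable and extensible pipeline will facilitate the development of similar benchmarks in other languages and accelerate multilingual VLM research. The code and dataset for KRETA are available at https://github.com/tabtoyou/KRETA.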

Related papers