Fugu-MT Paper Translation (Abstract): SDS KoPub VDR: A Benchmark Dataset for Visual Document Retrieval in Korean Public Documents

Paper abstract: SDS KoPub VDR: A Benchmark Dataset for Visual Document Retrieval in Korean Public Documents

arxiv url: http://arxiv.org/abs/2511.04910v1
Date: Fri, 07 Nov 2025 01:16:07 GMT
Status: Translation complete
Last updated in system: 2025-11-10 21:00:44.637815
Title: SDS KoPub VDR: A Benchmark Dataset for Visual Document Retrieval in Korean Public Documents
Authors: Jaehoon Lee, Sohyun Kim, Wanggeun Park, Geon Lee, Seungkyung Kim, Minyoung Lee,
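Abstract summary: Existing benchmarks for visual document retrieval (VDR) largely overlook non-English languages. SDS KoPub VDR is the first large-scale, publicly available benchmark for retrieving and understanding Korean public documents.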
Reference score (independently computed attention score): 10.146296597660598
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Existing benchmarks for visual document retrieval (VDR) largely overlook non-English languages and the structural complexity of official publications. To address this critical gap, we introduce SDS KoPub VDR, the first large-scale, publicly available benchmark for retrieving and understanding Korean public documents. The benchmark is built upon a corpus of 361 real-world documents (40,781 pages), including 256 files under the KOGL Type 1 license and 105 from official legal portals, capturing complex visual elements like tables, charts, and multi-column layouts. To establish a challenging and reliable evaluation set, we constructed 600 query-page-answer triples. These were initially generated using multimodal models (e.g., GPT-4o) and subsequently underwent a rigorous human verification and refinement process to ensure factual accuracy and contextual relevance. The queries span six major public domains and are systematically categorized by the reasoning modality required: text-based, visual-based (e.g., chart interpretation), and cross-modal. We evaluate SDS KoPub VDR on two complementary tasks that reflect distinct retrieval paradigms: (1) text-only retrieval, which measures a model's ability to locate relevant document pages based solely on textual signals, and (2) multimodal retrieval, which assesses retrieval performance when visual features (e.g., tables, charts, and layouts) are jointly leveraged alongside text. This dual-task evaluation reveals substantial performance gaps, particularly in multimodal scenarios requiring cross-modal reasoning, even for state-of-the-art models. As a foundational resource, SDS KoPub VDR not only enables rigorous and fine-grained evaluation across textual and multimodal retrieval tasks but also provides a clear roadmap for advancing multimodal AI in complex, real-world document intelligence.
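The evaluation setup described in the abstract (query-page-answer triples, each tagged with a reasoning modality) can be illustrated with a minimal sketch. The abstract does not name the metric used, so Recall@k is an assumption here, and the `Triple` fields, page identifiers, and rankings below are hypothetical toy values rather than data from the benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One evaluation item: a query, its gold page, and the answer (hypothetical schema)."""
    query: str
    page_id: str   # identifier of the ground-truth page
    answer: str
    modality: str  # "text", "visual", or "cross-modal"


def recall_at_k(triples, rankings, k):
    """Fraction of queries whose gold page appears among the top-k retrieved pages."""
    hits = sum(1 for t in triples if t.page_id in rankings[t.query][:k])
    return hits / len(triples)


def recall_by_modality(triples, rankings, k):
    """Recall@k computed separately for each reasoning modality."""
    groups = {}
    for t in triples:
        groups.setdefault(t.modality, []).append(t)
    return {m: recall_at_k(ts, rankings, k) for m, ts in groups.items()}


# Toy data (not from the benchmark): four queries and a retriever's ranked pages.
triples = [
    Triple("q1", "doc1_p3", "a1", "text"),
    Triple("q2", "doc2_p7", "a2", "visual"),
    Triple("q3", "doc3_p1", "a3", "cross-modal"),
    Triple("q4", "doc1_p9", "a4", "text"),
]
rankings = {
    "q1": ["doc1_p3", "doc1_p4", "doc2_p7"],
    "q2": ["doc2_p1", "doc2_p7", "doc3_p1"],
    "q3": ["doc1_p3", "doc2_p2", "doc9_p9"],
    "q4": ["doc1_p8", "doc1_p9", "doc1_p3"],
}

print(recall_at_k(triples, rankings, 1))
print(recall_by_modality(triples, rankings, 2))
```

Breaking the score down by modality mirrors the abstract's point that performance gaps are largest for queries requiring cross-modal reasoning. The same functions could be run once on rankings from a text-only retriever and once on rankings from a multimodal retriever to compare the two tasks.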

Related papers