Fugu-MT Paper Translation (Abstract): SDS KoPub VDR: A Benchmark Dataset for Visual Document Retrieval in Korean Public Documents

Paper overview: SDS KoPub VDR: A Benchmark Dataset for Visual Document Retrieval in Korean Public Documents

arxiv url: http://arxiv.org/abs/2511.04910v2
Date: Mon, 10 Nov 2025 04:20:56 GMT
Status: Translation complete
Last updated in system: 2025-11-11 14:56:00.571405
Title: SDS KoPub VDR: A Benchmark Dataset for Visual Document Retrieval in Korean Public Documents
Authors: Jaehoon Lee, Sohyun Kim, Wanggeun Park, Geon Lee, Seungkyung Kim, Minyoung Lee,
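Abstract summary: Existing benchmarks for visual document retrieval (VDR) largely overlook non-English languages and the structural complexity of official publications. SDS KoPub VDR is the first large-scale, public benchmark for retrieving and understanding Korean public documents. The benchmark is built upon 361 real-world documents, including 256 files under the KOGL Type 1 license and 105 from official legal portals.
Reference score (attention score computed independently by the site): 10.146296597660598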
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Existing benchmarks for visual document retrieval (VDR) largely overlook non-English languages and the structural complexity of official publications. To address this gap, we introduce SDS KoPub VDR, the first large-scale, public benchmark for retrieving and understanding Korean public documents. The benchmark is built upon 361 real-world documents, including 256 files under the KOGL Type 1 license and 105 from official legal portals, capturing complex visual elements like tables, charts, and multi-column layouts. To establish a reliable evaluation set, we constructed 600 query-page-answer triples. These were initially generated using multimodal models (e.g., GPT-4o) and subsequently underwent human verification to ensure factual accuracy and contextual relevance. The queries span six major public domains and are categorized by the reasoning modality required: text-based, visual-based, and cross-modal. We evaluate SDS KoPub VDR on two complementary tasks: (1) text-only retrieval and (2) multimodal retrieval, which leverages visual features alongside text. This dual-task evaluation reveals substantial performance gaps, particularly in multimodal scenarios requiring cross-modal reasoning, even for state-of-the-art models. As a foundational resource, SDS KoPub VDR enables rigorous and fine-grained evaluation and provides a roadmap for advancing multimodal AI in real-world document intelligence. The dataset is available at https://huggingface.co/datasets/SamsungSDS-Research/SDS-KoPub-VDR-Benchmark.
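The per-modality evaluation described in the abstract can be illustrated with a minimal sketch. The abstract does not name the metrics used, so recall@k is an assumption here, as are the field names (`category`, `gold_page`, `ranked_pages`), the page identifiers, and the toy data; none of them come from the dataset itself. Each example stands in for one query-page-answer triple together with a retriever's ranked page list.

```python
from collections import defaultdict


def recall_at_k(ranked_pages, gold_page, k):
    """Return 1.0 if the gold page is among the top-k ranked pages, else 0.0."""
    return 1.0 if gold_page in ranked_pages[:k] else 0.0


def recall_by_category(examples, k):
    """Average recall@k separately for each reasoning-modality category."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ex in examples:
        totals[ex["category"]] += recall_at_k(ex["ranked_pages"], ex["gold_page"], k)
        counts[ex["category"]] += 1
    return {cat: totals[cat] / counts[cat] for cat in totals}


# Hypothetical toy results: field names and page IDs are illustrative only.
examples = [
    {"category": "text", "gold_page": "doc1_p3",
     "ranked_pages": ["doc1_p3", "doc2_p1", "doc1_p4"]},
    {"category": "visual", "gold_page": "doc2_p7",
     "ranked_pages": ["doc2_p1", "doc2_p7", "doc3_p2"]},
    {"category": "cross-modal", "gold_page": "doc3_p5",
     "ranked_pages": ["doc1_p1", "doc2_p2", "doc3_p9"]},
    {"category": "text", "gold_page": "doc4_p2",
     "ranked_pages": ["doc4_p9", "doc4_p8", "doc4_p2"]},
]

scores = recall_by_category(examples, k=2)
for category, score in scores.items():
    print(f"{category}: recall@2 = {score:.2f}")
```

Reporting the score per category rather than as a single average is what would expose the kind of gap the abstract mentions for cross-modal queries.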


Related papers