Fugu-MT Paper Translation (Summary): RegionRAG: Region-level Retrieval-Augumented Generation for Visually-Rich Documents

Paper summary: RegionRAG: Region-level Retrieval-Augumented Generation for Visually-Rich Documents

arXiv URL: http://arxiv.org/abs/2510.27261v1
Date: Fri, 31 Oct 2025 08:00:32 GMT
Status: Translation complete
Last updated in system: 2025-11-03 17:52:16.030032
Title: RegionRAG: Region-level Retrieval-Augumented Generation for Visually-Rich Documents
Authors: Yinglu Li, Zhiying Lu, Zhihang Liu, Chuanbin Liu, Hongtao Xie,
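Abstract summary: RegionRAG is a novel framework that shifts the retrieval paradigm from the document level to the region level. Experiments on six benchmarks show that RegionRAG achieves state-of-the-art performance.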
Reference score (attention score computed by the site): 40.107303323097646
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-modal Retrieval-Augmented Generation (RAG) has become a critical method for empowering LLMs by leveraging candidate visual documents. However, current methods treat the entire document as the basic retrieval unit, which introduces substantial irrelevant visual content in two ways: 1) relevant documents often contain large regions unrelated to the query, diluting the focus on salient information; 2) retrieving multiple documents to increase recall introduces further redundant and irrelevant documents. These redundant contexts distract the model's attention and degrade performance. To address this challenge, we propose RegionRAG, a novel framework that shifts the retrieval paradigm from the document level to the region level. During training, we design a hybrid supervision strategy that uses both labeled and unlabeled data to pinpoint relevant patches. During inference, we propose a dynamic pipeline that intelligently groups salient patches into complete semantic regions. By delegating the task of identifying relevant regions to the retriever, RegionRAG enables the generator to focus solely on concise visual content relevant to the query, improving both efficiency and accuracy. Experiments on six benchmarks demonstrate that RegionRAG achieves state-of-the-art performance: it improves retrieval accuracy by 10.02% in R@1 on average and increases question-answering accuracy by 3.56%, while using only 71.42% of the visual tokens required by prior methods. The code will be available at https://github.com/Aeryn666/RegionRAG.
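To make the inference-time idea of "grouping salient patches into regions" more concrete, here is a minimal sketch under stated assumptions; it is not the RegionRAG implementation, which the abstract describes only as a dynamic pipeline. It assumes per-patch query-relevance scores are already available on a 2-D grid, marks patches above a hypothetical mean + k·std threshold as salient, joins 4-connected salient patches with a breadth-first search, and returns one bounding box per group. The function name `group_salient_patches` and the parameters `k` and `min_size` are illustrative assumptions.

```python
from collections import deque
from typing import List, Tuple

# (row_min, col_min, row_max, col_max), inclusive, in patch units
Box = Tuple[int, int, int, int]


def group_salient_patches(scores: List[List[float]], k: float = 1.0,
                          min_size: int = 1) -> List[Box]:
    """Hypothetical helper: group high-scoring patches into rectangular regions.

    scores:   2-D grid of per-patch relevance scores (assumed precomputed).
    k:        patches scoring above mean + k * std count as salient (assumed rule).
    min_size: groups with fewer patches than this are dropped as noise.
    """
    if not scores or not scores[0]:
        return []
    flat = [s for row in scores for s in row]
    mean = sum(flat) / len(flat)
    std = (sum((s - mean) ** 2 for s in flat) / len(flat)) ** 0.5
    threshold = mean + k * std
    rows, cols = len(scores), len(scores[0])
    salient = [[scores[r][c] > threshold for c in range(cols)] for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            if not salient[r][c] or seen[r][c]:
                continue
            # Breadth-first search over 4-connected salient neighbours.
            queue = deque([(r, c)])
            seen[r][c] = True
            members = []
            while queue:
                y, x = queue.popleft()
                members.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and salient[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(members) >= min_size:
                ys = [y for y, _ in members]
                xs = [x for _, x in members]
                boxes.append((min(ys), min(xs), max(ys), max(xs)))
    return boxes


example_scores = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.9, 0.0, 0.1],
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.95],
]
print(group_salient_patches(example_scores))              # [(0, 0, 1, 1), (3, 3, 3, 3)]
print(group_salient_patches(example_scores, min_size=2))  # [(0, 0, 1, 1)]
```

Only the cropped regions, rather than whole pages, would then be passed to the generator, which is the source of the visual-token savings the abstract reports. A connected-components rule is used here purely for simplicity; the paper's actual grouping strategy may differ.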


Related papers