Fugu-MT Paper Translation (Summary): PluriHop: Exhaustive, Recall-Sensitive QA over Distractor-Rich Corpora

Paper overview: PluriHop: Exhaustive, Recall-Sensitive QA over Distractor-Rich Corpora

arxiv url: http://arxiv.org/abs/2510.14377v1
Date: Thu, 16 Oct 2025 07:22:58 GMT
Status: Translation complete
Last updated in system: 2025-10-17 21:15:14.757508
Title: PluriHop: Exhaustive, Recall-Sensitive QA over Distractor-Rich Corpora
Authors: Mykolas Sveistrys, Richard Kunert
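Abstract summary: PluriHopWIND is a diagnostic multilingual dataset of 48 pluri-hop questions built from 191 real-world wind industry reports in German and English. PluriHopWIND is shown to be 8-40% more repetitive than other common datasets, which gives it a higher density of distractor documents. The paper also proposes PluriHopRAG, a RAG architecture.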
Reference score (attention score calculated independently by the site): 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in large language models (LLMs) and retrieval-augmented generation (RAG) have enabled progress on question answering (QA) when relevant evidence is in one (single-hop) or multiple (multi-hop) passages. Yet many realistic questions about recurring report data - medical records, compliance filings, maintenance logs - require aggregation across all documents, with no clear stopping point for retrieval and high sensitivity to even one missed passage. We term these pluri-hop questions and formalize them by three criteria: recall sensitivity, exhaustiveness, and exactness. To study this setting, we introduce PluriHopWIND, a diagnostic multilingual dataset of 48 pluri-hop questions built from 191 real-world wind industry reports in German and English. We show that PluriHopWIND is 8-40% more repetitive than other common datasets and thus has higher density of distractor documents, better reflecting practical challenges of recurring report corpora. We test a traditional RAG pipeline as well as graph-based and multimodal variants, and find that none of the tested approaches exceed 40% in statement-wise F1 score. Motivated by this, we propose PluriHopRAG, a RAG architecture that follows a "check all documents individually, filter cheaply" approach: it (i) decomposes queries into document-level subquestions and (ii) uses a cross-encoder filter to discard irrelevant documents before costly LLM reasoning. We find that PluriHopRAG achieves relative F1 score improvements of 18-52% depending on base LLM. Despite its modest size, PluriHopWIND exposes the limitations of current QA systems on repetitive, distractor-rich corpora. PluriHopRAG's performance highlights the value of exhaustive retrieval and early filtering as a powerful alternative to top-k methods.
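The abstract describes PluriHopRAG only at the level of its two steps, so the Python below is a minimal sketch of the "check all documents individually, filter cheaply" loop under stated assumptions, not the authors' implementation. All names (`Document`, `overlap_score`, `pluri_hop_answer`, `toy_reason`, `statement_f1`) are hypothetical. The query is assumed to have already been decomposed into a single document-level subquestion; a token-overlap score stands in for the paper's cross-encoder filter; a regex stub stands in for the costly LLM reasoning step; and statement-wise F1 is approximated by exact matching of normalized statements, which may differ from how the paper scores answers.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass
class Document:
    doc_id: str
    text: str


def tokenize(text: str) -> Set[str]:
    # Lowercased word tokens; a crude stand-in for a real tokenizer.
    return set(re.findall(r"\w+", text.lower()))


def overlap_score(subquestion: str, doc: Document) -> float:
    # HYPOTHETICAL cheap relevance score: share of subquestion tokens that occur
    # in the document. The paper uses a cross-encoder filter here instead.
    q = tokenize(subquestion)
    return len(q & tokenize(doc.text)) / len(q) if q else 0.0


def pluri_hop_answer(
    subquestion: str,
    docs: List[Document],
    reason: Callable[[str, Document], Optional[str]],
    score: Callable[[str, Document], float] = overlap_score,
    threshold: float = 0.5,
) -> List[str]:
    # Exhaustive pass: every document is checked individually (no top-k cutoff).
    findings: List[str] = []
    for doc in docs:
        if score(subquestion, doc) < threshold:
            continue  # cheap filter: skip the document before the costly step
        answer = reason(subquestion, doc)  # costly step (an LLM call in practice)
        if answer is not None:
            findings.append(f"{doc.doc_id}: {answer}")
    return findings


def statement_f1(predicted: List[str], gold: List[str]) -> float:
    # ASSUMPTION: statements match by exact, case-insensitive string equality;
    # the paper's statement-wise F1 may use a different matching procedure.
    pred = {s.strip().lower() for s in predicted}
    ref = {s.strip().lower() for s in gold}
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(ref)
    return 2 * precision * recall / (precision + recall)


# --- Usage with toy data (all documents and labels are invented) ---
reason_calls: List[str] = []


def toy_reason(subquestion: str, doc: Document) -> Optional[str]:
    # HYPOTHETICAL stand-in for LLM reasoning: extract a turbine label, if any.
    reason_calls.append(doc.doc_id)
    match = re.search(r"WEA \d+", doc.text)
    return match.group(0) if match else None


docs = [
    Document("report-2021", "Gearbox damage found on turbine WEA 3."),
    Document("report-2022", "Blade erosion observed on turbine WEA 1."),
    Document("report-2023", "Gearbox damage repaired on turbine WEA 7 after inspection."),
    Document("report-2024", "Annual inspection completed without findings."),
]
findings = pluri_hop_answer("Which turbine had gearbox damage?", docs, toy_reason)
gold = ["report-2021: WEA 3", "report-2023: WEA 7"]

print(findings)      # ['report-2021: WEA 3', 'report-2023: WEA 7']
print(reason_calls)  # ['report-2021', 'report-2023']
print(statement_f1(findings, gold))                # 1.0
print(round(statement_f1(findings[:1], gold), 3))  # 0.667
```

Unlike a top-k retriever, this loop has no early stopping point: every document is scored, and the expensive step runs only on the two toy documents that pass the filter. Dropping a single correct statement lowers the toy F1 from 1.0 to about 0.667, which illustrates the recall sensitivity the abstract attributes to pluri-hop questions.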


Related papers