Fugu-MT Paper Translation (Abstract): Resources for Automated Evaluation of Assistive RAG Systems that Help Readers with News Trustworthiness Assessment

Paper abstract: Resources for Automated Evaluation of Assistive RAG Systems that Help Readers with News Trustworthiness Assessment

arxiv url: http://arxiv.org/abs/2602.24277v1
Date: Fri, 27 Feb 2026 18:49:31 GMT
Status: Translation complete
System update date: 2026-03-02 19:48:24.571535
Title: Resources for Automated Evaluation of Assistive RAG Systems that Help Readers with News Trustworthiness Assessment
Title (reference translation): Resources for automated evaluation of assistive RAG systems that support news trustworthiness assessment
Authors: Dake Zhang, Mark D. Smucker, Charles L. A. Clarke
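Abstract summary: This paper describes newly developed resources that allow the tasks of the TREC 2025 DRAGUN track to be reused. As part of the track's evaluation, TREC assessors created importance-weighted rubrics of questions with expected short answers for 30 different news articles. The assessors then used their rubrics to manually judge the participating teams' submitted runs. To make these tasks and their rubrics reusable, we created an automated process to judge runs that were not part of the original assessment.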
Reference score (attention score computed by this site): 10.516355770829326
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Many readers today struggle to assess the trustworthiness of online news because reliable reporting coexists with misinformation. The TREC 2025 DRAGUN (Detection, Retrieval, and Augmented Generation for Understanding News) Track provided a venue for researchers to develop and evaluate assistive RAG systems that support readers' news trustworthiness assessment by producing reader-oriented, well-attributed reports. As the organizers of the DRAGUN track, we describe the resources that we have newly developed to allow for the reuse of the track's tasks. The track had two tasks: (Task 1) Question Generation, producing 10 ranked investigative questions; and (Task 2, the main task) Report Generation, producing a 250-word report grounded in the MS MARCO V2.1 Segmented Corpus. As part of the track's evaluation, we had TREC assessors create importance-weighted rubrics of questions with expected short answers for 30 different news articles. These rubrics represent the information that assessors believe is important for readers to assess an article's trustworthiness. The assessors then used their rubrics to manually judge the participating teams' submitted runs. To make these tasks and their rubrics reusable, we have created an automated process to judge runs not part of the original assessing. We show that our AutoJudge ranks existing runs well compared to the TREC human-assessed evaluation (Kendall's $\tau = 0.678$ for Task 1 and $\tau = 0.872$ for Task 2). These resources enable both the evaluation of RAG systems for assistive news trustworthiness assessment and, with the human evaluation as a benchmark, research on improving automated RAG evaluation.
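Abstract (reference translation): Many readers today struggle to assess the trustworthiness of online news. The TREC 2025 DRAGUN (Detection, Retrieval, and Augmented Generation for Understanding News) Track provided a venue for researchers to develop and evaluate assistive RAG systems that support readers' news trustworthiness assessment. As the organizers of the DRAGUN track, we describe the resources we have newly developed to allow the track's tasks to be reused. The track had two tasks: (Task 1) Question Generation, producing 10 ranked questions; and (Task 2, the main task) Report Generation, producing a 250-word report grounded in the MS MARCO V2.1 Segmented Corpus. As part of the track's evaluation, TREC assessors created importance-weighted rubrics of questions with expected short answers for 30 different news articles. These rubrics represent the information that assessors consider important for readers to assess an article's trustworthiness. The assessors then used their rubrics to manually judge the participating teams' submitted runs. To make these tasks and their rubrics reusable, we created an automated process to judge runs that were not part of the original assessment. We show that our AutoJudge ranks existing runs well compared to the TREC human-assessed evaluation (Kendall's $\tau = 0.678$ for Task 1 and $\tau = 0.872$ for Task 2). These resources enable both the evaluation of RAG systems for news trustworthiness assessment and, with the human evaluation as a benchmark, research on improving automated RAG evaluation.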
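
The abstract reports agreement between the AutoJudge ranking and the human-assessed ranking as Kendall's τ. As an illustration only, the sketch below computes the simplest variant (tau-a, no tie correction) over per-run scores. The function name `kendall_tau_a` and all scores are hypothetical; the abstract does not state which τ variant or tooling the authors used.

```python
from itertools import combinations


def kendall_tau_a(human_scores, auto_scores):
    """Kendall's tau-a between two score lists over the same runs.

    Hypothetical helper for illustration: tied pairs count as neither
    concordant nor discordant, and no tie correction is applied.
    """
    if len(human_scores) != len(auto_scores):
        raise ValueError("both lists must score the same runs")
    n = len(human_scores)
    if n < 2:
        raise ValueError("need at least two runs to compare rankings")

    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both evaluations order runs i and j the same way.
        product = (human_scores[i] - human_scores[j]) * (auto_scores[i] - auto_scores[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Made-up scores for five runs (not data from the paper).
human = [0.62, 0.55, 0.48, 0.40, 0.31]
auto = [0.70, 0.52, 0.58, 0.35, 0.33]
tau = kendall_tau_a(human, auto)
print(f"Kendall's tau-a = {tau:.3f}")
```

In this made-up example the two evaluations disagree on the order of one pair of runs out of ten, so τ falls below 1. A value of 1 means the automated judge reproduces the human ranking exactly, and a value of -1 means it reverses it.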

