Fugu-MT Paper Translation (Summary): FactReview: Evidence-Grounded Reviews with Literature Positioning and Execution-Based Claim Verification

Paper summary: FactReview: Evidence-Grounded Reviews with Literature Positioning and Execution-Based Claim Verification

arxiv url: http://arxiv.org/abs/2604.04074v2
Date: Tue, 07 Apr 2026 17:20:55 GMT
Status: Translation complete
Last updated in system: 2026-04-08 15:04:55.552732
Title: FactReview: Evidence-Grounded Reviews with Literature Positioning and Execution-Based Claim Verification
Authors: Hang Xu, Ling Yue, Chaoqian Ouyang, Yuchen Liu, Libin Zheng, Shaowu Pan, Shimin Di, Min-Ling Zhang,
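Abstract summary: This paper introduces FactReview, an evidence-grounded reviewing system that combines claim extraction, literature positioning, and execution-based claim verification. Given a submission, FactReview identifies its major claims and reported results, retrieves nearby work to clarify the paper's technical position, and, when code is available, executes the released repository. It then produces a concise review and an evidence report that assigns each major claim one of five labels.
Reference score (attention score computed by the site): 57.196748998757954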
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Peer review in machine learning is under growing pressure from rising submission volume and limited reviewer time. Most LLM-based reviewing systems read only the manuscript and generate comments from the paper's own narrative. This makes their outputs sensitive to presentation quality and leaves them weak when the evidence needed for review lies in related work or released code. We present FactReview, an evidence-grounded reviewing system that combines claim extraction, literature positioning, and execution-based claim verification. Given a submission, FactReview identifies major claims and reported results, retrieves nearby work to clarify the paper's technical position, and, when code is available, executes the released repository under bounded budgets to test central empirical claims. It then produces a concise review and an evidence report that assigns each major claim one of five labels: Supported, Supported by the paper, Partially supported, In conflict, or Inconclusive. In a case study on CompGCN, FactReview reproduces results that closely match those reported for link prediction and node classification, yet also shows that the paper's broader performance claim across tasks is not fully sustained: on MUTAG graph classification, the reproduced result is 88.4%, whereas the strongest baseline reported in the paper remains 92.6%. The claim is therefore only partially supported. More broadly, this case suggests that AI is most useful in peer review not as a final decision-maker, but as a tool for gathering evidence and helping reviewers produce more evidence-grounded assessments. The code is public at https://github.com/DEFENSE-SEU/Review-Assistant.
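The abstract describes two mechanical steps: running released code under a bounded budget, and assigning each major claim one of five labels. The Python sketch below illustrates how such steps could fit together. Only the five label strings come from the abstract; `run_with_budget`, `label_cross_task_claim`, and the aggregation rule are hypothetical illustrations, not FactReview's actual logic (the authors' implementation is in the linked repository). The rule never emits "Supported by the paper", since the abstract does not say when that label applies.

```python
import subprocess
import sys
from enum import Enum


class Label(Enum):
    """The five claim labels listed in the FactReview abstract."""
    SUPPORTED = "Supported"
    SUPPORTED_BY_PAPER = "Supported by the paper"
    PARTIALLY_SUPPORTED = "Partially supported"
    IN_CONFLICT = "In conflict"
    INCONCLUSIVE = "Inconclusive"


def run_with_budget(cmd, seconds):
    """Hypothetical helper: run a command under a wall-clock budget.

    Returns the captured stdout, or None if the budget is exceeded
    (subprocess.run kills the child process on timeout).
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=seconds)
    except subprocess.TimeoutExpired:
        return None
    return result.stdout


def label_cross_task_claim(per_task):
    """Hypothetical rule for a claim that spans several tasks.

    per_task maps task name -> True (claim holds on that task),
    False (claim fails), or None (no result within budget).
    """
    decided = [o for o in per_task.values() if o is not None]
    if not decided:
        # Nothing was verified, so no verdict can be given.
        return Label.INCONCLUSIVE
    if all(decided) and len(decided) == len(per_task):
        # Every task was checked and the claim held on each one.
        return Label.SUPPORTED
    if not any(decided):
        # Every checked task contradicts the claim.
        return Label.IN_CONFLICT
    # Mixed outcomes, or some tasks held while others went unchecked.
    return Label.PARTIALLY_SUPPORTED


# Budget demo: a trivial command finishes well within 30 seconds.
out = run_with_budget([sys.executable, "-c", "print('ok')"], seconds=30)
print(out.strip())  # ok

# CompGCN case from the abstract: reproduced MUTAG accuracy vs. strongest reported baseline.
mutag_holds = 88.4 >= 92.6
verdict = label_cross_task_claim({
    "link prediction": True,      # treated as holding, per the abstract
    "node classification": True,  # treated as holding, per the abstract
    "graph classification (MUTAG)": mutag_holds,
})
print(verdict.value)  # Partially supported
```

Keeping each task's outcome separate means a single failing task (here MUTAG, 88.4% vs. 92.6%) downgrades a cross-task claim to "Partially supported" rather than hiding behind the tasks where it holds. This mirrors the verdict the abstract reports.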


Related papers