Fugu-MT Paper Translation (Abstract): ReviewGrounder: Improving Review Substantiveness with Rubric-Guided, Tool-Integrated Agents

Paper overview: ReviewGrounder: Improving Review Substantiveness with Rubric-Guided, Tool-Integrated Agents

arxiv url: http://arxiv.org/abs/2604.14261v1
Date: Wed, 15 Apr 2026 16:33:04 GMT
Status: Translation complete
System update date: 2026-04-17 21:29:29.9538
Title: ReviewGrounder: Improving Review Substantiveness with Rubric-Guided, Tool-Integrated Agents
Authors: Zhuofeng Li, Yi Lu, Dongfu Jiang, Haoxiang Zhang, Yuyang Bai, Chuan Li, Yu Wang, Shuiwang Ji, Jianwen Xie, Yu Zhang
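Abstract summary: This paper introduces REVIEWBENCH, a benchmark that evaluates review text according to paper-specific rubrics derived from official guidelines, the paper's content, and human-written reviews. It also proposes REVIEWGROUNDER, a rubric-guided, tool-integrated multi-agent framework that decomposes reviewing into drafting and grounding stages.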
Reference score (independently computed attention measure): 50.27474750319121
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The rapid rise in AI conference submissions has driven increasing exploration of large language models (LLMs) for peer review support. However, LLM-based reviewers often generate superficial, formulaic comments lacking substantive, evidence-grounded feedback. We attribute this to the underutilization of two key components of human reviewing: explicit rubrics and contextual grounding in existing work. To address this, we introduce REVIEWBENCH, a benchmark evaluating review text according to paper-specific rubrics derived from official guidelines, the paper's content, and human-written reviews. We further propose REVIEWGROUNDER, a rubric-guided, tool-integrated multi-agent framework that decomposes reviewing into drafting and grounding stages, enriching shallow drafts via targeted evidence consolidation. Experiments on REVIEWBENCH show that REVIEWGROUNDER, using a Phi-4-14B-based drafter and a GPT-OSS-120B-based grounding stage, consistently outperforms baselines with substantially stronger/larger backbones (e.g., GPT-4.1 and DeepSeek-R1-670B) in both alignment with human judgments and rubric-based review quality across 8 dimensions. The code is available at https://github.com/EigenTom/ReviewGrounder.

Related papers