Fugu-MT Paper Translation (Summary): LLM-as-a-Reviewer: Benchmarking Their Ability, Divergence, and Prompt Injection Resistance as Paper Reviewers

Paper summary: LLM-as-a-Reviewer: Benchmarking Their Ability, Divergence, and Prompt Injection Resistance as Paper Reviewers

arxiv url: http://arxiv.org/abs/2605.25415v1
Date: Mon, 25 May 2026 04:32:13 GMT
Status: Translation complete
Last updated in system: 2026-05-26 19:50:19.285909
Title: LLM-as-a-Reviewer: Benchmarking Their Ability, Divergence, and Prompt Injection Resistance as Paper Reviewers
Title (reference translation): LLM-as-a-Reviewer: A Benchmark of Ability, Divergence, and Prompt Injection Resistance as Paper Reviewers
Authors: Lingyao Li, Junjie Xiong, Changjia Zhu, Runlong Yu, Chen Chen, Junyu Wang, Renkai Ma, Zhicong Lu
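Abstract summary: Large language models (LLMs) are increasingly used in academic peer review. The paper presents a benchmark of LLM-as-a-Reviewer on 898 papers.
Reference score (attention score computed independently by the site): 42.116161679682236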
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) are increasingly used in academic peer review, yet their reliability, alignment with human judgment, and robustness to adversarial attacks remain poorly understood. We present a systematic benchmark of LLM-as-a-Reviewer on 898 papers stratified from NeurIPS and ICLR, evaluating 12 LLMs along three axes: rating calibration, divergence from human reviewers, and resistance to prompt injection embedded via an invisible font-mapping attack. We find that LLMs systematically overrate weaker submissions and diverge from humans in topical emphasis, under-flagging Clarity and over-flagging Reproducibility, while producing reviews two to three times longer with lower lexical diversity and a more standardized vocabulary. Prompt injection remains highly effective. Simple hidden instructions can promote low-scoring papers to acceptance-level ratings in a substantial fraction of cases, with effectiveness varying sharply across model families. While LLMs offer utility in structuring evaluations, their integration into peer review requires safeguards against both intrinsic biases and adversarial risks.
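Abstract (reference translation): Large language models (LLMs) are increasingly used in academic peer review, but their reliability, alignment with human judgment, and robustness to adversarial attacks are not well understood. We present a systematic benchmark of LLM-as-a-Reviewer on 898 papers sampled by stratification from NeurIPS and ICLR, evaluating 12 LLMs along three axes: rating calibration, divergence from human reviewers, and resistance to prompt injection embedded via an invisible font-mapping attack. We find that LLMs systematically overrate weaker submissions and diverge from humans in topical emphasis, under-flagging Clarity and over-flagging Reproducibility, while producing reviews two to three times longer with lower lexical diversity and a more standardized vocabulary. Prompt injection remains highly effective: simple hidden instructions promote low-scoring papers to acceptance-level ratings in a substantial fraction of cases, with effectiveness varying sharply across model families. LLMs are useful for structuring evaluations, but integrating them into peer review requires safeguards against both intrinsic biases and adversarial risks.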

Related papers