Fugu-MT Paper Translation (Abstract): LLM-REVal: Can We Trust LLM Reviewers Yet?

Paper overview: LLM-REVal: Can We Trust LLM Reviewers Yet?

arxiv url: http://arxiv.org/abs/2510.12367v1
Date: Tue, 14 Oct 2025 10:30:20 GMT
Status: Translation complete
Last updated in system: 2025-10-15 19:02:32.280098
Title: LLM-REVal: Can We Trust LLM Reviewers Yet?
Title (reference translation): LLM-REVal: Can We Trust LLM Reviewers?
Authors: Rui Li, Jia-Chen Gu, Po-Nien Kung, Heming Xia, Junfeng Liu, Xiangwen Kong, Zhifang Sui, Nanyun Peng
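Abstract summary: Large language models (LLMs) have inspired researchers to integrate them extensively into academic workflows. This study focuses on how the deep integration of LLMs into peer review and research processes affects scholarly fairness.
Reference score (independently computed attention score): 70.58742663985652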
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: The rapid advancement of large language models (LLMs) has inspired researchers to integrate them extensively into the academic workflow, potentially reshaping how research is practiced and reviewed. While previous studies highlight the potential of LLMs in supporting research and peer review, their dual roles in the academic workflow and the complex interplay between research and review bring new risks that remain largely underexplored. In this study, we focus on how the deep integration of LLMs into both peer-review and research processes may influence scholarly fairness, examining the potential risks of using LLMs as reviewers by simulation. This simulation incorporates a research agent, which generates papers and revises, alongside a review agent, which assesses the submissions. Based on the simulation results, we conduct human annotations and identify pronounced misalignment between LLM-based reviews and human judgments: (1) LLM reviewers systematically inflate scores for LLM-authored papers, assigning them markedly higher scores than human-authored ones; (2) LLM reviewers persistently underrate human-authored papers with critical statements (e.g., risk, fairness), even after multiple revisions. Our analysis reveals that these stem from two primary biases in LLM reviewers: a linguistic feature bias favoring LLM-generated writing styles, and an aversion toward critical statements. These results highlight the risks and equity concerns posed to human authors and academic research if LLMs are deployed in the peer review cycle without adequate caution. On the other hand, revisions guided by LLM reviews yield quality gains in both LLM-based and human evaluations, illustrating the potential of the LLMs-as-reviewers for early-stage researchers and enhancing low-quality papers.
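Abstract (reference translation): The rapid advancement of large language models (LLMs) has prompted researchers to integrate them into academic workflows, potentially changing how research is practiced and reviewed. Previous studies have emphasized the potential of LLMs to support research and peer review, but their dual role in the academic workflow and the complex interplay between research and review introduce new risks that remain largely underexplored. This study focuses on how the deep integration of LLMs into peer review and research processes affects scholarly fairness, and uses simulation to examine the potential risks of using LLMs as reviewers. The simulation comprises a research agent that generates and revises papers and a review agent that evaluates the submitted papers. From the simulation results, we identify discrepancies between LLM-based reviews and human judgments: (1) LLM reviewers systematically inflate the scores of LLM-authored papers, assigning them markedly higher scores than human-authored papers; (2) LLM reviewers persistently underrate papers containing critical statements (e.g., risk, fairness). The analysis shows that these discrepancies stem from two primary biases in LLM reviewers: a linguistic feature bias that favors LLM-generated writing styles, and an aversion to critical statements. These results highlight the risks and equity concerns that arise for human authors and academic research if LLMs are deployed in the peer-review cycle without adequate caution. On the other hand, revisions guided by LLM reviews yield quality gains in both LLM-based and human evaluations, illustrating the potential of LLMs-as-reviewers for early-stage researchers and for improving low-quality papers.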

Related papers