ReviewerToo: Should AI Join The Program Committee? A Look At The Future of Peer Review
- URL: http://arxiv.org/abs/2510.08867v1
- Date: Thu, 09 Oct 2025 23:53:19 GMT
- Title: ReviewerToo: Should AI Join The Program Committee? A Look At The Future of Peer Review
- Authors: Gaurav Sahu, Hugo Larochelle, Laurent Charlin, Christopher Pal
- Abstract summary: ReviewerToo is a framework for studying and deploying AI-assisted peer review. It supports systematic experiments with specialized reviewer personas and structured evaluation criteria. We show how AI can enhance consistency, coverage, and fairness while leaving complex evaluative judgments to domain experts.
- Score: 23.630458187587223
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Peer review is the cornerstone of scientific publishing, yet it suffers from inconsistencies, reviewer subjectivity, and scalability challenges. We introduce ReviewerToo, a modular framework for studying and deploying AI-assisted peer review to complement human judgment with systematic and consistent assessments. ReviewerToo supports systematic experiments with specialized reviewer personas and structured evaluation criteria, and can be partially or fully integrated into real conference workflows. We validate ReviewerToo on a carefully curated dataset of 1,963 paper submissions from ICLR 2025, where our experiments with the gpt-oss-120b model achieve 81.8% accuracy on the task of categorizing a paper as accept/reject, compared to 83.9% for the average human reviewer. Additionally, ReviewerToo-generated reviews are rated as higher quality than the human average by an LLM judge, though they still trail the strongest expert contributions. Our analysis highlights domains where AI reviewers excel (e.g., fact-checking, literature coverage) and where they struggle (e.g., assessing methodological novelty and theoretical contributions), underscoring the continued need for human expertise. Based on these findings, we propose guidelines for integrating AI into peer-review pipelines, showing how AI can enhance consistency, coverage, and fairness while leaving complex evaluative judgments to domain experts. Our work provides a foundation for systematic, hybrid peer-review systems that scale with the growth of scientific publishing.
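The abstract describes reviewer personas and structured criteria only at a high level. As one concrete illustration, the following is a minimal sketch of how a persona-based review pass with an accept/reject vote might be wired up; the persona definitions, rubric fields, and `call_llm` stand-in are assumptions for illustration, not the paper's actual prompts or configuration.

```python
# Hedged sketch of a persona-based review pass in the spirit of ReviewerToo.
# The personas, rubric, and call_llm stand-in are illustrative assumptions.
from dataclasses import dataclass

RUBRIC = ["soundness", "novelty", "clarity", "literature_coverage"]

PERSONAS = {
    "fact_checker": "Verify claims, numbers, and citations against the paper text.",
    "methods_reviewer": "Assess experimental design, baselines, and statistics.",
    "related_work_scout": "Check coverage of and positioning against prior work.",
}

@dataclass
class Review:
    persona: str
    scores: dict          # criterion -> score on a 1-10 scale
    recommendation: str   # "accept" or "reject"

def call_llm(system_prompt: str, user_prompt: str) -> dict:
    """Stand-in for a real chat-completion call (e.g., to gpt-oss-120b);
    returns a canned response so the sketch runs end to end."""
    return {"scores": {c: 6 for c in RUBRIC}, "recommendation": "accept"}

def review_paper(paper_text: str) -> str:
    """Run every persona over the paper, then majority-vote the decision."""
    reviews = []
    for name, focus in PERSONAS.items():
        out = call_llm(
            system_prompt=(
                f"You are a peer reviewer. {focus} "
                f"Score each of {RUBRIC} from 1 to 10 and recommend accept or reject."
            ),
            user_prompt=paper_text,
        )
        reviews.append(Review(name, out["scores"], out["recommendation"]))
    accepts = sum(r.recommendation == "accept" for r in reviews)
    return "accept" if accepts * 2 > len(reviews) else "reject"

print(review_paper("...full paper text..."))  # -> "accept" with the canned stub
```

A real deployment would replace the stub with an actual model call and aggregate per-criterion scores rather than a bare vote, but the shape of the loop, one structured pass per persona followed by aggregation, is the point of the sketch.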
Related papers
- The Story is Not the Science: Execution-Grounded Evaluation of Mechanistic Interpretability Research [56.80927148740585]
We address the challenges of scalability and rigor by flipping the dynamic and developing AI agents as research evaluators. We use mechanistic interpretability research as a testbed, build standardized research output, and develop MechEvalAgent. Our work demonstrates the potential of AI agents to transform research evaluation and pave the way for rigorous scientific practices.
arXiv Detail & Related papers (2026-02-05T19:00:02Z) - Automatic Reviewers Fail to Detect Faulty Reasoning in Research Papers: A New Counterfactual Evaluation Framework [55.078301794183496]
We focus on a core reviewing skill that underpins high-quality peer review: detecting faulty research logic. This involves evaluating the internal consistency between a paper's results, interpretations, and claims. We present a fully automated counterfactual evaluation framework that isolates and tests this skill under controlled conditions.
arXiv Detail & Related papers (2025-08-29T08:48:00Z) - CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection [60.52240468810558]
We introduce CoCoNUTS, a content-oriented benchmark built upon a fine-grained dataset of AI-generated peer reviews. We also develop CoCoDet, an AI review detector built on a multi-task learning framework, to achieve more accurate and robust detection of AI involvement in review content.
arXiv Detail & Related papers (2025-08-28T06:03:11Z) - Beyond "Not Novel Enough": Enriching Scholarly Critique with LLM-Assisted Feedback [81.0031690510116]
We present a structured approach for automated novelty evaluation that models expert reviewer behavior through three stages. Our method is informed by a large-scale analysis of human-written novelty reviews. Evaluated on 182 ICLR 2025 submissions, the approach achieves 86.5% alignment with human reasoning and 75.3% agreement on novelty conclusions.
arXiv Detail & Related papers (2025-08-14T16:18:37Z) - The AI Imperative: Scaling High-Quality Peer Review in Machine Learning [49.87236114682497]
We argue that AI-assisted peer review must become an urgent research and infrastructure priority. We propose specific roles for AI in enhancing factual verification, guiding reviewer performance, assisting authors in quality improvement, and supporting area chairs (ACs) in decision-making.
arXiv Detail & Related papers (2025-06-09T18:37:14Z) - ReviewEval: An Evaluation Framework for AI-Generated Reviews [9.35023998408983]
The escalating volume of academic research, coupled with a shortage of qualified reviewers, necessitates innovative approaches to peer review. We propose ReviewEval, a comprehensive evaluation framework for AI-generated reviews that measures alignment with human assessments, verifies factual accuracy, assesses analytical depth, and identifies the degree of constructiveness and adherence to reviewer guidelines. This paper establishes essential metrics for AI-based peer review and substantially enhances the reliability and impact of AI-generated reviews in academic research.
arXiv Detail & Related papers (2025-02-17T12:22:11Z) - Paper Quality Assessment based on Individual Wisdom Metrics from Open Peer Review [4.35783648216893]
Traditional closed peer review systems are slow, costly, non-transparent, and possibly subject to biases. We propose and examine the efficacy and accuracy of an alternative form of scientific peer review conducted through an open, bottom-up process.
arXiv Detail & Related papers (2025-01-22T17:00:27Z) - The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review [49.43514488610211]
Author-provided rankings could be leveraged to improve peer review processes at machine learning conferences. We focus on the Isotonic Mechanism, which calibrates raw review scores using the author-provided rankings (see the calibration sketch after this list). We propose several cautious, low-risk applications of the Isotonic Mechanism and author-provided rankings in peer review.
arXiv Detail & Related papers (2024-08-24T01:51:23Z) - Unveiling the Sentinels: Assessing AI Performance in Cybersecurity Peer Review [4.081120388114928]
In the field of cybersecurity, the practice of double-blind peer review is the de facto standard.
This paper touches on the holy grail of peer reviewing and aims to shed light on the performance of AI in reviewing for academic security conferences.
We investigate the predictability of reviewing outcomes by comparing the results obtained from human reviewers and machine-learning models.
arXiv Detail & Related papers (2023-09-11T13:51:40Z) - Ranking Scientific Papers Using Preference Learning [48.78161994501516]
We cast the assessment of submissions as a paper ranking problem based on peer review texts and reviewer scores.
We introduce a novel, multi-faceted, generic evaluation framework for making final decisions based on peer reviews (see the preference-learning sketch after this list).
arXiv Detail & Related papers (2021-09-02T19:41:47Z)
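The Isotonic Mechanism entry above describes calibrating raw review scores against an author's own best-to-worst ranking of their submissions. In its standard formulation this amounts to an isotonic regression, solvable with the pool-adjacent-violators algorithm. The following is a minimal, self-contained sketch under that reading; the `calibrate` helper, data layout, and example values are illustrative assumptions, not the ICML 2023 implementation.

```python
# Hedged sketch of score calibration in the spirit of the Isotonic Mechanism.
# Assumption: calibrated scores are the L2 projection of raw scores onto
# sequences that are non-increasing in the author's best-to-worst ranking.

def isotonic_decreasing(y: list[float]) -> list[float]:
    """Project y onto non-increasing sequences via pool adjacent violators."""
    blocks: list[list[float]] = []  # each block holds [running mean, size]
    for v in y:
        blocks.append([v, 1])
        # Pool while an earlier block's mean sits *below* a later one's,
        # which would violate the required non-increasing order.
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            m2, s2 = blocks.pop()
            m1, s1 = blocks.pop()
            blocks.append([(m1 * s1 + m2 * s2) / (s1 + s2), s1 + s2])
    out: list[float] = []
    for mean, size in blocks:
        out.extend([mean] * int(size))
    return out

def calibrate(raw_scores: dict[str, float], author_ranking: list[str]) -> dict[str, float]:
    """Adjust raw review scores so they respect the author's ranking (best first)."""
    fitted = isotonic_decreasing([raw_scores[p] for p in author_ranking])
    return dict(zip(author_ranking, fitted))

# Example: the author ranks B above A, but A's raw score is higher,
# so the two scores get pooled to their common mean.
print(calibrate({"A": 7.0, "B": 5.5}, author_ranking=["B", "A"]))
# -> {'B': 6.25, 'A': 6.25}
```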
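The preference-learning entry above frames final decisions as ranking papers from peer review texts and reviewer scores. As one generic baseline consistent with that framing, here is a hedged Bradley-Terry sketch fitted with Hunter-style MM updates; the model choice and the score-derived pairs are assumptions for illustration, not necessarily the paper's actual framework.

```python
# Hedged sketch: rank papers from pairwise preferences with Bradley-Terry.
# Pairs could come from reviewer scores: when one reviewer scores paper i
# above paper j, record the outcome (i, j). This derivation is an assumption.

def bradley_terry(n_items: int, wins: list[tuple[int, int]], iters: int = 200) -> list[float]:
    """Fit Bradley-Terry strengths from (winner, loser) outcomes via MM updates."""
    w = [1.0] * n_items
    for _ in range(iters):
        win_counts = [0] * n_items
        denom = [0.0] * n_items
        for i, j in wins:
            win_counts[i] += 1
            c = 1.0 / (w[i] + w[j])
            denom[i] += c   # every comparison involving i contributes,
            denom[j] += c   # and likewise for j
        w = [win_counts[k] / denom[k] if denom[k] > 0 else w[k] for k in range(n_items)]
        total = sum(w)
        w = [v * n_items / total for v in w]  # fix the scale (identifiability)
    return w

# Papers 0, 1, 2; paper 0 wins most comparisons, paper 1 wins none.
pairs = [(0, 1), (0, 2), (2, 1), (0, 1)]
strengths = bradley_terry(3, pairs)
print(sorted(range(3), key=lambda k: -strengths[k]))  # -> [0, 2, 1]
```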