Is Peer Review Really in Decline? Analyzing Review Quality across Venues and Time
- URL: http://arxiv.org/abs/2601.15172v1
- Date: Wed, 21 Jan 2026 16:48:29 GMT
- Title: Is Peer Review Really in Decline? Analyzing Review Quality across Venues and Time
- Authors: Ilia Kuznetsov, Rohan Nayak, Alla Rozovskaya, Iryna Gurevych
- Abstract summary: We introduce a new framework for evidence-based comparative study of review quality. We apply it to major AI and machine learning conferences: ICLR, NeurIPS and *ACL. We study the relationships between measurements of review quality, and its evolution over time.
- Score: 55.756345497678204
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Peer review is at the heart of modern science. As submission numbers rise and research communities grow, the decline in review quality is a popular narrative and a common concern. Yet, is it true? Review quality is difficult to measure, and the ongoing evolution of reviewing practices makes it hard to compare reviews across venues and time. To address this, we introduce a new framework for evidence-based comparative study of review quality and apply it to major AI and machine learning conferences: ICLR, NeurIPS and *ACL. We document the diversity of review formats and introduce a new approach to review standardization. We propose a multi-dimensional schema for quantifying review quality as utility to editors and authors, coupled with both LLM-based and lightweight measurements. We study the relationships between measurements of review quality, and its evolution over time. Contradicting the popular narrative, our cross-temporal analysis reveals no consistent decline in median review quality across venues and years. We propose alternative explanations, and outline recommendations to facilitate future empirical studies of review quality.
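The abstract names a multi-dimensional quality schema with both LLM-based and lightweight measurements but does not spell either out. Purely as an illustration, a minimal sketch of what a quality record with one cheap, non-LLM measurement could look like; all field and function names below are assumptions, not the paper's actual schema:

```python
from dataclasses import dataclass

# Hypothetical quality dimensions: the paper's actual schema is not
# given in the abstract, so these field names are illustrative only.
@dataclass
class ReviewQuality:
    specificity: float    # does the review point at concrete parts of the paper?
    justification: float  # are claims and scores backed by arguments?
    coverage: float       # how much of the submission does the review engage with?

def lightweight_specificity(review: str) -> float:
    """A cheap, non-LLM proxy: the fraction of sentences that anchor a
    claim to the paper (section/figure/table/equation mentions)."""
    anchors = ("section", "figure", "table", "equation", "appendix")
    sentences = [s for s in review.split(".") if s.strip()]
    if not sentences:
        return 0.0
    hits = sum(any(a in s.lower() for a in anchors) for s in sentences)
    return hits / len(sentences)

print(lightweight_specificity(
    "The method is interesting. Table 2 lacks confidence intervals."
))  # 0.5
```

An LLM-based measurement would fill the same fields with model judgments, which is what makes the two families of measurements comparable on one schema.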
Related papers
- What Drives Paper Acceptance? A Process-Centric Analysis of Modern Peer Review [2.9282248958475345]
We present a large-scale empirical study of ICLR 2017-2025, encompassing over 28,000 submissions. Our results show that factors beyond scientific novelty significantly shape acceptance outcomes. We propose data-driven guidelines for authors, reviewers, and meta-reviewers to enhance transparency and fairness in peer review.
arXiv Detail & Related papers (2025-09-30T03:00:10Z)
- ReviewScore: Misinformed Peer Review Detection with Large Language Models [38.92827930465428]
We show that 15.2% of weaknesses and 26.4% of questions are misinformed, and introduce ReviewScore to indicate whether a review point is misinformed. We build a human expert-annotated ReviewScore dataset to check the ability of LLMs to automate ReviewScore evaluation. We also show that evaluating premise-level factuality yields significantly higher agreement than evaluating weakness-level factuality.
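The summary contrasts premise-level with weakness-level factuality; one way to read that is to decompose each weakness into atomic premises and judge each one separately. A minimal sketch, assuming external `decompose` and `judge` callables (e.g. LLM calls) that are not part of the paper:

```python
from typing import Callable, List

def premise_level_factuality(
    weakness: str,
    decompose: Callable[[str], List[str]],  # assumed: splits a weakness
                                            # into atomic premises
    judge: Callable[[str], bool],           # assumed: labels one premise
                                            # as factually grounded or not
) -> float:
    """Score a review weakness as the fraction of its premises judged
    factual, instead of a single verdict on the whole point."""
    premises = decompose(weakness)
    if not premises:
        return 0.0
    return sum(judge(p) for p in premises) / len(premises)

# Toy stand-ins so the sketch runs without any LLM:
split_premises = lambda w: [p.strip() for p in w.split(";") if p.strip()]
accept_all = lambda p: True
print(premise_level_factuality(
    "No baselines are reported; the dataset is private.",
    split_premises, accept_all,
))  # 1.0
```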
arXiv Detail & Related papers (2025-09-25T22:55:05Z)
- Beyond "Not Novel Enough": Enriching Scholarly Critique with LLM-Assisted Feedback [81.0031690510116]
We present a structured approach for automated novelty evaluation that models expert reviewer behavior through three stages. Our method is informed by a large-scale analysis of human-written novelty reviews. Evaluated on 182 ICLR 2025 submissions, the approach achieves 86.5% alignment with human reasoning and 75.3% agreement on novelty conclusions.
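The three stages are not named in this summary. As a hypothetical illustration only, a staged novelty-evaluation pipeline could be wired together as below; the stage names (retrieve, contrast, verdict) and the toy stand-ins are assumptions, not the paper's method:

```python
from typing import Callable, List

def novelty_pipeline(retrieve: Callable[[str], List[str]],
                     contrast: Callable[[str, List[str]], str],
                     verdict: Callable[[str], str]) -> Callable[[str], str]:
    """Compose three stages into one submission -> novelty-verdict function."""
    def run(abstract: str) -> str:
        return verdict(contrast(abstract, retrieve(abstract)))
    return run

# Toy stand-ins so the sketch runs without a retrieval index or an LLM:
run = novelty_pipeline(
    retrieve=lambda a: ["prior work on review automation"],
    contrast=lambda a, rel: f"claim {a!r} vs. {len(rel)} related snippet(s)",
    verdict=lambda c: "novelty unclear: " + c,
)
print(run("We automate novelty review."))
```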
arXiv Detail & Related papers (2025-08-14T16:18:37Z)
- LazyReview: A Dataset for Uncovering Lazy Thinking in NLP Peer Reviews [74.87393214734114]
This work introduces LazyReview, a dataset of peer-review sentences annotated with fine-grained lazy thinking categories. Large Language Models (LLMs) struggle to detect these instances in a zero-shot setting. Instruction-based fine-tuning on our dataset significantly boosts performance by 10-20 points.
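The fine-tuning step implies turning annotated sentences into instruction-tuning pairs. A minimal sketch of that formatting step; the category name and JSON keys are assumptions, since the label set and data format are not listed in this summary:

```python
import json

# Hypothetical annotated item; LazyReview's actual label set is not
# given in this summary.
EXAMPLE = {
    "sentence": "The novelty is limited.",
    "label": "vague novelty dismissal",  # assumed category name
}

def to_instruction_example(item: dict) -> dict:
    """Wrap one annotated review sentence as an instruction-tuning pair."""
    return {
        "instruction": "Classify the lazy-thinking pattern in this "
                       "peer-review sentence, or answer 'none'.",
        "input": item["sentence"],
        "output": item["label"],
    }

print(json.dumps(to_instruction_example(EXAMPLE), indent=2))
```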
arXiv Detail & Related papers (2025-04-15T10:07:33Z)
- Identifying Aspects in Peer Reviews [59.02879434536289]
We develop a data-driven schema for deriving aspects from a corpus of peer reviews. We introduce a dataset of peer reviews augmented with aspects and show how it can be used for community-level review analysis.
arXiv Detail & Related papers (2025-04-09T14:14:42Z)
- Paper Quality Assessment based on Individual Wisdom Metrics from Open Peer Review [4.35783648216893]
Traditional closed peer review systems are slow, costly, non-transparent, and possibly subject to biases. We propose and examine the efficacy and accuracy of an alternative form of scientific peer review: an open, bottom-up process.
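The abstract does not define its individual wisdom metrics. One simple toy instantiation scores each reviewer by deviation from the leave-one-out consensus; this is illustrative only, not the paper's metric:

```python
from statistics import mean

def wisdom_scores(ratings: dict[str, dict[str, float]]) -> dict[str, float]:
    """ratings[reviewer][paper] = score. Toy 'wisdom' metric: a reviewer's
    mean absolute deviation from the leave-one-out consensus on each paper
    they rated (lower = closer to the crowd)."""
    out = {}
    for r, rated in ratings.items():
        devs = []
        for paper, score in rated.items():
            others = [ratings[o][paper] for o in ratings
                      if o != r and paper in ratings[o]]
            if others:
                devs.append(abs(score - mean(others)))
        out[r] = mean(devs) if devs else float("nan")
    return out

# r3 rates against the grain and gets the largest deviation:
print(wisdom_scores({
    "r1": {"p1": 6, "p2": 4},
    "r2": {"p1": 7, "p2": 5},
    "r3": {"p1": 2, "p2": 9},
}))
```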
arXiv Detail & Related papers (2025-01-22T17:00:27Z)
- AI-Driven Review Systems: Evaluating LLMs in Scalable and Bias-Aware Academic Reviews [18.50142644126276]
We evaluate the alignment of automatic paper reviews with human reviews using an arena of human preferences based on pairwise comparisons.
We fine-tune an LLM to predict human preferences, i.e., which review humans will prefer in a head-to-head battle between LLMs.
We make the reviews of publicly available arXiv and open-access Nature journal papers available online, along with a free service that helps authors review and revise their research papers and improve their quality.
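A standard way to turn such head-to-head preference outcomes into ratings is an Elo-style update; the sketch below is a generic arena mechanism, not the paper's actual aggregation:

```python
def elo_update(r_a: float, r_b: float, a_wins: bool, k: float = 32.0):
    """One Elo update after a head-to-head comparison: a human judge
    prefers review A or review B, and both ratings move accordingly."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    delta = k * ((1.0 if a_wins else 0.0) - expected_a)
    return r_a + delta, r_b - delta

# Two review generators start equal; the human judge prefers A three times.
ra, rb = 1000.0, 1000.0
for _ in range(3):
    ra, rb = elo_update(ra, rb, a_wins=True)
print(round(ra), round(rb))  # 1044 956
```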
arXiv Detail & Related papers (2024-08-19T19:10:38Z)
- A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [51.26815896167173]
We present a comprehensive tertiary analysis of PAMI reviews along three complementary dimensions. Our analyses reveal distinctive organizational patterns as well as persistent gaps in current review practices. Finally, our evaluation of state-of-the-art AI-generated reviews indicates encouraging advances in coherence and organization.
arXiv Detail & Related papers (2024-02-20T11:28:50Z)
- Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) from 2017 to date.
We postulate and study fairness disparities on multiple protected attributes of interest, including author gender, geography, and author and institutional prestige.
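The statistical machinery is not given in this summary. As a generic illustration of measuring a review-score disparity between two author groups, a simple permutation test might look like the sketch below; the paper's actual analysis controls for far more factors:

```python
import random
from statistics import mean

def score_gap_pvalue(scores_a, scores_b, n_perm=10_000, seed=0):
    """Permutation test for a gap in mean review scores between two
    author groups (e.g. by geography). Returns (observed gap, p-value)."""
    rng = random.Random(seed)
    observed = abs(mean(scores_a) - mean(scores_b))
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # break any group/score association
        gap = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        hits += gap >= observed
    return observed, hits / n_perm

obs, p = score_gap_pvalue([6, 7, 5, 6], [5, 4, 5, 6])
print(obs, p)
```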
arXiv Detail & Related papers (2022-11-07T16:19:42Z)
- Ranking Scientific Papers Using Preference Learning [48.78161994501516]
We cast final decision-making as a paper ranking problem based on peer review texts and reviewer scores.
We introduce a novel, multi-faceted generic evaluation framework for making final decisions based on peer reviews.
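The framework's details are not in this summary. A generic preference-learning baseline for ranking papers from pairwise outcomes is a Bradley-Terry fit, sketched here with simple MM updates; this names a standard technique, not the paper's exact method:

```python
from collections import defaultdict

def bradley_terry(pairs, n_iter=200):
    """Fit Bradley-Terry strengths from (winner, loser) comparisons via
    MM updates, then rank items by strength."""
    wins = defaultdict(int)
    games = defaultdict(lambda: defaultdict(int))
    items = set()
    for w, l in pairs:
        wins[w] += 1
        games[w][l] += 1  # total comparisons between each pair,
        games[l][w] += 1  # counted symmetrically
        items |= {w, l}
    p = {i: 1.0 for i in items}
    for _ in range(n_iter):
        # MM step: p_i = W_i / sum_j n_ij / (p_i + p_j)
        p = {
            i: wins[i] / sum(n / (p[i] + p[j]) for j, n in games[i].items())
               if wins[i] else 1e-9
            for i in items
        }
        norm = sum(p.values())
        p = {i: v / norm for i, v in p.items()}
    return sorted(p, key=p.get, reverse=True)

print(bradley_terry([("A", "B"), ("A", "C"), ("B", "C"), ("A", "B")]))
# ['A', 'B', 'C']
```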
arXiv Detail & Related papers (2021-09-02T19:41:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.