Paper Quality Assessment based on Individual Wisdom Metrics from Open Peer Review
- URL: http://arxiv.org/abs/2501.13014v2
- Date: Thu, 02 Oct 2025 00:57:24 GMT
- Title: Paper Quality Assessment based on Individual Wisdom Metrics from Open Peer Review
- Authors: Andrii Zahorodnii, Jasper J. F. van den Bosch, Ian Charest, Christopher Summerfield, Ila R. Fiete
- Abstract summary: Traditional closed peer review systems are slow, costly, non-transparent, and possibly subject to biases.
We propose and examine the efficacy and accuracy of an alternative form of scientific peer review: through an open, bottom-up process.
- Score: 4.35783648216893
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional closed peer review systems, which have played a central role in scientific publishing, are often slow, costly, non-transparent, stochastic, and possibly subject to biases - factors that can impede scientific progress and undermine public trust. Here, we propose and examine the efficacy and accuracy of an alternative form of scientific peer review: through an open, bottom-up process. First, using data from two major scientific conferences (CCN2023 and ICLR2023), we highlight how high variability of review scores and low correlation across reviewers presents a challenge for collective review. We quantify reviewer agreement with community consensus scores and use this as a reviewer quality estimator, showing that surprisingly, reviewer quality scores are not correlated with authorship quality. Instead, we reveal an inverted U-shape relationship, where authors with intermediate paper scores are the best reviewers. We assess empirical Bayesian methods to estimate paper quality based on different assessments of individual reviewer reliability. We show how under a one-shot review-then-score scenario, both in our models and on real peer review data, a Bayesian measure significantly improves paper quality assessments relative to simple averaging. We then consider an ongoing model of publishing, reviewing, and scoring, with reviewers scoring not only papers but also other reviewers. We show that user-generated reviewer ratings can yield robust and high-quality paper scoring even when unreliable (but unbiased) reviewers dominate. Finally, we outline incentive structures to recognize high-quality reviewers and encourage broader reviewing coverage of submitted papers. These findings suggest that a self-selecting open peer review process is potentially scalable, reliable, and equitable with the possibility of enhancing the speed, fairness, and transparency of the peer review process.
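As a rough illustration of the contrast the abstract draws between simple averaging and a reliability-weighted Bayesian estimate, here is a minimal sketch assuming a Gaussian model with per-reviewer noise estimates; the prior parameters, noise values, and scores are hypothetical, and this is not the paper's exact estimator.
```python
import numpy as np

def simple_average(scores):
    """Baseline: unweighted mean of the review scores for one paper."""
    return float(np.mean(scores))

def bayesian_estimate(scores, reviewer_sigmas, prior_mean=5.0, prior_sigma=2.0):
    """Precision-weighted posterior mean of a paper's quality (Gaussian model).

    scores           : review scores this paper received
    reviewer_sigmas  : estimated noise (std dev) of each reviewer, e.g. from
                       their historical disagreement with consensus scores
    prior_mean/sigma : prior over paper quality (hypothetical values)
    """
    scores = np.asarray(scores, dtype=float)
    precisions = 1.0 / np.asarray(reviewer_sigmas, dtype=float) ** 2
    prior_precision = 1.0 / prior_sigma ** 2
    posterior_precision = prior_precision + precisions.sum()
    posterior_mean = (prior_precision * prior_mean + (precisions * scores).sum()) / posterior_precision
    return float(posterior_mean)

# Two noisy reviewers and one reviewer who tracks consensus closely.
scores = [3.0, 4.0, 8.0]
sigmas = [3.0, 3.0, 0.5]
print(simple_average(scores))             # 5.0
print(bayesian_estimate(scores, sigmas))  # ~7.6, pulled toward the reliable reviewer
```
Under this toy model, down-weighting high-variance reviewers moves the estimate toward the reviewer whose past scores best matched the community consensus, which is the effect the abstract reports for its Bayesian measure.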
Related papers
- Is Peer Review Really in Decline? Analyzing Review Quality across Venues and Time [55.756345497678204]
We introduce a new framework for evidence-based comparative study of review quality.
We apply it to major AI and machine learning conferences: ICLR, NeurIPS and *ACL.
We study the relationships between measurements of review quality and its evolution over time.
arXiv Detail & Related papers (2026-01-21T16:48:29Z)
- What Drives Paper Acceptance? A Process-Centric Analysis of Modern Peer Review [2.9282248958475345]
We present a large-scale empirical study of ICLR 2017-2025, encompassing over 28,000 submissions.
Our results show that factors beyond scientific novelty significantly shape acceptance outcomes.
We propose data-driven guidelines for authors, reviewers, and meta-reviewers to enhance transparency and fairness in peer review.
arXiv Detail & Related papers (2025-09-30T03:00:10Z)
- Automatic Reviewers Fail to Detect Faulty Reasoning in Research Papers: A New Counterfactual Evaluation Framework [55.078301794183496]
We focus on a core reviewing skill that underpins high-quality peer review: detecting faulty research logic.
This involves evaluating the internal consistency between a paper's results, interpretations, and claims.
We present a fully automated counterfactual evaluation framework that isolates and tests this skill under controlled conditions.
arXiv Detail & Related papers (2025-08-29T08:48:00Z)
- OpenReview Should be Protected and Leveraged as a Community Asset for Research in the Era of Large Language Models [55.21589313404023]
OpenReview is a continually evolving repository of research papers, peer reviews, author rebuttals, meta-reviews, and decision outcomes.
We highlight three promising areas in which OpenReview can uniquely contribute: enhancing the quality, scalability, and accountability of peer review processes; enabling meaningful, open-ended benchmarks rooted in genuine expert deliberation; and supporting alignment research through real-world interactions reflecting expert assessment, intentions, and scientific values.
We suggest the community collaboratively explore standardized benchmarks and usage guidelines around OpenReview, inviting broader dialogue on responsible data use, ethical considerations, and collective stewardship.
arXiv Detail & Related papers (2025-05-24T09:07:13Z)
- Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards [2.8239108914343305]
This paper argues for the need to transform the traditional one-way review system into a bi-directional feedback loop.
Authors evaluate review quality and reviewers earn formal accreditation, creating an accountability framework.
arXiv Detail & Related papers (2025-05-08T05:51:48Z)
- Identifying Aspects in Peer Reviews [61.374437855024844]
We develop a data-driven schema for deriving fine-grained aspects from a corpus of peer reviews.
We introduce a dataset of peer reviews augmented with aspects and show how it can be used for community-level review analysis.
arXiv Detail & Related papers (2025-04-09T14:14:42Z)
- exHarmony: Authorship and Citations for Benchmarking the Reviewer Assignment Problem [11.763640675057076]
We develop a benchmark dataset for evaluating the reviewer assignment problem without needing explicit labels.
We benchmark various methods, including traditional lexical matching, static neural embeddings, and contextualized neural embeddings.
Our results indicate that while traditional methods perform reasonably well, contextualized embeddings trained on scholarly literature show the best performance.
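For context on what embedding-based assignment involves, a generic matching step can be sketched as follows; the encoder, embeddings, and top_k choice are placeholder assumptions rather than the exHarmony benchmark's actual pipeline.
```python
import numpy as np

def assign_reviewers(paper_embeddings, reviewer_embeddings, top_k=3):
    """Rank reviewers for each paper by cosine similarity of embeddings.

    The embeddings are assumed to come from some text encoder applied to
    titles/abstracts; this is a generic matching step, not the benchmark's
    full pipeline.
    """
    P = paper_embeddings / np.linalg.norm(paper_embeddings, axis=1, keepdims=True)
    R = reviewer_embeddings / np.linalg.norm(reviewer_embeddings, axis=1, keepdims=True)
    similarity = P @ R.T                               # shape: papers x reviewers
    return np.argsort(-similarity, axis=1)[:, :top_k]  # top-k reviewer indices per paper

# Toy example with random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
papers = rng.normal(size=(4, 16))
reviewers = rng.normal(size=(10, 16))
print(assign_reviewers(papers, reviewers))
```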
arXiv Detail & Related papers (2025-02-11T16:35:04Z)
- Generative Adversarial Reviews: When LLMs Become the Critic [1.2430809884830318]
We introduce Generative Agent Reviewers (GAR), leveraging LLM-empowered agents to simulate faithful peer reviewers.
Central to this approach is a graph-based representation of manuscripts, condensing content and logically organizing information.
Our experiments demonstrate that GAR performs comparably to human reviewers in providing detailed feedback and predicting paper outcomes.
arXiv Detail & Related papers (2024-12-09T06:58:17Z)
- Multi-Facet Counterfactual Learning for Content Quality Evaluation [48.73583736357489]
We propose a framework for efficiently constructing evaluators that perceive multiple facets of content quality evaluation.
We leverage a joint training strategy based on contrastive learning and supervised learning to enable the evaluator to distinguish between different quality facets.
arXiv Detail & Related papers (2024-10-10T08:04:10Z)
- Analysis of the ICML 2023 Ranking Data: Can Authors' Opinions of Their Own Papers Assist Peer Review in Machine Learning? [52.00419656272129]
We conducted an experiment during the 2023 International Conference on Machine Learning (ICML).
We received 1,342 rankings, each from a distinct author, pertaining to 2,592 submissions.
We focus on the Isotonic Mechanism, which calibrates raw review scores using author-provided rankings.
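For intuition about this calibration step, the sketch below projects an author's raw scores onto their self-declared ranking with pool-adjacent-violators isotonic regression; it is a simplified reconstruction under that assumption, not the mechanism's full specification, and the example scores are hypothetical.
```python
def calibrate_with_ranking(raw_scores):
    """Pool-Adjacent-Violators: project raw scores onto the author's ranking.

    raw_scores are given in the author's claimed order (best paper first),
    so the calibrated scores must be non-increasing along the list.
    """
    blocks = []  # each block: [sum_of_scores, count], later averaged
    for s in raw_scores:
        blocks.append([float(s), 1])
        # Merge while a later block's mean exceeds an earlier one's
        # (a violation of the non-increasing order).
        while len(blocks) > 1 and blocks[-1][0] / blocks[-1][1] > blocks[-2][0] / blocks[-2][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    calibrated = []
    for total, count in blocks:
        calibrated.extend([total / count] * count)
    return calibrated

# Author ranked paper A above B above C; raw review means disagree with that order.
print(calibrate_with_ranking([6.0, 7.0, 4.0]))  # [6.5, 6.5, 4.0]
```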
arXiv Detail & Related papers (2024-08-24T01:51:23Z)
- Analytical and Empirical Study of Herding Effects in Recommendation Systems [72.6693986712978]
We study how to manage product ratings via rating aggregation rules and shortlisted representative reviews.
We show that proper recency-aware rating aggregation rules can improve the speed of convergence in Amazon and TripAdvisor.
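One minimal, hypothetical instance of a recency-aware rule is an exponential-decay weighted mean, sketched below for intuition; the decay factor and ratings are illustrative and this is not the specific rule analyzed in the paper.
```python
def plain_average(ratings):
    """Cumulative mean over the full rating history."""
    return sum(ratings) / len(ratings)

def recency_weighted_average(ratings, decay=0.5):
    """Exponential-decay mean: the i-th most recent rating gets weight decay**i."""
    weights = [decay ** i for i in range(len(ratings))][::-1]  # oldest rating, smallest weight
    return sum(w * r for w, r in zip(weights, ratings)) / sum(weights)

# Early ratings were herded low; later ratings reflect quality better.
history = [2, 2, 3, 4, 5, 5]   # ordered oldest -> newest
print(plain_average(history))             # 3.5
print(recency_weighted_average(history))  # ~4.6, dominated by recent ratings
```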
arXiv Detail & Related papers (2024-08-20T14:29:23Z)
- A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [55.33653554387953]
Pattern Analysis and Machine Intelligence (PAMI) has led to numerous literature reviews aimed at collecting and summarizing fragmented information.
This paper presents a thorough analysis of these literature reviews within the PAMI field.
We try to address three core research questions: (1) What are the prevalent structural and statistical characteristics of PAMI literature reviews; (2) What strategies can researchers employ to efficiently navigate the growing corpus of reviews; and (3) What are the advantages and limitations of AI-generated reviews compared to human-authored ones.
arXiv Detail & Related papers (2024-02-20T11:28:50Z)
- Eliciting Honest Information From Authors Using Sequential Review [13.424398627546788]
We propose a sequential review mechanism that can truthfully elicit the ranking information from authors.
The key idea is to review the papers of an author in a sequence based on the provided ranking and conditioning the review of the next paper on the review scores of the previous papers.
arXiv Detail & Related papers (2023-11-24T17:27:39Z)
- When Reviewers Lock Horn: Finding Disagreement in Scientific Peer Reviews [24.875901048855077]
We introduce a novel task of automatically identifying contradictions among reviewers on a given article.
To the best of our knowledge, we make the first attempt to identify disagreements among peer reviewers automatically.
arXiv Detail & Related papers (2023-10-28T11:57:51Z)
- Consultation Checklists: Standardising the Human Evaluation of Medical Note Generation [58.54483567073125]
We propose a protocol that aims to increase objectivity by grounding evaluations in Consultation Checklists.
We observed good levels of inter-annotator agreement in a first evaluation study using the protocol.
arXiv Detail & Related papers (2022-11-17T10:54:28Z)
- Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries [59.27273928454995]
Current pre-trained models applied to summarization are prone to factual inconsistencies which misrepresent the source text or introduce extraneous information.
We create a crowdsourcing evaluation framework for factual consistency using the rating-based Likert scale and ranking-based Best-Worst Scaling protocols.
We find that ranking-based protocols offer a more reliable measure of summary quality across datasets, while the reliability of Likert ratings depends on the target dataset and the evaluation design.
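As background on how the ranking-based protocol is typically scored, Best-Worst Scaling counts how often each item is picked best versus worst; the sketch below uses this standard counting estimator with hypothetical item names, not the authors' exact setup.
```python
from collections import defaultdict

def best_worst_scores(judgments):
    """judgments: list of (items_shown, best_item, worst_item) tuples.

    Returns the standard counting estimator used in Best-Worst Scaling:
    score(item) = (#times best - #times worst) / #times shown.
    """
    best, worst, shown = defaultdict(int), defaultdict(int), defaultdict(int)
    for items, b, w in judgments:
        for item in items:
            shown[item] += 1
        best[b] += 1
        worst[w] += 1
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}

# Two annotators each judge the same three (hypothetical) summaries.
judgments = [
    (("summary_A", "summary_B", "summary_C"), "summary_A", "summary_C"),
    (("summary_A", "summary_B", "summary_C"), "summary_B", "summary_C"),
]
print(best_worst_scores(judgments))  # {'summary_A': 0.5, 'summary_B': 0.5, 'summary_C': -1.0}
```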
arXiv Detail & Related papers (2021-09-19T19:05:00Z)
- Ranking Scientific Papers Using Preference Learning [48.78161994501516]
We cast it as a paper ranking problem based on peer review texts and reviewer scores.
We introduce a novel, multi-faceted generic evaluation framework for making final decisions based on peer reviews.
arXiv Detail & Related papers (2021-09-02T19:41:47Z)
- Making Paper Reviewing Robust to Bid Manipulation Attacks [44.34601846490532]
Anecdotal evidence suggests that some reviewers bid on papers by "friends" or colluding authors.
We develop a novel approach for paper bidding and assignment that is much more robust against such attacks.
In addition to being more robust, the quality of our paper review assignments is comparable to that of current, non-robust assignment approaches.
arXiv Detail & Related papers (2021-02-09T21:24:16Z)
- Debiasing Evaluations That are Biased by Evaluations [32.135315382120154]
We consider the problem of mitigating outcome-induced biases in ratings when some information about the outcome is available.
We propose a debiasing method by solving a regularized optimization problem under this ordering constraint.
We also provide a carefully designed cross-validation method that adaptively chooses the appropriate amount of regularization.
arXiv Detail & Related papers (2020-12-01T18:20:43Z)