Paper Quality Assessment based on Individual Wisdom Metrics from Open Peer Review
- URL: http://arxiv.org/abs/2501.13014v1
- Date: Wed, 22 Jan 2025 17:00:27 GMT
- Title: Paper Quality Assessment based on Individual Wisdom Metrics from Open Peer Review
- Authors: Andrii Zahorodnii, Jasper J. F. van den Bosch, Ian Charest, Christopher Summerfield, Ila R. Fiete
- Abstract summary: This study proposes a data-driven framework for enhancing the accuracy and efficiency of scientific peer review through an open, bottom-up process that estimates reviewer quality.
We analyze open peer review data from two major scientific conferences, and demonstrate that reviewer-specific quality scores significantly improve the reliability of paper quality estimation.
- Score: 3.802113616844045
- Abstract: This study proposes a data-driven framework for enhancing the accuracy and efficiency of scientific peer review through an open, bottom-up process that estimates reviewer quality. Traditional closed peer review systems, while essential for quality control, are often slow, costly, and subject to biases that can impede scientific progress. Here, we introduce a method that evaluates individual reviewer reliability by quantifying agreement with community consensus scores and applying Bayesian weighting to refine paper quality assessments. We analyze open peer review data from two major scientific conferences, and demonstrate that reviewer-specific quality scores significantly improve the reliability of paper quality estimation. Perhaps surprisingly, we find that reviewer quality scores are unrelated to authorship quality. Our model incorporates incentive structures to recognize high-quality reviewers and encourage broader coverage of submitted papers, thereby mitigating the common "rich-get-richer" pitfall of social media. These findings suggest that open peer review, with mechanisms for estimating and incentivizing reviewer quality, offers a scalable and equitable alternative for scientific publishing, with potential to enhance the speed, fairness, and transparency of the peer review process.
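The reviewer-weighting idea in the abstract can be sketched in code. This is a minimal illustration, not the authors' exact model: it assumes reviewer reliability is estimated from each reviewer's squared deviation from the per-paper mean score, with a single prior pseudo-observation (`prior_var`) supplying the Bayesian shrinkage; all function names are hypothetical.

```python
def consensus_scores(reviews):
    """Per-paper consensus as the plain mean of its review scores.

    reviews: dict paper_id -> dict reviewer_id -> score
    """
    return {p: sum(r.values()) / len(r) for p, r in reviews.items()}

def reviewer_weights(reviews, prior_var=1.0):
    """Estimate reviewer reliability as inverse variance of deviation
    from consensus, shrunk toward a prior for low-volume reviewers."""
    consensus = consensus_scores(reviews)
    sq_dev = {}
    for p, r in reviews.items():
        for rid, s in r.items():
            sq_dev.setdefault(rid, []).append((s - consensus[p]) ** 2)
    # One prior pseudo-observation of variance `prior_var` per reviewer,
    # so reviewers with few reviews are not over- or under-weighted.
    return {rid: (len(d) + 1) / (sum(d) + prior_var)
            for rid, d in sq_dev.items()}

def weighted_paper_scores(reviews, weights):
    """Precision-weighted average of review scores per paper."""
    return {p: sum(weights[rid] * s for rid, s in r.items())
               / sum(weights[rid] for rid in r)
            for p, r in reviews.items()}
```

Reviewers who track the consensus closely receive larger weights, so their scores dominate the refined paper estimates.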
Related papers
- exHarmony: Authorship and Citations for Benchmarking the Reviewer Assignment Problem [11.763640675057076]
We develop a benchmark dataset for evaluating the reviewer assignment problem without needing explicit labels.
We benchmark various methods, including traditional lexical matching, static neural embeddings, and contextualized neural embeddings.
Our results indicate that while traditional methods perform reasonably well, contextualized embeddings trained on scholarly literature show the best performance.
arXiv Detail & Related papers (2025-02-11T16:35:04Z)
- Generative Adversarial Reviews: When LLMs Become the Critic [1.2430809884830318]
We introduce Generative Agent Reviewers (GAR), leveraging LLM-empowered agents to simulate faithful peer reviewers.
Central to this approach is a graph-based representation of manuscripts, condensing content and logically organizing information.
Our experiments demonstrate that GAR performs comparably to human reviewers in providing detailed feedback and predicting paper outcomes.
arXiv Detail & Related papers (2024-12-09T06:58:17Z)
- Multi-Facet Counterfactual Learning for Content Quality Evaluation [48.73583736357489]
We propose a framework for efficiently constructing evaluators that perceive multiple facets of content quality evaluation.
We leverage a joint training strategy based on contrastive learning and supervised learning to enable the evaluator to distinguish between different quality facets.
arXiv Detail & Related papers (2024-10-10T08:04:10Z)
- Analysis of the ICML 2023 Ranking Data: Can Authors' Opinions of Their Own Papers Assist Peer Review in Machine Learning? [52.00419656272129]
We conducted an experiment during the 2023 International Conference on Machine Learning (ICML)
We received 1,342 rankings, each from a distinct author, pertaining to 2,592 submissions.
We focus on the Isotonic Mechanism, which calibrates raw review scores using author-provided rankings.
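The Isotonic Mechanism summarized above rests on isotonic regression. A minimal sketch, assuming the author ranks papers best-first and the calibrated scores must therefore be nonincreasing in that order; the mechanism's actual objective and guarantees are given in the cited paper, and the function names here are hypothetical:

```python
def pava(y):
    """Pool Adjacent Violators: least-squares nondecreasing fit to y."""
    means, counts = [], []
    for v in y:
        means.append(float(v))
        counts.append(1)
        # Merge adjacent blocks while the monotonicity constraint is violated.
        while len(means) > 1 and means[-2] > means[-1]:
            total = counts[-2] + counts[-1]
            means[-2] = (means[-2] * counts[-2] + means[-1] * counts[-1]) / total
            counts[-2] = total
            means.pop()
            counts.pop()
    out = []
    for m, c in zip(means, counts):
        out.extend([m] * c)
    return out

def calibrate(raw_scores):
    """Project raw review scores (ordered best-first by the author's
    ranking) onto the nearest nonincreasing sequence."""
    return [-v for v in pava([-s for s in raw_scores])]
```

For example, raw scores `[3, 5, 2]` that contradict the author's best-first ranking are pooled into the monotone sequence `[4, 4, 2]`.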
arXiv Detail & Related papers (2024-08-24T01:51:23Z)
- Eliciting Honest Information From Authors Using Sequential Review [13.424398627546788]
We propose a sequential review mechanism that can truthfully elicit the ranking information from authors.
The key idea is to review the papers of an author in a sequence based on the provided ranking and conditioning the review of the next paper on the review scores of the previous papers.
arXiv Detail & Related papers (2023-11-24T17:27:39Z)
- How do Authors' Perceptions of their Papers Compare with Co-authors' Perceptions and Peer-review Decisions? [87.00095008723181]
Authors have roughly a three-fold overestimate of the acceptance probability of their papers.
Female authors exhibit a marginally higher (statistically significant) miscalibration than male authors.
At least 30% of respondents of both accepted and rejected papers said that their perception of their own paper improved after the review process.
arXiv Detail & Related papers (2022-11-22T15:59:30Z)
- Consultation Checklists: Standardising the Human Evaluation of Medical Note Generation [58.54483567073125]
We propose a protocol that aims to increase objectivity by grounding evaluations in Consultation Checklists.
We observed good levels of inter-annotator agreement in a first evaluation study using the protocol.
arXiv Detail & Related papers (2022-11-17T10:54:28Z)
- Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries [59.27273928454995]
Current pre-trained models applied to summarization are prone to factual inconsistencies which misrepresent the source text or introduce extraneous information.
We create a crowdsourcing evaluation framework for factual consistency using the rating-based Likert scale and ranking-based Best-Worst Scaling protocols.
We find that ranking-based protocols offer a more reliable measure of summary quality across datasets, while the reliability of Likert ratings depends on the target dataset and the evaluation design.
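For concreteness, Best-Worst Scaling is commonly scored by simple counting. A minimal sketch using the standard count-based estimator, which is an assumption here and not necessarily the exact protocol of the paper above:

```python
from collections import defaultdict

def best_worst_scores(judgments):
    """Count-based Best-Worst Scaling scores.

    judgments: list of (items_shown, best_item, worst_item) tuples.
    Returns, per item: (#times best - #times worst) / #times shown,
    a score in [-1, 1].
    """
    best = defaultdict(int)
    worst = defaultdict(int)
    shown = defaultdict(int)
    for items, b, w in judgments:
        for it in items:
            shown[it] += 1
        best[b] += 1
        worst[w] += 1
    return {it: (best[it] - worst[it]) / shown[it] for it in shown}
```

Because each judgment only asks annotators to pick the best and worst item in a small set, the resulting scores tend to be more stable across annotators than absolute Likert ratings, which is consistent with the finding above.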
arXiv Detail & Related papers (2021-09-19T19:05:00Z)
- Ranking Scientific Papers Using Preference Learning [48.78161994501516]
We cast it as a paper ranking problem based on peer review texts and reviewer scores.
We introduce a novel, multi-faceted generic evaluation framework for making final decisions based on peer reviews.
arXiv Detail & Related papers (2021-09-02T19:41:47Z)
- Debiasing Evaluations That are Biased by Evaluations [32.135315382120154]
We consider the problem of mitigating outcome-induced biases in ratings when some information about the outcome is available.
We propose a debiasing method by solving a regularized optimization problem under this ordering constraint.
We also provide a carefully designed cross-validation method that adaptively chooses the appropriate amount of regularization.
arXiv Detail & Related papers (2020-12-01T18:20:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.