A Novice-Reviewer Experiment to Address Scarcity of Qualified Reviewers in Large Conferences
- URL: http://arxiv.org/abs/2011.15050v1
- Date: Mon, 30 Nov 2020 17:48:55 GMT
- Title: A Novice-Reviewer Experiment to Address Scarcity of Qualified Reviewers in Large Conferences
- Authors: Ivan Stelmakh, Nihar B. Shah, Aarti Singh, and Hal Daumé III
- Abstract summary: A surge in the number of submissions received by leading AI conferences has challenged the sustainability of the review process.
We consider the problem of reviewer recruiting with a focus on the scarcity of qualified reviewers in large conferences.
In conjunction with ICML 2020 -- a large, top-tier machine learning conference -- we recruit a small set of reviewers through our procedure and compare their performance with the general population of ICML reviewers.
- Score: 35.24369486197371
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conference peer review constitutes a human-computation process whose
importance cannot be overstated: not only does it identify the best submissions
for acceptance, but, ultimately, it impacts the future of the whole research
area by promoting some ideas and restraining others. A surge in the number of
submissions received by leading AI conferences has challenged the
sustainability of the review process by increasing the burden on the pool of
qualified reviewers, which is growing at a much slower rate. In this work, we
consider the problem of reviewer recruiting with a focus on the scarcity of
qualified reviewers in large conferences. Specifically, we design a procedure
for (i) recruiting reviewers from the population not typically covered by major
conferences and (ii) guiding them through the reviewing pipeline. In
conjunction with ICML 2020 -- a large, top-tier machine learning conference --
we recruit a small set of reviewers through our procedure and compare their
performance with the general population of ICML reviewers. Our experiment
reveals that a combination of the recruiting and guiding mechanisms allows for
a principled enhancement of the reviewer pool and results in reviews of
superior quality, as evaluated by senior members of the program committee
(meta-reviewers), compared to reviews from the conventional reviewer pool.
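The abstract does not specify how review quality across the two pools was compared statistically. As a purely illustrative sketch, assuming meta-reviewers assign ordinal quality ratings and a rank-based two-sample test is used (both assumptions, not details taken from the paper), such a comparison could look like this; all ratings below are made up:

```python
# Illustrative sketch only (not the authors' analysis): compare hypothetical
# meta-reviewer quality ratings for reviews from a newly recruited pool
# versus the conventional pool, using a rank-based two-sample test.
from scipy.stats import mannwhitneyu

# Made-up quality ratings on a 1-5 scale assigned by meta-reviewers.
novice_pool_ratings = [4, 5, 4, 3, 5, 4, 4]
conventional_pool_ratings = [3, 4, 3, 4, 2, 3, 4]

# One-sided test: are novice-pool reviews rated higher than conventional ones?
stat, p_value = mannwhitneyu(novice_pool_ratings,
                             conventional_pool_ratings,
                             alternative="greater")
print(f"Mann-Whitney U = {stat:.1f}, one-sided p = {p_value:.3f}")
```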
Related papers
- ReviewEval: An Evaluation Framework for AI-Generated Reviews [9.35023998408983]
This research introduces a comprehensive evaluation framework for AI-generated reviews.
It measures alignment with human evaluations, verifies factual accuracy, assesses analytical depth, and identifies actionable insights.
Our framework establishes standardized metrics for evaluating AI-based review systems.
arXiv Detail & Related papers (2025-02-17T12:22:11Z)
- Generative Adversarial Reviews: When LLMs Become the Critic [1.2430809884830318]
We introduce Generative Agent Reviewers (GAR), leveraging LLM-empowered agents to simulate faithful peer reviewers.
Central to this approach is a graph-based representation of manuscripts, condensing content and logically organizing information.
Our experiments demonstrate that GAR performs comparably to human reviewers in providing detailed feedback and predicting paper outcomes.
arXiv Detail & Related papers (2024-12-09T06:58:17Z) - Analysis of the ICML 2023 Ranking Data: Can Authors' Opinions of Their Own Papers Assist Peer Review in Machine Learning? [52.00419656272129]
We conducted an experiment during the 2023 International Conference on Machine Learning (ICML).
We received 1,342 rankings, each from a distinct author, pertaining to 2,592 submissions.
We focus on the Isotonic Mechanism, which calibrates raw review scores using author-provided rankings (a minimal sketch of this calibration idea appears after this list).
arXiv Detail & Related papers (2024-08-24T01:51:23Z)
- ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate [57.71597869337909]
We build a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models.
Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments.
arXiv Detail & Related papers (2023-08-14T15:13:04Z)
- Consultation Checklists: Standardising the Human Evaluation of Medical Note Generation [58.54483567073125]
We propose a protocol that aims to increase objectivity by grounding evaluations in Consultation Checklists.
We observed good levels of inter-annotator agreement in a first evaluation study using the protocol.
arXiv Detail & Related papers (2022-11-17T10:54:28Z)
- Ranking Scientific Papers Using Preference Learning [48.78161994501516]
We cast it as a paper ranking problem based on peer review texts and reviewer scores.
We introduce a novel, multi-faceted generic evaluation framework for making final decisions based on peer reviews.
arXiv Detail & Related papers (2021-09-02T19:41:47Z)
- A Large Scale Randomized Controlled Trial on Herding in Peer-Review Discussions [33.261698377782075]
We aim to understand whether reviewers and more senior decision makers get disproportionately influenced by the first argument presented in a discussion.
Specifically, we design and execute a randomized controlled trial with the goal of testing for the conditional causal effect of the discussion initiator's opinion on the outcome of a paper.
arXiv Detail & Related papers (2020-11-30T18:23:07Z)
- An Open Review of OpenReview: A Critical Analysis of the Machine Learning Conference Review Process [41.049292105761246]
We critically analyze the review process through a comprehensive study of papers submitted to ICLR between 2017 and 2020.
Our findings suggest strong institutional bias in accept/reject decisions, even after controlling for paper quality.
We find evidence for a gender gap, with female authors receiving lower scores, lower acceptance rates, and fewer citations per paper than their male counterparts.
arXiv Detail & Related papers (2020-10-11T02:06:04Z)
- What Can We Do to Improve Peer Review in NLP? [69.11622020605431]
We argue that a part of the problem is that the reviewers and area chairs face a poorly defined task forcing apples-to-oranges comparisons.
There are several potential ways forward, but the key difficulty is creating the incentives and mechanisms for their consistent implementation in the NLP community.
arXiv Detail & Related papers (2020-10-08T09:32:21Z)
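The "Analysis of the ICML 2023 Ranking Data" entry above mentions the Isotonic Mechanism, which calibrates raw review scores using an author-provided ranking of their own submissions. Below is a minimal sketch of that calibration idea, assuming it amounts to a least-squares isotonic projection onto score vectors consistent with the author's ranking; the function name and example values are illustrative, not the authors' implementation.

```python
# Minimal sketch of the isotonic-calibration idea (illustrative, not the
# authors' code): find the scores closest, in least squares, to the raw
# review scores that are non-increasing along the author's own ranking.
import numpy as np
from sklearn.isotonic import IsotonicRegression

def calibrate_scores(raw_scores, author_ranking):
    """raw_scores: one raw review score per submission.
    author_ranking: submission indices ordered best-to-worst by the author.
    Returns calibrated scores that respect the author's ranking."""
    ordered = np.asarray([raw_scores[i] for i in author_ranking], dtype=float)
    # IsotonicRegression fits a non-decreasing sequence, so fit on the
    # reversed (worst-to-best) order and reverse the result back.
    iso = IsotonicRegression(increasing=True)
    fitted = iso.fit_transform(np.arange(len(ordered)), ordered[::-1])[::-1]
    calibrated = dict(zip(author_ranking, fitted))
    return [float(calibrated[i]) for i in range(len(raw_scores))]

# The author ranks submission 0 above 1 above 2, but the raw scores rank
# submission 1 above 0; only that violating pair gets averaged.
print(calibrate_scores([6.0, 7.0, 5.0], author_ranking=[0, 1, 2]))
# -> [6.5, 6.5, 5.0]
```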
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.