A Novice-Reviewer Experiment to Address Scarcity of Qualified Reviewers in Large Conferences
- URL: http://arxiv.org/abs/2011.15050v1
- Date: Mon, 30 Nov 2020 17:48:55 GMT
- Title: A Novice-Reviewer Experiment to Address Scarcity of Qualified Reviewers in Large Conferences
- Authors: Ivan Stelmakh, Nihar B. Shah, Aarti Singh, and Hal Daumé III
- Abstract summary: A surge in the number of submissions received by leading AI conferences has challenged the sustainability of the review process.
We consider the problem of reviewer recruiting with a focus on the scarcity of qualified reviewers in large conferences.
In conjunction with ICML 2020 -- a large, top-tier machine learning conference -- we recruit a small set of reviewers through our procedure and compare their performance with the general population of ICML reviewers.
- Score: 35.24369486197371
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conference peer review constitutes a human-computation process whose
importance cannot be overstated: not only does it identify the best submissions
for acceptance, but, ultimately, it impacts the future of the whole research
area by promoting some ideas and restraining others. A surge in the number of
submissions received by leading AI conferences has challenged the
sustainability of the review process by increasing the burden on the pool of
qualified reviewers, which is growing at a much slower rate. In this work, we
consider the problem of reviewer recruiting with a focus on the scarcity of
qualified reviewers in large conferences. Specifically, we design a procedure
for (i) recruiting reviewers from the population not typically covered by major
conferences and (ii) guiding them through the reviewing pipeline. In
conjunction with ICML 2020 -- a large, top-tier machine learning conference --
we recruit a small set of reviewers through our procedure and compare their
performance with the general population of ICML reviewers. Our experiment
reveals that a combination of the recruiting and guiding mechanisms allows for
a principled enhancement of the reviewer pool and results in reviews of
superior quality compared to the conventional pool of reviews as evaluated by
senior members of the program committee (meta-reviewers).
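The central quantitative comparison here is between the quality of reviews produced by the newly recruited pool and by the conventional ICML pool, as rated by meta-reviewers. As a rough illustration only, the Python sketch below shows one generic way such a two-pool comparison could be run: a simple two-sample permutation test on per-review quality ratings. The ratings, pool sizes, and choice of test are assumptions made for this sketch, not the analysis reported in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical meta-reviewer quality ratings (e.g., on a 1-5 scale); made-up data.
novice_ratings = np.array([4, 5, 4, 3, 5, 4, 4, 5])
conventional_ratings = np.array([3, 4, 3, 4, 2, 4, 3, 3])

observed_gap = novice_ratings.mean() - conventional_ratings.mean()

# Two-sample permutation test: shuffle the pooled ratings many times and count
# how often a random split yields a gap at least as large as the observed one.
pooled = np.concatenate([novice_ratings, conventional_ratings])
n_novice = len(novice_ratings)
perm_gaps = np.empty(10_000)
for i in range(perm_gaps.size):
    shuffled = rng.permutation(pooled)
    perm_gaps[i] = shuffled[:n_novice].mean() - shuffled[n_novice:].mean()

p_value = np.mean(perm_gaps >= observed_gap)  # one-sided p-value
print(f"observed gap = {observed_gap:.2f}, permutation p-value = {p_value:.4f}")
```

A permutation test is used in this sketch only because it makes no distributional assumptions about ordinal quality ratings; the paper's own statistical treatment may differ.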
Related papers
- Group Fairness in Peer Review [44.580732477017904]
This paper introduces a notion of group fairness, called the core, which requires that every possible community (subset of researchers) be treated in a way that prevents them from unilaterally benefiting from withdrawing from a large conference.
We study a simple peer review model, prove that it always admits a reviewing assignment in the core, and design an efficient algorithm to find one such assignment.
arXiv Detail & Related papers (2024-10-04T14:48:10Z)
- Analysis of the ICML 2023 Ranking Data: Can Authors' Opinions of Their Own Papers Assist Peer Review in Machine Learning? [52.00419656272129]
We conducted an experiment during the 2023 International Conference on Machine Learning (ICML)
We received 1,342 rankings, each from a distinct author, pertaining to 2,592 submissions.
We focus on the Isotonic Mechanism, which calibrates raw review scores using author-provided rankings (a rough sketch of this calibration idea appears after this list).
arXiv Detail & Related papers (2024-08-24T01:51:23Z)
- Automatic Analysis of Substantiation in Scientific Peer Reviews [24.422667012858298]
SubstanReview consists of 550 reviews from NLP conferences annotated by domain experts.
On the basis of this dataset, we train an argument mining system to automatically analyze the level of substantiation in peer reviews.
We also perform data analysis on the SubstanReview dataset to obtain meaningful insights on peer reviewing quality in NLP conferences over recent years.
arXiv Detail & Related papers (2023-11-20T17:47:37Z)
- ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate [57.71597869337909]
We build a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models.
Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments.
arXiv Detail & Related papers (2023-08-14T15:13:04Z)
- Consultation Checklists: Standardising the Human Evaluation of Medical Note Generation [58.54483567073125]
We propose a protocol that aims to increase objectivity by grounding evaluations in Consultation Checklists.
We observed good levels of inter-annotator agreement in a first evaluation study using the protocol.
arXiv Detail & Related papers (2022-11-17T10:54:28Z)
- Ranking Scientific Papers Using Preference Learning [48.78161994501516]
We cast the final decision process as a paper ranking problem based on peer review texts and reviewer scores.
We introduce a novel, multi-faceted generic evaluation framework for making final decisions based on peer reviews.
arXiv Detail & Related papers (2021-09-02T19:41:47Z)
- A Large Scale Randomized Controlled Trial on Herding in Peer-Review Discussions [33.261698377782075]
We aim to understand whether reviewers and more senior decision makers get disproportionately influenced by the first argument presented in a discussion.
Specifically, we design and execute a randomized controlled trial with the goal of testing for the conditional causal effect of the discussion initiator's opinion on the outcome of a paper.
arXiv Detail & Related papers (2020-11-30T18:23:07Z)
- An Open Review of OpenReview: A Critical Analysis of the Machine Learning Conference Review Process [41.049292105761246]
We critically analyze the review process through a comprehensive study of papers submitted to ICLR between 2017 and 2020.
Our findings suggest strong institutional bias in accept/reject decisions, even after controlling for paper quality.
We find evidence for a gender gap, with female authors receiving lower scores, lower acceptance rates, and fewer citations per paper than their male counterparts.
arXiv Detail & Related papers (2020-10-11T02:06:04Z)
- What Can We Do to Improve Peer Review in NLP? [69.11622020605431]
We argue that part of the problem is that reviewers and area chairs face a poorly defined task, forcing apples-to-oranges comparisons.
There are several potential ways forward, but the key difficulty is creating the incentives and mechanisms for their consistent implementation in the NLP community.
arXiv Detail & Related papers (2020-10-08T09:32:21Z)
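As referenced in the Isotonic Mechanism item above, that mechanism adjusts raw review scores so they are consistent with the ranking an author reports over their own submissions, by projecting the raw scores onto the corresponding monotonicity constraint in least squares (isotonic regression). The sketch below illustrates the idea with off-the-shelf isotonic regression; the function name, the use of mean per-paper scores, and the example numbers are assumptions for illustration, not details taken from the ICML 2023 analysis paper.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def calibrate_scores(raw_scores, author_ranking):
    """Project raw scores onto the author's claimed ordering.

    raw_scores[i]     : mean raw review score of the author's i-th paper (assumed input).
    author_ranking[i] : author's rank for paper i (1 = claimed best).
    Returns adjusted scores that are non-increasing in the author's ranking and
    as close as possible to the raw scores in squared error.
    """
    raw_scores = np.asarray(raw_scores, dtype=float)
    ranks = np.asarray(author_ranking, dtype=float)
    # Decreasing isotonic regression: a paper the author ranks higher
    # (smaller rank number) must not end up with a lower adjusted score.
    return IsotonicRegression(increasing=False).fit_transform(ranks, raw_scores)

# Example: the author claims paper 1 > paper 2 > paper 3, but the raw scores
# disagree; the violating pair is pooled (averaged) by the calibration.
print(calibrate_scores([5.0, 6.5, 4.0], [1, 2, 3]))  # adjusted scores: 5.75, 5.75, 4.0
```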
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.