Inconsistency in Conference Peer Review: Revisiting the 2014 NeurIPS
Experiment
- URL: http://arxiv.org/abs/2109.09774v1
- Date: Mon, 20 Sep 2021 18:06:22 GMT
- Title: Inconsistency in Conference Peer Review: Revisiting the 2014 NeurIPS
Experiment
- Authors: Corinna Cortes and Neil D. Lawrence
- Abstract summary: We revisit the 2014 NeurIPS experiment that examined inconsistency in conference peer review.
We find that, for accepted papers, there is no correlation between quality scores and the impact of the paper.
- Score: 26.30237757653724
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we revisit the 2014 NeurIPS experiment that examined inconsistency in conference peer review. We determine that 50% of the variation in reviewer quality scores was subjective in origin. Further, with seven years having passed since the experiment, we find that for accepted papers there is no correlation between quality scores and the impact of the paper as measured by citation count. We trace the fate of rejected papers, recovering where these papers were eventually published, and for these papers we find a correlation between quality scores and impact. We conclude that the reviewing process for the 2014 conference was good at identifying poor papers but poor at identifying good papers. We give some suggestions for improving the reviewing process but also warn against removing the subjective element. Finally, we suggest that the real conclusion of the experiment is that the community should place less onus on the notion of 'top-tier conference publications' when assessing the quality of individual researchers. For NeurIPS 2021, the program chairs (PCs) are repeating the experiment, as well as conducting new ones.
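The headline numbers above (half of the score variance being subjective, and the churn this implies in accept/reject decisions) can be illustrated with a toy simulation. The sketch below is not the paper's analysis: it assumes a hypothetical pool of submissions whose committee scores are the sum of an objective quality term and an independent subjective term of equal variance, applies an assumed NeurIPS-like acceptance rate, and checks how much of one committee's accepted list would survive an independent rerun.

```python
# Toy model (assumption, not the paper's analysis): each committee's score is
# objective quality plus independent subjective noise of equal variance,
# matching the "50% subjective" figure quoted above. All numbers are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n_papers = 10_000          # hypothetical submission pool
accept_rate = 0.25         # assumed NeurIPS-like acceptance rate

quality = rng.normal(size=n_papers)             # shared, "objective" component
score_a = quality + rng.normal(size=n_papers)   # committee A: quality + subjective noise
score_b = quality + rng.normal(size=n_papers)   # committee B: independent rerun

n_accept = int(accept_rate * n_papers)
accepted_a = set(np.argsort(score_a)[-n_accept:])   # A's top-scoring papers
accepted_b = set(np.argsort(score_b)[-n_accept:])   # B's top-scoring papers

overlap = len(accepted_a & accepted_b) / n_accept
print(f"Share of committee A's accepted papers also accepted by B: {overlap:.2f}")
```

Under these assumptions, roughly half of one committee's accepted papers would be rejected by the other, in line with the 2014 result and with the NeurIPS 2021 replication listed below.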
Related papers
- CausalCite: A Causal Formulation of Paper Citations [80.82622421055734]
CausalCite is a new way to measure the significance of a paper by assessing the causal impact of the paper on its follow-up papers.
It is based on a novel causal inference method, TextMatch, which adapts the traditional matching framework to high-dimensional text embeddings (a generic illustration of embedding-based matching is sketched after this list).
We demonstrate the effectiveness of CausalCite on various criteria, such as high correlation with paper impact as reported by scientific experts.
arXiv Detail & Related papers (2023-11-05T23:09:39Z) - Estimating the Causal Effect of Early ArXiving on Paper Acceptance [56.538813945721685]
We estimate the effect of arXiving a paper before the reviewing period (early arXiving) on its acceptance to the conference.
Our results suggest that early arXiving may have a small effect on a paper's chances of acceptance.
arXiv Detail & Related papers (2023-06-24T07:45:38Z) - Has the Machine Learning Review Process Become More Arbitrary as the
Field Has Grown? The NeurIPS 2021 Consistency Experiment [86.77085171670323]
We present a larger-scale variant of the 2014 NeurIPS experiment in which 10% of conference submissions were reviewed by two independent committees to quantify the randomness in the review process.
We observe that the two committees disagree on their accept/reject recommendations for 23% of the papers and that, consistent with the results from 2014, approximately half of the list of accepted papers would change if the review process were randomly rerun.
arXiv Detail & Related papers (2023-06-05T21:26:12Z) - Integrating Rankings into Quantized Scores in Peer Review [61.27794774537103]
In peer review, reviewers are usually asked to provide scores for the papers, but such quantized scores are coarse, frequently tied, and therefore lose information.
To mitigate this issue, conferences have started to ask reviewers to additionally provide a ranking of the papers they have reviewed.
There is no standard procedure for using this ranking information, and Area Chairs may use it in different ways.
We take a principled approach to integrating the ranking information into the scores (a toy illustration of the underlying reconciliation problem is sketched after this list).
arXiv Detail & Related papers (2022-04-05T19:39:13Z) - Ranking Scientific Papers Using Preference Learning [48.78161994501516]
We cast the problem of supporting final acceptance decisions as a paper ranking problem based on peer review texts and reviewer scores.
We introduce a novel, multi-faceted generic evaluation framework for making final decisions based on peer reviews.
arXiv Detail & Related papers (2021-09-02T19:41:47Z) - Can We Automate Scientific Reviewing? [89.50052670307434]
We discuss the possibility of using state-of-the-art natural language processing (NLP) models to generate first-pass peer reviews for scientific papers.
We collect a dataset of papers in the machine learning domain, annotate them with different aspects of content covered in each review, and train targeted summarization models that take in papers to generate reviews.
Comprehensive experimental results show that system-generated reviews tend to touch upon more aspects of the paper than human-written reviews.
arXiv Detail & Related papers (2021-01-30T07:16:53Z) - An Open Review of OpenReview: A Critical Analysis of the Machine
Learning Conference Review Process [41.049292105761246]
We critically analyze the review process through a comprehensive study of papers submitted to ICLR between 2017 and 2020.
Our findings suggest strong institutional bias in accept/reject decisions, even after controlling for paper quality.
We find evidence for a gender gap, with female authors receiving lower scores, lower acceptance rates, and fewer citations per paper than their male counterparts.
arXiv Detail & Related papers (2020-10-11T02:06:04Z) - De-anonymization of authors through arXiv submissions during
double-blind review [33.15282901539867]
We investigate the effects of releasing arXiv preprints of papers undergoing a double-blind review process.
We find statistically significant evidence of a positive correlation between acceptance rates and papers released on arXiv by authors with high reputation.
arXiv Detail & Related papers (2020-07-01T01:40:06Z)
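For the CausalCite entry above, the following is a generic illustration of matching on text embeddings, not the TextMatch method itself: each treated paper is matched to its most textually similar control papers by cosine similarity, and the effect estimate is the average outcome gap against those matches. The embeddings, outcomes, and treated/control split are all hypothetical inputs.

```python
# Generic embedding-matching toy (assumption: NOT the CausalCite/TextMatch algorithm).
import numpy as np

def matched_effect(treated_emb, treated_out, control_emb, control_out, k=5):
    """Estimate an effect by matching each treated paper to its k most
    textually similar control papers (cosine similarity) and comparing
    outcomes against those matched controls."""
    # Normalise embeddings so the dot product equals cosine similarity.
    t = treated_emb / np.linalg.norm(treated_emb, axis=1, keepdims=True)
    c = control_emb / np.linalg.norm(control_emb, axis=1, keepdims=True)
    sims = t @ c.T                                  # (n_treated, n_control)
    nearest = np.argsort(-sims, axis=1)[:, :k]      # indices of the k best matches
    matched_means = control_out[nearest].mean(axis=1)
    return float(np.mean(treated_out - matched_means))

# Hypothetical data: 100 treated and 500 control papers with 64-d embeddings;
# outcomes stand in for citation counts.
rng = np.random.default_rng(1)
effect = matched_effect(
    treated_emb=rng.normal(size=(100, 64)),
    treated_out=rng.poisson(30, size=100).astype(float),
    control_emb=rng.normal(size=(500, 64)),
    control_out=rng.poisson(25, size=500).astype(float),
)
print(f"Estimated effect on outcome (e.g. citations): {effect:.1f}")
```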
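For the "Integrating Rankings into Quantized Scores in Peer Review" entry above, the toy below shows one naive way an Area Chair could reconcile a reviewer's scores with the ranking that reviewer provided: keep the reviewer's score values but reassign them in ranked order. This is only an illustration of the reconciliation problem, with hypothetical paper IDs and scores, and not the principled approach proposed in that paper.

```python
# Naive score/ranking reconciliation toy (assumption: not the paper's method).
def align_scores_with_ranking(scores: dict[str, float], ranking: list[str]) -> dict[str, float]:
    """Reassign the reviewer's own score values to papers in ranked order.

    `ranking` lists paper IDs from best to worst; the highest score value goes
    to the top-ranked paper, and so on. The multiset of score values is
    unchanged, so the reviewer's overall calibration is preserved.
    """
    sorted_values = sorted(scores.values(), reverse=True)
    return {paper: value for paper, value in zip(ranking, sorted_values)}

# Hypothetical reviewer data: the scores disagree with the stated ranking.
scores = {"paper_A": 6.0, "paper_B": 7.0, "paper_C": 4.0}
ranking = ["paper_A", "paper_B", "paper_C"]  # reviewer says A > B > C

print(align_scores_with_ranking(scores, ranking))
# {'paper_A': 7.0, 'paper_B': 6.0, 'paper_C': 4.0}
```

A real integration method would also need to handle ties, reviewer miscalibration, and uncertainty rather than simply permuting score values.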
This list is automatically generated from the titles and abstracts of the papers in this site.