Analyzing the Machine Learning Conference Review Process
- URL: http://arxiv.org/abs/2011.12919v2
- Date: Thu, 26 Nov 2020 01:34:24 GMT
- Title: Analyzing the Machine Learning Conference Review Process
- Authors: David Tran, Alex Valtchanov, Keshav Ganapathy, Raymond Feng, Eric Slud, Micah Goldblum, Tom Goldstein
- Abstract summary: We critically analyze the review process through a comprehensive study of papers submitted to ICLR between 2017 and 2020.
Our findings suggest strong institutional bias in accept/reject decisions, even after controlling for paper quality.
We find evidence for a gender gap, with female authors receiving lower scores, lower acceptance rates, and fewer citations per paper than their male counterparts.
- Score: 41.049292105761246
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mainstream machine learning conferences have seen a dramatic increase in the
number of participants, along with a growing range of perspectives, in recent
years. Members of the machine learning community are likely to overhear
allegations ranging from randomness of acceptance decisions to institutional
bias. In this work, we critically analyze the review process through a
comprehensive study of papers submitted to ICLR between 2017 and 2020. We
quantify reproducibility/randomness in review scores and acceptance decisions,
and examine whether scores correlate with paper impact. Our findings suggest
strong institutional bias in accept/reject decisions, even after controlling
for paper quality. Furthermore, we find evidence for a gender gap, with female
authors receiving lower scores, lower acceptance rates, and fewer citations per
paper than their male counterparts. We conclude our work with recommendations
for future conference organizers.
Related papers
- Has the Machine Learning Review Process Become More Arbitrary as the Field Has Grown? The NeurIPS 2021 Consistency Experiment [86.77085171670323]
We present a larger-scale variant of the 2014 NeurIPS experiment in which 10% of conference submissions were reviewed by two independent committees to quantify the randomness in the review process.
We observe that the two committees disagree on their accept/reject recommendations for 23% of the papers and that, consistent with the results from 2014, approximately half of the list of accepted papers would change if the review process were randomly rerun.
arXiv Detail & Related papers (2023-06-05T21:26:12Z)
- Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z) - Integrating Rankings into Quantized Scores in Peer Review [61.27794774537103]
In peer review, reviewers are usually asked to provide scores for the papers.
To mitigate this issue, conferences have started to ask reviewers to additionally provide a ranking of the papers they have reviewed.
There is no standard procedure for using this ranking information, and Area Chairs may use it in different ways.
We take a principled approach to integrate the ranking information into the scores.
arXiv Detail & Related papers (2022-04-05T19:39:13Z) - Inconsistency in Conference Peer Review: Revisiting the 2014 NeurIPS
Experiment [26.30237757653724]
We revisit the 2014 NeurIPS experiment that examined inconsistency in conference peer review.
We find that for accepted papers, there is no correlation between quality scores and impact of the paper.
arXiv Detail & Related papers (2021-09-20T18:06:22Z) - Ranking Scientific Papers Using Preference Learning [48.78161994501516]
We cast it as a paper ranking problem based on peer review texts and reviewer scores.
We introduce a novel, multi-faceted generic evaluation framework for making final decisions based on peer reviews.
arXiv Detail & Related papers (2021-09-02T19:41:47Z) - A Large Scale Randomized Controlled Trial on Herding in Peer-Review
Discussions [33.261698377782075]
We aim to understand whether reviewers and more senior decision makers get disproportionately influenced by the first argument presented in a discussion.
Specifically, we design and execute a randomized controlled trial with the goal of testing for the conditional causal effect of the discussion initiator's opinion on the outcome of a paper.
arXiv Detail & Related papers (2020-11-30T18:23:07Z) - An Open Review of OpenReview: A Critical Analysis of the Machine
Learning Conference Review Process [41.049292105761246]
We critically analyze the review process through a comprehensive study of papers submitted to ICLR between 2017 and 2020.
Our findings suggest strong institutional bias in accept/reject decisions, even after controlling for paper quality.
We find evidence for a gender gap, with female authors receiving lower scores, lower acceptance rates, and fewer citations per paper than their male counterparts.
arXiv Detail & Related papers (2020-10-11T02:06:04Z) - Aspect-based Sentiment Analysis of Scientific Reviews [12.472629584751509]
We show that the distribution of aspect-based sentiments obtained from a review is significantly different for accepted and rejected papers.
As a second objective, we quantify the extent of disagreement among the reviewers refereeing a paper.
We also investigate the extent of disagreement between the reviewers and the chair and find that the inter-reviewer disagreement may have a link to the disagreement with the chair.
arXiv Detail & Related papers (2020-06-05T07:06:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.