Integrating Rankings into Quantized Scores in Peer Review
- URL: http://arxiv.org/abs/2204.03505v1
- Date: Tue, 5 Apr 2022 19:39:13 GMT
- Title: Integrating Rankings into Quantized Scores in Peer Review
- Authors: Yusha Liu, Yichong Xu, Nihar B. Shah and Aarti Singh
- Abstract summary: In peer review, reviewers are usually asked to provide scores for the papers.
The quantized scores suffer from a large number of ties, leading to a significant loss of information.
To mitigate this issue, conferences have started to ask reviewers to additionally provide a ranking of the papers they have reviewed.
There is no standard procedure for using this ranking information, and Area Chairs may use it in different ways.
We take a principled approach to integrate the ranking information into the scores.
- Score: 61.27794774537103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In peer review, reviewers are usually asked to provide scores for the papers.
The scores are then used by Area Chairs or Program Chairs in various ways in
the decision-making process. The scores are usually elicited in a quantized
form to accommodate the limited cognitive ability of humans to describe their
opinions in numerical values. It has been found that the quantized scores
suffer from a large number of ties, thereby leading to a significant loss of
information. To mitigate this issue, conferences have started to ask reviewers
to additionally provide a ranking of the papers they have reviewed. There are
however two key challenges. First, there is no standard procedure for using
this ranking information and Area Chairs may use it in different ways
(including simply ignoring it), thereby leading to arbitrariness in the
peer-review process. Second, there are no suitable interfaces for judicious use
of this data nor methods to incorporate it in existing workflows, thereby
leading to inefficiencies. We take a principled approach to integrate the
ranking information into the scores. The output of our method is an updated
score pertaining to each review that also incorporates the rankings. Our
approach addresses the two aforementioned challenges by: (i) ensuring that
rankings are incorporated into the updated scores in the same manner for all
papers, thereby mitigating arbitrariness, and (ii) allowing seamless use of
existing interfaces and workflows designed for scores. We empirically evaluate
our method on synthetic datasets as well as on peer reviews from the ICLR 2017
conference, and find that it reduces the error by approximately 30% as compared
to the best-performing baseline on the ICLR 2017 data.
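The paper's own formulation is more involved, but a minimal sketch of the underlying idea is shown below, under the assumption that each reviewer's updated scores should stay as close as possible (in least squares) to the original quantized scores while agreeing with that reviewer's reported ranking. The pool-adjacent-violators (isotonic) projection used here, and the names `pava_non_increasing`, `integrate_ranking`, and the example data, are illustrative only and are not the authors' algorithm or notation.

```python
# Sketch: make one reviewer's quantized scores consistent with their ranking
# by projecting the score vector onto non-increasing sequences (PAVA).
from typing import Dict, List


def pava_non_increasing(values: List[float]) -> List[float]:
    """Least-squares projection of `values` onto non-increasing sequences."""
    blocks = []  # each block is [mean, count]; merged whenever the order is violated
    for v in values:
        blocks.append([float(v), 1])
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            blocks.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    out: List[float] = []
    for mean, count in blocks:
        out.extend([mean] * count)
    return out


def integrate_ranking(scores: Dict[str, float], ranking: List[str]) -> Dict[str, float]:
    """Update one reviewer's scores so they respect that reviewer's ranking.

    `scores`  maps paper id -> quantized score given by the reviewer.
    `ranking` lists the same paper ids from best to worst.
    """
    ordered = [scores[p] for p in ranking]   # scores in ranked order (best first)
    adjusted = pava_non_increasing(ordered)  # smallest least-squares change removing inversions
    return dict(zip(ranking, adjusted))


if __name__ == "__main__":
    # The reviewer scored B above A but ranked A above B; the projection
    # pools the two scores so the updated scores agree with the ranking.
    scores = {"A": 5, "B": 6, "C": 4}
    ranking = ["A", "B", "C"]
    print(integrate_ranking(scores, ranking))  # {'A': 5.5, 'B': 5.5, 'C': 4.0}
```

Note that a plain isotonic projection like this leaves genuine ties tied; how to break ties and how to weight the ranking against the original scores are exactly the kinds of design choices a principled integration method must pin down.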
Related papers
- New Directions in Text Classification Research: Maximizing The Performance of Sentiment Classification from Limited Data [0.0]
A benchmark dataset is provided for training and testing data on the issue of Kaesang Pangarep's appointment as Chairman of PSI.
The official score used is the F1-score, which balances precision and recall among the three classes, positive, negative, and neutral.
Both scores (baseline and optimized) are obtained with the SVM method, which is widely reported as the state of the art among conventional machine learning methods.
arXiv Detail & Related papers (2024-07-08T05:42:29Z)
- Learning to Rank when Grades Matter [11.981942948477236]
Graded labels are ubiquitous in real-world learning-to-rank applications.
Traditional learning-to-rank techniques focus on ranking and do not aim to predict the actual grades.
We propose a multiobjective formulation to jointly optimize both ranking and grade predictions.
arXiv Detail & Related papers (2023-06-14T17:30:02Z)
- A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper.
Our dataset consists of 477 self-reported expertise scores provided by 58 researchers.
For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
arXiv Detail & Related papers (2023-03-23T16:15:03Z)
- Ranking Scientific Papers Using Preference Learning [48.78161994501516]
We cast it as a paper ranking problem based on peer review texts and reviewer scores.
We introduce a novel, multi-faceted generic evaluation framework for making final decisions based on peer reviews.
arXiv Detail & Related papers (2021-09-02T19:41:47Z)
- PiRank: Learning To Rank via Differentiable Sorting [85.28916333414145]
We propose PiRank, a new class of differentiable surrogates for ranking.
We show that PiRank exactly recovers the desired metrics in the limit of zero temperature.
arXiv Detail & Related papers (2020-12-12T05:07:36Z)
- Debiasing Evaluations That are Biased by Evaluations [32.135315382120154]
We consider the problem of mitigating outcome-induced biases in ratings when some information about the outcome is available.
We propose a debiasing method by solving a regularized optimization problem under this ordering constraint.
We also provide a carefully designed cross-validation method that adaptively chooses the appropriate amount of regularization.
arXiv Detail & Related papers (2020-12-01T18:20:43Z)
- Hierarchical Bi-Directional Self-Attention Networks for Paper Review Rating Recommendation [81.55533657694016]
We propose a Hierarchical bi-directional self-attention Network framework (HabNet) for paper review rating prediction and recommendation.
Specifically, we leverage the hierarchical structure of the paper reviews with three levels of encoders: sentence encoder (level one), intra-review encoder (level two), and inter-review encoder (level three).
We are able to identify useful predictors to make the final acceptance decision, as well as to help discover the inconsistency between numerical review ratings and text sentiment conveyed by reviewers.
arXiv Detail & Related papers (2020-11-02T08:07:50Z)
- Rank over Class: The Untapped Potential of Ranking in Natural Language Processing [8.637110868126546]
We argue that many tasks which are currently addressed using classification are in fact being shoehorned into a classification mould.
We propose a novel end-to-end ranking approach consisting of a Transformer network responsible for producing representations for a pair of text sequences.
In an experiment on a heavily-skewed sentiment analysis dataset, converting ranking results to classification labels yields an approximately 22% improvement over state-of-the-art text classification.
arXiv Detail & Related papers (2020-09-10T22:18:57Z)
- Taking the Counterfactual Online: Efficient and Unbiased Online Evaluation for Ranking [74.46448041224247]
We introduce the novel Logging-Policy Optimization Algorithm (LogOpt), which optimizes the policy for logging data.
LogOpt turns the counterfactual approach - which is indifferent to the logging policy - into an online approach, where the algorithm decides what rankings to display.
We prove that, as an online evaluation method, LogOpt is unbiased w.r.t. position and item-selection bias, unlike existing interleaving methods.
arXiv Detail & Related papers (2020-07-24T18:05:58Z)