A Gold Standard Dataset for the Reviewer Assignment Problem
- URL: http://arxiv.org/abs/2303.16750v1
- Date: Thu, 23 Mar 2023 16:15:03 GMT
- Title: A Gold Standard Dataset for the Reviewer Assignment Problem
- Authors: Ivan Stelmakh, John Wieting, Graham Neubig, Nihar B. Shah
- Abstract summary: "Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper.
Our dataset consists of 477 self-reported expertise scores provided by 58 researchers.
For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
- Score: 117.59690218507565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many peer-review venues are either using or looking to use algorithms to
assign submissions to reviewers. The crux of such automated approaches is the
notion of the "similarity score"--a numerical estimate of the expertise of a
reviewer in reviewing a paper--and many algorithms have been proposed to
compute these scores. However, these algorithms have not been subjected to a
principled comparison, making it difficult for stakeholders to choose the
algorithm in an evidence-based manner. The key challenge in comparing existing
algorithms and developing better algorithms is the lack of the publicly
available gold-standard data that would be needed to perform reproducible
research. We address this challenge by collecting a novel dataset of similarity
scores that we release to the research community. Our dataset consists of 477
self-reported expertise scores provided by 58 researchers who evaluated their
expertise in reviewing papers they have read previously.
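As a hedged illustration of what a "similarity score" is (not this paper's data-collection procedure), the sketch below computes a TF-IDF-based score between a hypothetical reviewer profile (concatenated texts of their past papers) and a submission. All texts are made up, and scikit-learn is assumed to be available.

```python
# Minimal sketch of a "similarity score": cosine similarity between the
# TF-IDF vector of a reviewer's past papers and that of a submission.
# Toy, hypothetical data; assumes scikit-learn is installed.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reviewer_profile = (
    "Peer review assignment via optimization. "
    "Estimating reviewer expertise from bidding data."
)
submission = "A dataset for evaluating reviewer-paper similarity scores."

vectorizer = TfidfVectorizer(stop_words="english")
# Fit on both documents so they share a single vocabulary.
vectors = vectorizer.fit_transform([reviewer_profile, submission])
score = cosine_similarity(vectors[0], vectors[1])[0, 0]
print(f"similarity score: {score:.3f}")  # higher = better expertise match
```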
We use this data to compare several popular algorithms employed in computer
science conferences and come up with recommendations for stakeholders. Our main
findings are as follows. First, all algorithms make a non-trivial amount of
error. For the task of ordering two papers in terms of their relevance for a
reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard
cases, highlighting the vital need for more research on the
similarity-computation problem. Second, most existing algorithms are designed
to work with titles and abstracts of papers, and in this regime the Specter+MFR
algorithm performs best. Third, to improve performance, it may be important to
develop modern deep-learning based algorithms that can make use of the full
texts of papers: the classical TF-IDF algorithm enhanced with full texts of
papers is on par with the deep-learning based Specter+MFR that cannot make use
of this information.
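The pairwise-ordering error reported above can be made concrete with a short sketch. For a reviewer who rated paper A above paper B, an algorithm errs on that pair when its similarity scores order the two papers the other way; the error rate is the fraction of misordered pairs. The function name and data layout below are illustrative assumptions, not the released dataset's actual schema.

```python
# Sketch of the pairwise-ordering evaluation. Data layout is hypothetical.
from itertools import combinations

def pairwise_error_rate(self_reported: dict, computed: dict) -> float:
    """Fraction of paper pairs misordered by computed similarity scores.

    self_reported: paper_id -> reviewer's self-reported expertise rating
    computed:      paper_id -> algorithm's similarity score
    Pairs with equal self-reported ratings are skipped (no ground truth).
    """
    errors, total = 0, 0
    for a, b in combinations(self_reported, 2):
        if self_reported[a] == self_reported[b]:
            continue  # no ground-truth ordering for this pair
        total += 1
        truth = self_reported[a] > self_reported[b]
        pred = computed[a] > computed[b]
        if truth != pred:
            errors += 1
    return errors / total if total else 0.0

# Toy usage: the algorithm orders (p1, p2) and (p1, p3) correctly but
# misorders (p2, p3), giving an error rate of 1/3.
ratings = {"p1": 5, "p2": 3, "p3": 2}
scores = {"p1": 0.9, "p2": 0.1, "p3": 0.4}
print(pairwise_error_rate(ratings, scores))  # 0.333...
```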
Related papers
- Towards Comparable Active Learning [6.579888565581481]
We show that the reported lifts in recent literature generalize poorly to other domains, leading to an inconclusive landscape in Active Learning research.
This paper addresses these issues by providing an Active Learning framework for a fair comparison of algorithms across different tasks and domains, as well as a fast and performant oracle algorithm for evaluation.
arXiv Detail & Related papers (2023-11-30T08:54:32Z)
- Regularization-Based Methods for Ordinal Quantification [49.606912965922504]
We study the ordinal case, i.e., the case in which a total order is defined on the set of n>2 classes.
We propose a novel class of regularized OQ algorithms, which outperforms existing algorithms in our experiments.
arXiv Detail & Related papers (2023-10-13T16:04:06Z)
- Towards Better Out-of-Distribution Generalization of Neural Algorithmic Reasoning Tasks [51.8723187709964]
We study the OOD generalization of neural algorithmic reasoning tasks.
The goal is to learn an algorithm from input-output pairs using deep neural networks.
arXiv Detail & Related papers (2022-11-01T18:33:20Z)
- Stochastic Differentially Private and Fair Learning [7.971065005161566]
We provide the first differentially private algorithm for fair learning that is guaranteed to converge.
Our framework is flexible enough to permit different notions of fairness, including demographic parity and equalized odds.
Our algorithm can be applied to non-binary classification tasks with multiple (non-binary) sensitive attributes.
arXiv Detail & Related papers (2022-10-17T06:54:57Z)
- The CLRS Algorithmic Reasoning Benchmark [28.789225199559834]
Learning representations of algorithms is an emerging area of machine learning, seeking to bridge concepts from neural networks with classical algorithms.
We propose the CLRS Algorithmic Reasoning Benchmark, covering classical algorithms from the Introduction to Algorithms textbook.
Our benchmark spans a variety of algorithmic reasoning procedures, including sorting, searching, dynamic programming, graph algorithms, string algorithms and geometric algorithms.
arXiv Detail & Related papers (2022-05-31T09:56:44Z)
- Algorithmic Fairness Datasets: the Story so Far [68.45921483094705]
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being.
A growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations.
Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented.
Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and the scatteredness of available information (sparsity).
arXiv Detail & Related papers (2022-02-03T17:25:46Z)
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
- Run2Survive: A Decision-theoretic Approach to Algorithm Selection based on Survival Analysis [75.64261155172856]
Survival analysis (SA) naturally supports censored data and offers appropriate ways to use such data for learning distributional models of algorithm runtime.
We leverage such models as the basis of a sophisticated decision-theoretic approach to algorithm selection, which we dub Run2Survive (a minimal sketch of the survival-analysis idea appears after this list).
In an extensive experimental study with the standard benchmark ASlib, our approach is shown to be highly competitive and in many cases even superior to state-of-the-art AS approaches.
arXiv Detail & Related papers (2020-07-06T15:20:17Z)
- Large-scale empirical validation of Bayesian Network structure learning algorithms with noisy data [9.04391541965756]
This paper investigates the performance of 15 structure learning algorithms.
Each algorithm is tested over multiple case studies, sample sizes, types of noise, and assessed with multiple evaluation criteria.
Results suggest that performance on traditional synthetic benchmarks may overestimate real-world performance by anywhere from 10% to more than 50%.
arXiv Detail & Related papers (2020-05-18T18:40:09Z)
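To make the survival-analysis idea from the Run2Survive entry above concrete, here is a minimal sketch of fitting a runtime distribution to right-censored data, where runs cut off at a timeout are treated as censored rather than discarded. It uses the lifelines library and toy data as assumptions; it is an illustration of the general technique, not the Run2Survive method itself.

```python
# Sketch: model algorithm runtime with survival analysis so that runs
# hitting a timeout count as right-censored observations instead of
# being thrown away. Toy data; assumes the `lifelines` package.
from lifelines import WeibullFitter

TIMEOUT = 300.0  # seconds; runs reaching this cutoff are censored

runtimes = [12.0, 45.5, 103.2, 300.0, 300.0, 7.9, 220.1]
observed = [t < TIMEOUT for t in runtimes]  # False = censored at timeout

wf = WeibullFitter()
wf.fit(runtimes, event_observed=observed)

# A decision-theoretic selector could compare such fitted runtime models
# (or risk-adjusted summaries of them) across candidate algorithms.
print(wf.lambda_, wf.rho_)          # fitted Weibull parameters
print(wf.median_survival_time_)     # median runtime under the model
```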
This list is automatically generated from the titles and abstracts of the papers on this site.