Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach
- URL: http://arxiv.org/abs/2211.06398v1
- Date: Mon, 7 Nov 2022 16:19:42 GMT
- Title: Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach
- Authors: Jiayao Zhang, Hongming Zhang, Zhun Deng, Dan Roth
- Abstract summary: We conduct a thorough and rigorous study of fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, and author and institutional prestige.
- Score: 77.61131357420201
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The double-blind peer review mechanism has become the backbone of academic
research across multiple disciplines including computer science, yet several
studies have questioned the quality of peer reviews and raised concerns about
potential biases in the process. In this paper, we conduct a thorough and
rigorous study of fairness disparities in peer review with the help of large
language models (LMs). We collect, assemble, and maintain a comprehensive
relational database for the International Conference on Learning
Representations (ICLR) from 2017 to date by aggregating data from OpenReview,
Google Scholar, arXiv, and CSRanking, and by extracting high-level features
using language models. We postulate and study fairness disparities on multiple
protective attributes of interest, including author gender, geography, and
author and institutional prestige. We observe that the level of disparity
differs across attributes and that textual features are essential for reducing
biases in predictive modeling. We distill several insights from our analysis on
studying the peer review process with the help of large LMs. Our database also
provides avenues for studying new natural language processing (NLP) methods
that facilitate the understanding of the peer review mechanism. As a concrete
example, we work towards automatic machine review systems and provide baseline
models for the review generation and scoring tasks so that the database can be
used as a benchmark.
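To make the disparity analysis concrete, the sketch below illustrates one simple way such a gap could be quantified. It is not the authors' pipeline: it assumes a hypothetical table with one row per ICLR submission, a protected-attribute column (e.g., an inferred author group), and the mean reviewer score, and it estimates the between-group score gap with a permutation test.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the relational database described above:
# one row per submission, a protected attribute, and the mean reviewer score.
rng = np.random.default_rng(0)
papers = pd.DataFrame({
    "group": rng.choice(["A", "B"], size=500, p=[0.7, 0.3]),
    "avg_review_score": rng.normal(5.5, 1.2, size=500),
})

def score_gap(df: pd.DataFrame) -> float:
    """Difference in mean review score between the two groups."""
    means = df.groupby("group")["avg_review_score"].mean()
    return means["A"] - means["B"]

observed = score_gap(papers)

# Permutation test: shuffle group labels to build the gap's null distribution.
n_perm = 10_000
null_gaps = np.empty(n_perm)
shuffled = papers.copy()
for i in range(n_perm):
    shuffled["group"] = rng.permutation(papers["group"].to_numpy())
    null_gaps[i] = score_gap(shuffled)

p_value = float(np.mean(np.abs(null_gaps) >= abs(observed)))
print(f"observed gap = {observed:.3f}, permutation p-value = {p_value:.4f}")
```

In the spirit of the abstract's observation that textual features matter, LM-derived features of the paper and review text could then be added as covariates in a predictive model to see how much of any observed gap they explain.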
Related papers
- Why do you cite? An investigation on citation intents and decision-making classification processes [1.7812428873698407]
This study emphasizes the importance of reliably classifying citation intents.
We present a study utilizing advanced ensemble strategies for Citation Intent Classification (CIC).
One of our models sets a new state-of-the-art (SOTA) with an 89.46% Macro-F1 score on the SciCite benchmark.
arXiv Detail & Related papers (2024-07-18T09:29:33Z)
- ElicitationGPT: Text Elicitation Mechanisms via Language Models [12.945581341789431]
This paper develops mechanisms for scoring elicited text against ground truth text using domain-knowledge-free queries to a large language model.
An empirical evaluation is conducted on peer reviews from a peer-grading dataset and in comparison to manual instructor scores for the peer reviews.
arXiv Detail & Related papers (2024-06-13T17:49:10Z)
- RelevAI-Reviewer: A Benchmark on AI Reviewers for Survey Paper Relevance [0.8089605035945486]
We propose RelevAI-Reviewer, an automatic system that conceptualizes the task of survey paper review as a classification problem.
We introduce a novel dataset comprising 25,164 instances. Each instance contains one prompt and four candidate papers, each varying in relevance to the prompt.
We develop a machine learning (ML) model capable of determining the relevance of each paper and identifying the most pertinent one.
arXiv Detail & Related papers (2024-06-13T06:42:32Z)
- Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions [62.0123588983514]
Large Language Models (LLMs) have demonstrated wide-ranging applications across various fields.
We reformulate the peer-review process as a multi-turn, long-context dialogue, incorporating distinct roles for authors, reviewers, and decision makers.
We construct a comprehensive dataset containing over 26,841 papers with 92,017 reviews collected from multiple sources.
arXiv Detail & Related papers (2024-06-09T08:24:17Z)
- A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [58.6354685593418]
This paper proposes several article-level, field-normalized, and large language model-empowered bibliometric indicators to evaluate reviews.
The newly emerging AI-generated literature reviews are also appraised.
This work offers insights into the current challenges of literature reviews and envisions future directions for their development.
arXiv Detail & Related papers (2024-02-20T11:28:50Z)
- Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses under massive real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z)
- Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
- Benchmarking Foundation Models with Language-Model-as-an-Examiner [47.345760054595246]
We propose a novel benchmarking framework, Language-Model-as-an-Examiner.
The LM serves as a knowledgeable examiner that formulates questions based on its knowledge and evaluates responses in a reference-free manner.
arXiv Detail & Related papers (2023-06-07T06:29:58Z)
- NLPeer: A Unified Resource for the Computational Study of Peer Review [58.71736531356398]
We introduce NLPeer -- the first ethically sourced multidomain corpus of more than 5k papers and 11k review reports from five different venues.
We augment previous peer review datasets to include parsed and structured paper representations, rich metadata and versioning information.
Our work paves the path towards systematic, multi-faceted, evidence-based study of peer review in NLP and beyond.
arXiv Detail & Related papers (2022-11-12T12:29:38Z)
- Predicting the Reproducibility of Social and Behavioral Science Papers Using Supervised Learning Models [21.69933721765681]
We propose a framework that extracts five types of features from scholarly work that can be used to support assessments of published research claims.
We analyze pairwise correlations between individual features and their importance for predicting a set of human-assessed ground truth labels.
arXiv Detail & Related papers (2021-04-08T00:45:20Z)