Whose Name Comes Up? Auditing LLM-Based Scholar Recommendations
- URL: http://arxiv.org/abs/2506.00074v1
- Date: Thu, 29 May 2025 20:11:11 GMT
- Title: Whose Name Comes Up? Auditing LLM-Based Scholar Recommendations
- Authors: Daniele Barolo, Chiara Valentin, Fariba Karimi, Luis Galárraga, Gonzalo G. Méndez, Lisette Espín-Noboa
- Abstract summary: This paper evaluates the performance of six open-weight LLMs in recommending experts in physics across five tasks. The evaluation examines consistency, factuality, and biases related to gender, ethnicity, academic popularity, and scholar similarity.
- Score: 2.548716674644006
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper evaluates the performance of six open-weight LLMs (llama3-8b, llama3.1-8b, gemma2-9b, mixtral-8x7b, llama3-70b, llama3.1-70b) in recommending experts in physics across five tasks: top-k experts by field, influential scientists by discipline, epoch, seniority, and scholar counterparts. The evaluation examines consistency, factuality, and biases related to gender, ethnicity, academic popularity, and scholar similarity. Using ground-truth data from the American Physical Society and OpenAlex, we establish scholarly benchmarks by comparing model outputs to real-world academic records. Our analysis reveals inconsistencies and biases across all models. mixtral-8x7b produces the most stable outputs, while llama3.1-70b shows the highest variability. Many models exhibit duplication, and some, particularly gemma2-9b and llama3.1-8b, struggle with formatting errors. LLMs generally recommend real scientists, but accuracy drops in field-, epoch-, and seniority-specific queries, consistently favoring senior scholars. Representation biases persist, replicating gender imbalances (reflecting male predominance), under-representing Asian scientists, and over-representing White scholars. Despite some diversity in institutional and collaboration networks, models favor highly cited and productive scholars, reinforcing the rich-get-richer effect while offering limited geographical representation. These findings highlight the need to improve LLMs for more reliable and equitable scholarly recommendations.
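To make the factuality dimension concrete, the minimal sketch below shows one way a recommended name could be resolved against the public OpenAlex authors endpoint. This is an illustrative reconstruction, not the authors' evaluation pipeline: the `factuality_rate` helper and the example names are hypothetical, and a bare name match does not by itself confirm field, epoch, or seniority.

```python
import requests


def openalex_author_lookup(name: str) -> dict | None:
    """Return the top OpenAlex author record matching a recommended name, or None."""
    # Public OpenAlex authors endpoint; no API key is required.
    resp = requests.get(
        "https://api.openalex.org/authors",
        params={"search": name, "per-page": 1},
        timeout=10,
    )
    resp.raise_for_status()
    results = resp.json().get("results", [])
    return results[0] if results else None


def factuality_rate(recommended_names: list[str]) -> float:
    """Fraction of LLM-recommended names that resolve to a real author record."""
    hits = sum(openalex_author_lookup(n) is not None for n in recommended_names)
    return hits / len(recommended_names) if recommended_names else 0.0


# Example with placeholder names; a real audit would parse names from model outputs.
print(factuality_rate(["Lisa Randall", "John Q. Fakename"]))
```

A fuller audit along the paper's lines would additionally compare the matched author's field, career stage, and citation profile against the query, rather than stopping at existence.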
Related papers
- Who Gets Cited? Gender- and Majority-Bias in LLM-Driven Reference Selection [0.16317061277456998]
This study systematically investigates gender bias in large language models (LLMs) during LLM-driven reference selection. Our results reveal two forms of bias: a persistent preference for male-authored references and a majority-group bias that favors whichever gender is more prevalent in the candidate pool. Our findings indicate that LLMs can reinforce or exacerbate existing gender imbalances in scholarly recognition.
arXiv Detail & Related papers (2025-08-02T13:27:32Z) - Fact-or-Fair: A Checklist for Behavioral Testing of AI Models on Fairness-Related Queries [85.909363478929]
In this study, we focus on 19 real-world statistics collected from authoritative sources. We develop a checklist comprising objective and subjective queries to analyze the behavior of large language models. We propose metrics to assess factuality and fairness, and formally prove the inherent trade-off between these two aspects.
arXiv Detail & Related papers (2025-02-09T10:54:11Z) - LLMs are Biased Teachers: Evaluating LLM Bias in Personalized Education [6.354025374447606]
We evaluate large language models (LLMs) for bias in the personalized educational setting. We reveal significant biases in how models generate and select educational content tailored to different demographic groups.
arXiv Detail & Related papers (2024-10-17T20:27:44Z) - Gender Biases in LLMs: Higher intelligence in LLM does not necessarily solve gender bias and stereotyping [0.0]
Large Language Models (LLMs) are finding applications in all aspects of life, but their susceptibility to biases, particularly gender stereotyping, raises ethical concerns. This study introduces a novel methodology, combining a persona-based framework with a unisex-name approach, to investigate whether higher-intelligence LLMs reduce such biases.
arXiv Detail & Related papers (2024-09-30T05:22:54Z) - Inclusivity in Large Language Models: Personality Traits and Gender Bias in Scientific Abstracts [49.97673761305336]
We evaluate three large language models (LLMs) for their alignment with human narrative styles and potential gender biases.
Our findings indicate that, while these models generally produce text closely resembling human-authored content, variations in stylistic features suggest significant gender biases.
arXiv Detail & Related papers (2024-06-27T19:26:11Z) - JobFair: A Framework for Benchmarking Gender Hiring Bias in Large Language Models [12.12628747941818]
This paper presents a novel framework for benchmarking hierarchical gender hiring bias in Large Language Models (LLMs) for resume scoring.
We introduce a new construct grounded in labour economics, legal principles, and critiques of current bias benchmarks.
We analyze gender hiring biases in ten state-of-the-art LLMs.
arXiv Detail & Related papers (2024-06-17T09:15:57Z) - Hire Me or Not? Examining Language Model's Behavior with Occupation Attributes [7.718858707298602]
Large language models (LLMs) have been widely integrated into production pipelines, like recruitment and recommendation systems. This paper investigates LLMs' behavior with respect to gender stereotypes, in the context of occupation decision making.
arXiv Detail & Related papers (2024-05-06T18:09:32Z) - White Men Lead, Black Women Help? Benchmarking and Mitigating Language Agency Social Biases in LLMs [58.27353205269664]
Social biases can manifest in language agency in Large Language Model (LLM)-generated content. We introduce the Language Agency Bias Evaluation (LABE) benchmark, which comprehensively evaluates such biases in LLMs. Using LABE, we unveil language agency social biases in three recent LLMs: ChatGPT, Llama3, and Mistral.
arXiv Detail & Related papers (2024-04-16T12:27:54Z) - What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations [62.91799637259657]
Do large language models (LLMs) exhibit sociodemographic biases, even when they decline to respond?
We study this research question by probing contextualized embeddings and exploring whether this bias is encoded in their latent representations.
We propose a logistic Bradley-Terry probe that predicts an LLM's word-pair preferences from the words' hidden vectors (a minimal sketch of such a probe follows this list).
arXiv Detail & Related papers (2023-11-30T18:53:13Z) - Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs [67.51906565969227]
We study the unintended side-effects of persona assignment on the ability of LLMs to perform basic reasoning tasks.
Our study covers 24 reasoning datasets, 4 LLMs, and 19 diverse personas (e.g. an Asian person) spanning 5 socio-demographic groups.
arXiv Detail & Related papers (2023-11-08T18:52:17Z) - "Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in LLM-Generated Reference Letters [97.11173801187816]
Large Language Models (LLMs) have recently emerged as an effective tool to assist individuals in writing various types of content.
This paper critically examines gender biases in LLM-generated reference letters.
arXiv Detail & Related papers (2023-10-13T16:12:57Z)
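As referenced above for the preference-bias paper, a logistic Bradley-Terry probe models the probability that item i is preferred over item j as sigmoid(w · (h_i - h_j)), so the learned w induces a scalar preference score per item. The sketch below illustrates that standard formulation under assumed placeholder data; it is not the referenced paper's implementation, and the variable names and random vectors are stand-ins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for LLM hidden vectors of candidate words (placeholders).
rng = np.random.default_rng(0)
n_words, dim = 200, 64
hidden = rng.normal(size=(n_words, dim))          # h_i for each word i

# Observed pairwise comparisons (i, j); label 1 means word i was preferred.
pairs = rng.integers(0, n_words, size=(1000, 2))
pairs = pairs[pairs[:, 0] != pairs[:, 1]]
# Placeholder labels; in practice these come from the LLM's stated choices.
labels = rng.integers(0, 2, size=len(pairs))

# Logistic Bradley-Terry probe: P(i preferred over j) = sigmoid(w . (h_i - h_j)).
X = hidden[pairs[:, 0]] - hidden[pairs[:, 1]]
probe = LogisticRegression(fit_intercept=False, max_iter=1000).fit(X, labels)

# The weight vector yields a "preference strength" w . h_i for every word.
strengths = hidden @ probe.coef_.ravel()
print("Most-preferred word indices:", np.argsort(-strengths)[:5])
```

With real preference labels extracted from model responses (or refusals), the fitted strengths would indicate which items the model implicitly favors even when it declines to answer directly.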
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.