Related papers: Fairness Evaluation of Large Language Models in Academic Library Reference Services

Fairness Evaluation of Large Language Models in Academic Library Reference Services

URL: http://arxiv.org/abs/2507.04224v2
Date: Wed, 23 Jul 2025 15:08:40 GMT
Title: Fairness Evaluation of Large Language Models in Academic Library Reference Services
Authors: Haining Wang, Jason Clark, Yueru Yan, Star Bradley, Ruiyang Chen, Yiqiong Zhang, Hengyi Fu, Zuoyu Tian,
Abstract summary: We evaluate whether large language models (LLMs) differentiate responses across user identities by prompting six state-of-the-art LLMs to assist patrons differing in sex, race/ethnicity, and institutional role.<n>We found no evidence of differentiation by race or ethnicity, and only minor evidence of stereotypical bias against women in one model.<n>These findings suggest that current LLMs show a promising degree of readiness to support equitable and contextually appropriate communication in academic library reference services.
Score: 6.335631290002225
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As libraries explore large language models (LLMs) for use in virtual reference services, a key question arises: Can LLMs serve all users equitably, regardless of demographics or social status? While they offer great potential for scalable support, LLMs may also reproduce societal biases embedded in their training data, risking the integrity of libraries' commitment to equitable service. To address this concern, we evaluate whether LLMs differentiate responses across user identities by prompting six state-of-the-art LLMs to assist patrons differing in sex, race/ethnicity, and institutional role. We found no evidence of differentiation by race or ethnicity, and only minor evidence of stereotypical bias against women in one model. LLMs demonstrated nuanced accommodation of institutional roles through the use of linguistic choices related to formality, politeness, and domain-specific vocabularies, reflecting professional norms rather than discriminatory treatment. These findings suggest that current LLMs show a promising degree of readiness to support equitable and contextually appropriate communication in academic library reference services.

Related papers

Language Models Change Facts Based on the Way You Talk [38.44076602344941]
We find that large language models (LLMs) are extremely sensitive to markers of identity in user queries.<n>These biases mean that the use of off-the-shelf LLMs for these applications may cause harmful differences in medical care, foster wage gaps, and create different political factual realities for people of different identities.
arXiv Detail & Related papers (2025-07-17T13:21:17Z)
Alignment Revisited: Are Large Language Models Consistent in Stated and Revealed Preferences? [5.542420010310746]
A critical, yet understudied, issue is the potential divergence between an LLM's stated preferences and its revealed preferences.<n>This work formally defines and proposes a method to measure this preference deviation.<n>Our study will be crucial for integrating LLMs into services, especially those that interact directly with humans.
arXiv Detail & Related papers (2025-05-31T23:38:48Z)
From Promising Capability to Pervasive Bias: Assessing Large Language Models for Emergency Department Triage [6.135648377533492]
Large Language Models (LLMs) have shown promise in clinical decision support, yet their application to triage remains underexplored.<n>We systematically investigate the capabilities of LLMs in emergency department triage through two key dimensions.<n>We assess multiple LLM-based approaches, ranging from continued pre-training to in-context learning, as well as machine learning approaches.
arXiv Detail & Related papers (2025-04-22T21:11:47Z)
Investigating and Mitigating Stereotype-aware Unfairness in LLM-based Recommendations [18.862841015556995]
Large Language Models (LLMs) have demonstrated unprecedented language understanding and reasoning capabilities.<n>Recent studies have revealed that LLMs are likely to inherit stereotypes that are embedded ubiquitously in word embeddings.<n>This study reveals a new variant of fairness between stereotype groups containing both users and items, to quantify discrimination against stereotypes in LLM-RS.
arXiv Detail & Related papers (2025-04-05T15:09:39Z)
Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English [66.97110551643722]
We investigate dialectal disparities in Large Language Models (LLMs) reasoning tasks.<n>We find that LLMs produce less accurate responses and simpler reasoning chains and explanations for AAE inputs.<n>These findings highlight systematic differences in how LLMs process and reason about different language varieties.
arXiv Detail & Related papers (2025-03-06T05:15:34Z)
Large Language Models Reflect the Ideology of their Creators [71.65505524599888]
Large language models (LLMs) are trained on vast amounts of data to generate natural language.<n>This paper shows that the ideological stance of an LLM appears to reflect the worldview of its creators.
arXiv Detail & Related papers (2024-10-24T04:02:30Z)
To Know or Not To Know? Analyzing Self-Consistency of Large Language Models under Ambiguity [27.10502683001428]
This paper focuses on entity type ambiguity, analyzing the proficiency and consistency of state-of-the-art LLMs in applying factual knowledge when prompted with ambiguous entities. Experiments reveal that LLMs struggle with choosing the correct entity reading, achieving an average accuracy of only 85%, and as low as 75% with underspecified prompts.
arXiv Detail & Related papers (2024-07-24T09:48:48Z)
CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models [60.59638232596912]
We introduce CLAMBER, a benchmark for evaluating large language models (LLMs) Building upon the taxonomy, we construct 12K high-quality data to assess the strengths, weaknesses, and potential risks of various off-the-shelf LLMs. Our findings indicate the limited practical utility of current LLMs in identifying and clarifying ambiguous user queries.
arXiv Detail & Related papers (2024-05-20T14:34:01Z)
FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks. We present FAC$2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z)
Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering. The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored. We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
"Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in LLM-Generated Reference Letters [97.11173801187816]
Large Language Models (LLMs) have recently emerged as an effective tool to assist individuals in writing various types of content. This paper critically examines gender biases in LLM-generated reference letters.
arXiv Detail & Related papers (2023-10-13T16:12:57Z)
Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools. Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions. Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.