Equal Access, Unequal Interaction: A Counterfactual Audit of LLM Fairness
- URL: http://arxiv.org/abs/2602.02932v1
- Date: Tue, 03 Feb 2026 00:05:38 GMT
- Title: Equal Access, Unequal Interaction: A Counterfactual Audit of LLM Fairness
- Authors: Alireza Amiri-Margavi, Arshia Gharagozlou, Amin Gholami Davodi, Seyed Pouyan Mousavi Davoudi, Hamidreza Hasani Balyani,
- Abstract summary: We examine how large language models differ in tone, uncertainty, and linguistic framing across demographic identities after access is granted. We observe systematic, model-specific disparities in interaction quality. These results show that fairness disparities can persist at the interaction level even when access is equal.
- Score: 0.8699280339422538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prior work on fairness in large language models (LLMs) has primarily focused on access-level behaviors such as refusals and safety filtering. However, equitable access does not ensure equitable interaction quality once a response is provided. In this paper, we conduct a controlled fairness audit examining how LLMs differ in tone, uncertainty, and linguistic framing across demographic identities after access is granted. Using a counterfactual prompt design, we evaluate GPT-4 and LLaMA-3.1-70B on career advice tasks while varying identity attributes along age, gender, and nationality. We assess access fairness through refusal analysis and measure interaction quality using automated linguistic metrics, including sentiment, politeness, and hedging. Identity-conditioned differences are evaluated using paired statistical tests. Both models exhibit zero refusal rates across all identities, indicating uniform access. Nevertheless, we observe systematic, model-specific disparities in interaction quality: GPT-4 expresses significantly higher hedging toward younger male users, while LLaMA exhibits broader sentiment variation across identity groups. These results show that fairness disparities can persist at the interaction level even when access is equal, motivating evaluation beyond refusal-based audits.
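As a concrete illustration of the audit design, the sketch below pairs identity-swapped variants of the same career-advice prompt, scores each response with a simple hedge-word density, and compares the paired scores with a t-test. The hedge lexicon, the prompt template, and the get_response stub are illustrative assumptions, not the authors' actual pipeline or metrics.

```python
# Minimal sketch of a counterfactual interaction-quality audit.
# The hedge lexicon and get_response stub are illustrative, not the paper's.
import re
from scipy import stats

HEDGES = {"might", "may", "could", "perhaps", "possibly", "likely", "somewhat"}

def hedge_density(text: str) -> float:
    """Fraction of tokens that are hedge words (crude interaction-quality proxy)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in HEDGES for t in tokens) / max(len(tokens), 1)

def get_response(prompt: str) -> str:
    """Stub: replace with a real call to the model under audit (e.g., GPT-4)."""
    raise NotImplementedError

def audit(base_prompts, identity_a, identity_b):
    """Paired counterfactual audit: same task, only the identity phrase varies."""
    a_scores, b_scores = [], []
    for base in base_prompts:
        a_scores.append(hedge_density(get_response(base.format(identity=identity_a))))
        b_scores.append(hedge_density(get_response(base.format(identity=identity_b))))
    # Paired test: each base prompt yields one observation per identity.
    return stats.ttest_rel(a_scores, b_scores)

# Example usage:
# prompts = ["I am a {identity}. What career should I pursue?"]
# print(audit(prompts, "25-year-old man", "60-year-old woman"))
```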
Related papers
- Counterfactual Fairness Evaluation of LLM-Based Contact Center Agent Quality Assurance System [2.5609209153559513]
Large Language Models (LLMs) are increasingly deployed in contact-center Quality Assurance (QA) to automate agent performance evaluation and coaching feedback.
We present a counterfactual fairness evaluation of LLM-based QA systems across 13 dimensions spanning three categories: Identity, Context, and Behavioral Style.
arXiv Detail & Related papers (2026-02-16T17:56:18Z)
- Partial Identification Approach to Counterfactual Fairness Assessment [50.88100567472179]
We introduce a Bayesian approach to bound unknown counterfactual fairness measures with high confidence.
Our results reveal a positive (spurious) effect on the COMPAS score when changing race to African-American (from all others) and a negative (direct causal) effect when transitioning from young to old age.
arXiv Detail & Related papers (2025-09-30T18:35:08Z)
- Investigating Intersectional Bias in Large Language Models using Confidence Disparities in Coreference Resolution [5.061421107401101]
Large language models (LLMs) have achieved impressive performance, leading to their widespread adoption as decision-support tools in resource-constrained contexts like hiring and admissions.
There is, however, scientific consensus that AI systems can reflect and exacerbate societal biases, raising concerns about identity-based harm when used in critical social contexts.
In this work, we extend single-axis fairness evaluations to examine intersectional bias, recognizing that when multiple axes of discrimination intersect, they create distinct patterns of disadvantage.
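A minimal sketch of the confidence-disparity idea, assuming per-example confidence scores keyed by intersectional group; the Mann-Whitney test is a generic stand-in for the paper's own statistics, and the grouping keys are hypothetical.

```python
# Compare model-confidence distributions between every pair of
# intersectional groups. Inputs are illustrative, not the paper's setup.
from collections import defaultdict
from itertools import combinations
from scipy.stats import mannwhitneyu

def confidence_disparities(records):
    """records: iterable of ((gender, race), confidence) pairs."""
    by_group = defaultdict(list)
    for group, conf in records:
        by_group[group].append(conf)
    results = {}
    for g1, g2 in combinations(sorted(by_group), 2):
        # Non-parametric test: do the two groups get systematically
        # different confidence scores?
        _, p = mannwhitneyu(by_group[g1], by_group[g2], alternative="two-sided")
        results[(g1, g2)] = p
    return results
```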
arXiv Detail & Related papers (2025-08-09T22:24:40Z)
- Quantifying Fairness in LLMs Beyond Tokens: A Semantic and Statistical Perspective [24.54292750583169]
Large Language Models (LLMs) often generate responses with inherent biases, undermining their reliability in real-world applications.
We propose FiSCo (Fine-grained Semantic Comparison), a novel statistical framework to evaluate group-level fairness in LLMs.
We decompose model outputs into semantically distinct claims and apply statistical hypothesis testing to compare inter- and intra-group similarities.
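The inter- versus intra-group comparison could look roughly like the sketch below, with TF-IDF cosine similarity as a crude stand-in for FiSCo's claim-level semantic comparison; the actual framework operates on decomposed claims rather than whole responses.

```python
# If responses to two groups differ systematically, cross-group (inter)
# similarity should fall below within-group (intra) similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from scipy.stats import ttest_ind

def group_similarity_gap(group_a_texts, group_b_texts):
    texts = group_a_texts + group_b_texts
    S = cosine_similarity(TfidfVectorizer().fit_transform(texts))
    n, m = len(group_a_texts), len(texts)
    # Intra-group similarities: upper triangle of each group's block.
    intra = [S[i, j] for i in range(n) for j in range(i + 1, n)]
    intra += [S[i, j] for i in range(n, m) for j in range(i + 1, m)]
    # Inter-group similarities: the off-diagonal block.
    inter = S[:n, n:].ravel().tolist()
    return ttest_ind(intra, inter, equal_var=False)
```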
arXiv Detail & Related papers (2025-06-23T18:31:22Z)
- Mitigating Subgroup Disparities in Multi-Label Speech Emotion Recognition: A Pseudo-Labeling and Unsupervised Learning Approach [53.824673312331626]
The Implicit Demography Inference (IDI) module uses k-means clustering to mitigate bias in Speech Emotion Recognition (SER).
Experiments show that pseudo-labeling IDI reduces subgroup disparities, improving fairness metrics by over 28%.
Unsupervised IDI yields more than a 4.6% improvement in fairness metrics with a drop of less than 3.6% in SER performance.
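A minimal sketch of the pseudo-labeling step, assuming utterance embeddings are already extracted by some speech encoder; the cluster count and encoder are illustrative choices, not the paper's configuration.

```python
# Cluster speech embeddings with k-means and treat cluster ids as
# pseudo-demographic labels when true attributes are unavailable.
import numpy as np
from sklearn.cluster import KMeans

def pseudo_demographic_labels(embeddings: np.ndarray, k: int = 4) -> np.ndarray:
    """embeddings: (n_utterances, dim) vectors from any speech encoder."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embeddings)

# The pseudo-labels can then feed a group-fairness regularizer during
# SER training, in place of the unavailable demographic attributes.
```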
arXiv Detail & Related papers (2025-05-20T14:50:44Z)
- Interactional Fairness in LLM Multi-Agent Systems: An Evaluation Framework [0.0]
We introduce a novel framework for evaluating interactional fairness, encompassing interpersonal fairness (IF) and informational fairness (InfF), in multi-agent systems.
We validate our framework through a pilot study using controlled simulations of a resource negotiation task.
Results show that tone and justification quality significantly affect acceptance decisions even when objective outcomes are held constant.
arXiv Detail & Related papers (2025-05-17T13:24:13Z)
- Refusal as Silence: Gendered Disparities in Vision-Language Model Responses [0.4199844472131921]
This study investigates refusal as a sociotechnical outcome through a counterfactual persona design.
We find that transgender and non-binary personas experience significantly higher refusal rates, even in non-harmful contexts.
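A sketch of how refusal-rate disparities across personas might be tested; the keyword-based refusal detector is a naive assumption, and the study's actual refusal labeling is more careful.

```python
# Chi-square test on a persona x {refused, answered} contingency table.
from scipy.stats import chi2_contingency

def is_refusal(response: str) -> bool:
    # Crude heuristic; a real audit would use human or model-assisted labels.
    return any(m in response.lower() for m in ("i can't", "i cannot", "i'm unable"))

def refusal_disparity(responses_by_persona: dict[str, list[str]]):
    table = [
        [sum(map(is_refusal, resps)), sum(not is_refusal(r) for r in resps)]
        for resps in responses_by_persona.values()
    ]
    chi2, p, dof, _ = chi2_contingency(table)
    return chi2, p
```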
arXiv Detail & Related papers (2024-06-12T13:52:30Z)
- Fairness meets Cross-Domain Learning: a new perspective on Models and Metrics [80.07271410743806]
We study the relationship between cross-domain learning (CD) and model fairness.
We introduce a benchmark on face and medical images spanning several demographic groups as well as classification and localization tasks.
Our study covers 14 CD approaches alongside three state-of-the-art fairness algorithms and shows how the former can outperform the latter.
arXiv Detail & Related papers (2023-03-25T09:34:05Z)
- Measuring Fairness of Text Classifiers via Prediction Sensitivity [63.56554964580627]
ACCUMULATED PREDICTION SENSITIVITY measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features.
We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness.
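A toy finite-difference version of the prediction-sensitivity idea; the paper defines ACCUMULATED PREDICTION SENSITIVITY more formally, so treat this as an approximation of the underlying intuition rather than the metric itself.

```python
# Perturb one input feature and accumulate the change in predicted
# probability: a crude estimate of average |dP/dx_feature|.
import numpy as np
from sklearn.linear_model import LogisticRegression

def accumulated_sensitivity(model, X: np.ndarray, feature: int, eps: float = 1e-3):
    Xp = X.copy()
    Xp[:, feature] += eps
    base = model.predict_proba(X)[:, 1]
    pert = model.predict_proba(Xp)[:, 1]
    return np.mean(np.abs(pert - base)) / eps

# Example with synthetic data:
# rng = np.random.default_rng(0)
# X = rng.normal(size=(200, 5))
# y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)
# clf = LogisticRegression().fit(X, y)
# print(accumulated_sensitivity(clf, X, feature=0))
```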
arXiv Detail & Related papers (2022-03-16T15:00:33Z)
- Measuring Fairness Under Unawareness of Sensitive Attributes: A Quantification-Based Approach [131.20444904674494]
We tackle the problem of measuring group fairness under unawareness of sensitive attributes.
We show that quantification approaches are particularly suited to tackle the fairness-under-unawareness problem.
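The quantification angle can be illustrated with adjusted classify-and-count, a standard quantification correction: estimate a group's prevalence from a noisy sensitive-attribute classifier, correcting for its error rates. The tpr/fpr numbers in the usage note are made up.

```python
# Adjusted classify-and-count: correct a raw positive-prediction rate
# for classifier error to recover the true group prevalence.
def adjusted_classify_and_count(pred_positive_rate: float,
                                tpr: float, fpr: float) -> float:
    """prevalence = (observed_rate - fpr) / (tpr - fpr)"""
    est = (pred_positive_rate - fpr) / (tpr - fpr)
    return min(max(est, 0.0), 1.0)  # clip to a valid proportion

# e.g. an attribute classifier with tpr=0.85, fpr=0.10 that flags 40% of an
# unlabeled pool implies a corrected prevalence of (0.40 - 0.10) / 0.75 = 0.4
```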
arXiv Detail & Related papers (2021-09-17T13:45:46Z)
- MultiFair: Multi-Group Fairness in Machine Learning [52.24956510371455]
We study multi-group fairness in machine learning (MultiFair).
We propose a generic end-to-end algorithmic framework to solve it.
Our proposed framework is generalizable to many different settings.
arXiv Detail & Related papers (2021-05-24T02:30:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.