The Confidence Trap: Gender Bias and Predictive Certainty in LLMs
- URL: http://arxiv.org/abs/2601.07806v1
- Date: Mon, 12 Jan 2026 18:38:05 GMT
- Title: The Confidence Trap: Gender Bias and Predictive Certainty in LLMs
- Authors: Ahmed Sabir, Markus Kängsepp, Rajesh Sharma
- Abstract summary: The research investigates probability confidence calibration in contexts involving gendered pronoun resolution. The goal is to evaluate if calibration metrics based on predicted confidence scores effectively capture fairness-related disparities in Large Language Models.
- Score: 5.926203312586108
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increased use of Large Language Models (LLMs) in sensitive domains leads to growing interest in how their confidence scores correspond to fairness and bias. This study examines the alignment between LLM-predicted confidence and human-annotated bias judgments. Focusing on gender bias, the research investigates probability confidence calibration in contexts involving gendered pronoun resolution. The goal is to evaluate if calibration metrics based on predicted confidence scores effectively capture fairness-related disparities in LLMs. The results show that, among the six state-of-the-art models, Gemma-2 demonstrates the worst calibration according to the gender bias benchmark. The primary contribution of this work is a fairness-aware evaluation of LLMs' confidence calibration, offering guidance for ethical deployment. In addition, we introduce a new calibration metric, Gender-ECE, designed to measure gender disparities in resolution tasks.
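To make the calibration idea in the abstract concrete, here is a minimal sketch of the standard expected calibration error (ECE), computed separately per group. It is not the paper's Gender-ECE, whose definition the abstract does not give. The function names, the equal-width binning, the group labels, and the max-minus-min gap are all illustrative assumptions. Confidences are assumed to lie in [0, 1].

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Standard ECE with equal-width bins over [0, 1]."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    count = [0] * n_bins
    for c, y in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hit_sum[idx] += 1.0 if y else 0.0
        count[idx] += 1
    ece = 0.0
    for s_conf, s_hit, k in zip(conf_sum, hit_sum, count):
        if k:
            # Weight each bin's |accuracy - mean confidence| by its share of samples.
            ece += (k / n) * abs(s_hit / k - s_conf / k)
    return ece


def ece_by_group(
    confidences: Sequence[float],
    correct: Sequence[bool],
    groups: Sequence[Hashable],
    n_bins: int = 10,
) -> Dict[Hashable, float]:
    """ECE computed independently for each group label (hypothetical helper)."""
    buckets = defaultdict(lambda: ([], []))
    for c, y, g in zip(confidences, correct, groups):
        buckets[g][0].append(c)
        buckets[g][1].append(y)
    return {
        g: expected_calibration_error(cs, ys, n_bins)
        for g, (cs, ys) in buckets.items()
    }


def calibration_gap(per_group: Dict[Hashable, float]) -> float:
    """Illustrative disparity measure: largest minus smallest group ECE."""
    values = list(per_group.values())
    return max(values) - min(values) if values else 0.0
```

A call such as `ece_by_group(conf, correct, pronoun_group)` returns one ECE per group, and `calibration_gap` collapses those into a single disparity number. A large gap would indicate that the model's confidence is less trustworthy for one group than for another.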
Related papers
- On Calibration of Large Language Models: From Response To Capability [66.59139960234326]
  Large language models (LLMs) are widely deployed as general-purpose problem solvers. We introduce capability calibration, which targets the model's expected accuracy on a query. Our results demonstrate that capability-calibrated confidence improves pass@$k$ prediction and inference budget allocation.
  Date: 2026-02-14T01:07:45Z
- Calibration Is Not Enough: Evaluating Confidence Estimation Under Language Variations [49.84786015324238]
  Confidence estimation (CE) indicates how reliable the answers of large language models (LLMs) are, and can impact user trust and decision-making. We present a comprehensive evaluation framework for CE methods that measures confidence quality on three new aspects. These include robustness of confidence against prompt perturbations, stability across semantically equivalent answers, and sensitivity to semantically different answers.
  Date: 2026-01-12T23:16:50Z
- Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation [116.86965910589775]
  We show that even minimal perturbations, such as masking just 10% of objects or weakly blurring backgrounds, can dramatically alter bias scores. This suggests that current bias evaluations reflect model responses to spurious features rather than gender bias.
  Date: 2025-09-09T11:14:11Z
- Measuring Bias or Measuring the Task: Understanding the Brittle Nature of LLM Gender Biases [2.9803250365852443]
  This paper examines how signaling the evaluative purpose of a task impacts measured gender bias in LLMs. We find that prompts that more clearly align with (gender bias) evaluation framing elicit distinct gender output distributions.
  Date: 2025-09-04T16:32:18Z
- Towards Fair Rankings: Leveraging LLMs for Gender Bias Detection and Measurement [6.92803536773427]
  Social biases in Natural Language Processing (NLP) and Information Retrieval (IR) systems are an ongoing challenge. We aim to address this issue by leveraging Large Language Models (LLMs) to detect and measure gender bias in passage ranking. We introduce a novel gender fairness metric, named Class-wise Weighted Exposure (CWEx), aiming to address existing limitations.
  Date: 2025-06-27T16:39:12Z
- MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs [66.14178164421794]
  We introduce MetaFaith, a novel prompt-based calibration approach inspired by human metacognition. We show that MetaFaith robustly improves faithful calibration across diverse models and task domains, enabling up to 61% improvement in faithfulness.
  Date: 2025-05-30T17:54:08Z
- Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs [7.197702136906138]
  We propose an uncertainty-aware fairness metric, UCerF, to enable a fine-grained evaluation of model fairness. Observing data size, diversity, and clarity issues in current datasets, we introduce a new gender-occupation fairness evaluation dataset. We establish a benchmark, using our metric and dataset, and apply it to evaluate the behavior of ten open-source AI systems.
  Date: 2025-05-29T20:45:18Z
- The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models [91.86718720024825]
  We center transgender, nonbinary, and other gender-diverse identities to investigate how alignment procedures interact with pre-existing gender-diverse bias. Our findings reveal that DPO-aligned models are particularly sensitive to supervised finetuning. We conclude with recommendations tailored to DPO and broader alignment practices.
  Date: 2024-11-06T06:50:50Z
- GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models [73.23743278545321]
  Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but have also been observed to magnify societal biases. GenderCARE is a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics.
  Date: 2024-08-22T15:35:46Z
This list is automatically generated from the titles and abstracts of the papers in this site.