Where Fact Ends and Fairness Begins: Redefining AI Bias Evaluation through Cognitive Biases
- URL: http://arxiv.org/abs/2502.05849v3
- Date: Mon, 29 Sep 2025 22:18:33 GMT
- Title: Where Fact Ends and Fairness Begins: Redefining AI Bias Evaluation through Cognitive Biases
- Authors: Jen-tse Huang, Yuhang Yan, Linqi Liu, Yixin Wan, Wenxuan Wang, Kai-Wei Chang, Michael R. Lyu
- Abstract summary: We argue that identifying the boundary between fact and fair is essential for meaningful fairness evaluation. We introduce Fact-or-Fair, a benchmark with (i) objective queries aligned with descriptive, fact-based judgments, and (ii) subjective queries aligned with normative, fairness-based judgments.
- Score: 77.3489598315447
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent failures such as Google Gemini generating people of color in Nazi-era uniforms illustrate how AI outputs can be factually plausible yet socially harmful. AI models are increasingly evaluated for "fairness," yet existing benchmarks often conflate two fundamentally different dimensions: factual correctness and normative fairness. A model may generate responses that are factually accurate but socially unfair, or conversely, appear fair while distorting factual reality. We argue that identifying the boundary between fact and fair is essential for meaningful fairness evaluation. We introduce Fact-or-Fair, a benchmark with (i) objective queries aligned with descriptive, fact-based judgments, and (ii) subjective queries aligned with normative, fairness-based judgments. Our queries are constructed from 19 statistics and are grounded in cognitive psychology, drawing on representativeness bias, attribution bias, and ingroup-outgroup bias to explain why models often misalign fact and fairness. Experiments across ten frontier models reveal different levels of fact-fair trade-offs. By reframing fairness evaluation, we provide both a new theoretical lens and a practical benchmark to advance responsible model assessment. Our test suite is publicly available at https://github.com/uclanlp/Fact-or-Fair.
Related papers
- Surface Fairness, Deep Bias: A Comparative Study of Bias in Language Models [49.41113560646115]
We investigate various proxy measures of bias in large language models (LLMs). We find that evaluating models with pre-prompted personae on a multi-subject benchmark (MMLU) leads to negligible and mostly random differences in scores. With the recent trend for LLM assistant memory and personalization, these problems open up from a different angle.
arXiv Detail & Related papers (2025-06-12T08:47:40Z) - The AI Fairness Myth: A Position Paper on Context-Aware Bias [0.0]
We argue that fairness sometimes requires deliberate, context-aware preferential treatment of historically marginalized groups. Rather than viewing bias solely as a flaw to eliminate, we propose a framework that embraces corrective, intentional biases.
arXiv Detail & Related papers (2025-05-02T02:47:32Z) - Defining bias in AI-systems: Biased models are fair models [2.8360662552057327]
We argue that a precise conceptualization of bias is necessary to effectively address fairness concerns. Rather than viewing bias as inherently negative or unfair, we highlight the importance of distinguishing between bias and discrimination.
arXiv Detail & Related papers (2025-02-25T10:28:16Z) - On the Fairness, Diversity and Reliability of Text-to-Image Generative Models [68.62012304574012]
Multimodal generative models have sparked critical discussions on their reliability, fairness and potential for misuse. We propose an evaluation framework to assess model reliability by analyzing responses to global and local perturbations in the embedding space. Our method lays the groundwork for detecting unreliable, bias-injected models and tracing the provenance of embedded biases.
arXiv Detail & Related papers (2024-11-21T09:46:55Z) - Dataset Scale and Societal Consistency Mediate Facial Impression Bias in Vision-Language AI [17.101569078791492]
We study 43 CLIP vision-language models to determine whether they learn human-like facial impression biases.
We show for the first time that the degree to which a bias is shared across a society predicts the degree to which it is reflected in a CLIP model.
arXiv Detail & Related papers (2024-08-04T08:26:58Z) - "Patriarchy Hurts Men Too." Does Your Model Agree? A Discussion on Fairness Assumptions [3.706222947143855]
In the context of group fairness, this approach often obscures implicit assumptions about how bias is introduced into the data.
We are assuming that the biasing process is a monotonic function of the fair scores, dependent solely on the sensitive attribute.
The behavior of the biasing process may be more complex than mere monotonicity, in which case we need to identify and reject our implicit assumptions.
arXiv Detail & Related papers (2024-08-01T07:06:30Z) - Subtle Biases Need Subtler Measures: Dual Metrics for Evaluating Representative and Affinity Bias in Large Language Models [10.73340009530019]
This study addresses two such biases within Large Language Models (LLMs): representative bias and affinity bias.
We introduce two novel metrics to measure these biases: the Representative Bias Score (RBS) and the Affinity Bias Score (ABS).
Our analysis uncovers marked representative biases in prominent LLMs, with a preference for identities associated with being white, straight, and men.
Our investigation of affinity bias reveals distinctive evaluative patterns within each model, akin to 'bias fingerprints'.
arXiv Detail & Related papers (2024-05-23T13:35:34Z) - Quantifying Bias in Text-to-Image Generative Models [49.60774626839712]
Bias in text-to-image (T2I) models can propagate unfair social representations and may be used to aggressively market ideas or push controversial agendas.
Existing T2I model bias evaluation methods only focus on social biases.
We propose an evaluation methodology to quantify general biases in T2I generative models, without any preconceived notions.
arXiv Detail & Related papers (2023-12-20T14:26:54Z) - Social Bias Probing: Fairness Benchmarking for Language Models [38.180696489079985]
This paper proposes a novel framework for probing language models for social biases by assessing disparate treatment.
We curate SoFa, a large-scale benchmark designed to address the limitations of existing fairness collections.
We show that biases within language models are more nuanced than acknowledged, indicating a broader scope of encoded biases than previously recognized.
arXiv Detail & Related papers (2023-11-15T16:35:59Z) - Causal Context Connects Counterfactual Fairness to Robust Prediction and Group Fairness [15.83823345486604]
We motivate counterfactual fairness by showing that there is not a fundamental trade-off between fairness and accuracy.
Counterfactual fairness can sometimes be tested by measuring relatively simple group fairness metrics.
arXiv Detail & Related papers (2023-10-30T16:07:57Z) - Consistent End-to-End Estimation for Counterfactual Fairness [56.9060492313073]
We propose a novel counterfactual fairness predictor for making predictions under counterfactual fairness. We provide theoretical guarantees that our method is effective in ensuring the notion of counterfactual fairness.
arXiv Detail & Related papers (2023-10-26T17:58:39Z) - Learning for Counterfactual Fairness from Observational Data [62.43249746968616]
Fairness-aware machine learning aims to eliminate biases of learning models against certain subgroups described by certain protected (sensitive) attributes such as race, gender, and age.
A prerequisite for existing methods to achieve counterfactual fairness is the prior human knowledge of the causal model for the data.
In this work, we address the problem of counterfactually fair prediction from observational data without given causal models by proposing a novel framework CLAIRE.
arXiv Detail & Related papers (2023-07-17T04:08:29Z) - Gender Biases in Automatic Evaluation Metrics for Image Captioning [87.15170977240643]
We conduct a systematic study of gender biases in model-based evaluation metrics for image captioning tasks.
We demonstrate the negative consequences of using these biased metrics, including the inability to differentiate between biased and unbiased generations.
We present a simple and effective way to mitigate the metric bias without hurting the correlations with human judgments.
arXiv Detail & Related papers (2023-05-24T04:27:40Z) - DualFair: Fair Representation Learning at Both Group and Individual Levels via Contrastive Self-supervision [73.80009454050858]
This work presents a self-supervised model, called DualFair, that can debias sensitive attributes like gender and race from learned representations.
Our model jointly optimizes for two fairness criteria: group fairness and counterfactual fairness.
arXiv Detail & Related papers (2023-03-15T07:13:54Z) - Learning Fair Node Representations with Graph Counterfactual Fairness [56.32231787113689]
We propose graph counterfactual fairness, which considers the biases led by the above facts.
We generate counterfactuals corresponding to perturbations on each node's and their neighbors' sensitive attributes.
Our framework outperforms the state-of-the-art baselines in graph counterfactual fairness.
arXiv Detail & Related papers (2022-01-10T21:43:44Z) - UnQovering Stereotyping Biases via Underspecified Questions [68.81749777034409]
We present UNQOVER, a framework to probe and quantify biases through underspecified questions.
We show that a naive use of model scores can lead to incorrect bias estimates due to two forms of reasoning errors.
We use this metric to analyze four important classes of stereotypes: gender, nationality, ethnicity, and religion.
arXiv Detail & Related papers (2020-10-06T01:49:52Z) - Grading video interviews with fairness considerations [1.7403133838762446]
We present a methodology to automatically derive social skills of candidates based on their video response to interview questions.
We develop two machine-learning models to predict social skills.
We analyze fairness by studying the errors of models by race and gender.
arXiv Detail & Related papers (2020-07-02T10:06:13Z) - Statistical Equity: A Fairness Classification Objective [6.174903055136084]
We propose a new fairness definition motivated by the principle of equity.
We formalize our definition of fairness, and motivate it with its appropriate contexts.
We perform multiple automatic and human evaluations to show the effectiveness of our definition.
arXiv Detail & Related papers (2020-05-14T23:19:38Z) - Convex Fairness Constrained Model Using Causal Effect Estimators [6.414055487487486]
We devise novel models, called FairCEEs, which remove discrimination while keeping explanatory bias.
We provide an efficient algorithm for solving FairCEEs in regression and binary classification tasks.
arXiv Detail & Related papers (2020-02-16T03:40:04Z)