Empirical Likelihood-Based Fairness Auditing: Distribution-Free Certification and Flagging
- URL: http://arxiv.org/abs/2601.20269v1
- Date: Wed, 28 Jan 2026 05:36:19 GMT
- Title: Empirical Likelihood-Based Fairness Auditing: Distribution-Free Certification and Flagging
- Authors: Jie Tang, Chuanlong Xie, Xianli Zeng, Lixing Zhu
- Abstract summary: Machine learning models in high-stakes applications, such as recidivism prediction and automated personnel selection, often exhibit systematic performance disparities. We propose a novel empirical likelihood-based (EL) framework that constructs robust statistical measures for model performance disparities.
- Score: 18.71249153088185
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning models in high-stakes applications, such as recidivism prediction and automated personnel selection, often exhibit systematic performance disparities across sensitive subpopulations, raising critical concerns regarding algorithmic bias. Fairness auditing addresses these risks through two primary functions: certification, which verifies adherence to fairness constraints; and flagging, which isolates specific demographic groups experiencing disparate treatment. However, existing auditing techniques are frequently limited by restrictive distributional assumptions or prohibitive computational overhead. We propose a novel empirical likelihood-based (EL) framework that constructs robust statistical measures for model performance disparities. Unlike traditional methods, our approach is non-parametric; the proposed disparity statistics follow asymptotically chi-square or mixed chi-square distributions, ensuring valid inference without assuming underlying data distributions. This framework uses a constrained optimization profile that admits stable numerical solutions, facilitating both large-scale certification and efficient subpopulation discovery. Empirically, the EL methods outperform bootstrap-based approaches, yielding coverage rates closer to nominal levels while reducing computational latency by several orders of magnitude. We demonstrate the practical utility of this framework on the COMPAS dataset, where it successfully flags intersectional biases, specifically identifying a significantly higher positive prediction rate for African-American males under 25 and a systemic under-prediction for Caucasian females relative to the population mean.
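As a concrete illustration of how such a certification test might look, the sketch below applies standard empirical likelihood machinery (Owen-style) to a single disparity measure: the observation weights are profiled out through a Lagrange multiplier, and minus twice the log EL ratio is compared against a chi-square distribution, mirroring the calibration described in the abstract. The per-sample disparity contributions `z`, the helper name `el_test_zero_mean`, and the example data are all hypothetical; the paper's exact constrained optimization profile is not reproduced here.

```python
# A minimal empirical-likelihood sketch for one disparity statistic.
# Illustrates standard EL machinery, not the paper's exact profile;
# names and example data are hypothetical.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def el_test_zero_mean(z, eps=1e-8):
    """Test H0: E[z] = 0 via the empirical likelihood ratio.

    Returns (statistic, p_value); the statistic is -2 log R(0),
    asymptotically chi-square with 1 df under H0.
    """
    z = np.asarray(z, dtype=float)
    n = len(z)
    if z.min() >= 0 or z.max() <= 0:
        # 0 lies outside the convex hull of the data: the EL ratio is 0.
        return np.inf, 0.0
    # Profile out the weights: w_i = 1 / (n * (1 + lam * z_i)), where lam
    # solves the score equation sum_i z_i / (1 + lam * z_i) = 0.
    lo = (1.0 / n - 1.0) / z.max() + eps   # keep all 1 + lam * z_i > 1/n
    hi = (1.0 / n - 1.0) / z.min() - eps
    score = lambda lam: np.sum(z / (1.0 + lam * z))
    lam = brentq(score, lo, hi)            # score is monotone on (lo, hi)
    stat = 2.0 * np.sum(np.log1p(lam * z)) # equals -2 log R(0)
    return stat, chi2.sf(stat, df=1)

# Hypothetical usage: per-sample disparity contributions, e.g. the positive-
# prediction indicator for one group minus the overall positive rate.
rng = np.random.default_rng(0)
z = rng.normal(0.03, 0.2, size=2000)
stat, p = el_test_zero_mean(z)
print(f"-2 log R = {stat:.3f}, p = {p:.4f}")  # a small p flags a disparity
```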
Related papers
- Towards Anytime-Valid Statistical Watermarking [63.02116925616554]
We develop the first e-value-based watermarking framework, Anchored E-Watermarking, that unifies optimal sampling with anytime-valid inference. Our framework can significantly enhance sample efficiency, reducing the average token budget required for detection by 13-15% relative to state-of-the-art baselines.
arXiv Detail & Related papers (2026-02-19T18:32:26Z)
- Safe Fairness Guarantees Without Demographics in Classification: Spectral Uncertainty Set Perspective [9.149827831925185]
SPECTRE is a minimax-fair method that adjusts the spectrum of a simple Fourier feature mapping and constrains the extent to which the worst-case distribution can deviate from the empirical distribution. It provides the highest average fairness guarantees together with the smallest interquartile range in comparison with state-of-the-art approaches.
arXiv Detail & Related papers (2026-02-12T10:08:08Z)
- Reliable and Reproducible Demographic Inference for Fairness in Face Analysis [63.46525489354455]
We propose a fully reproducible DAI pipeline that replaces conventional end-to-end training with a modular transfer learning approach. We audit this pipeline across three dimensions: accuracy, fairness, and a newly introduced notion of robustness, defined via intra-identity consistency. Our results show that the proposed method outperforms strong baselines, particularly on ethnicity, which is the more challenging attribute.
arXiv Detail & Related papers (2025-10-23T12:22:02Z)
- Set to Be Fair: Demographic Parity Constraints for Set-Valued Classification [5.085064777896467]
We address the problem of set-valued classification under demographic parity and expected size constraints. We propose two complementary strategies: an oracle-based method that minimizes classification risk while satisfying both constraints, and a computationally efficient proxy that prioritizes constraint satisfaction.
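As a toy check of the fairness notion named in this entry, the sketch below estimates demographic parity gaps for set-valued predictions, per class and across sensitive groups, together with the average set size. The function name and data are hypothetical; this is not the paper's oracle or proxy method.

```python
# Empirical demographic-parity check for set-valued predictions: for each
# class, compare how often it appears in the prediction set across groups.
import numpy as np

def dp_gaps(pred_sets, groups, n_classes):
    """Max cross-group gap in P(class k in S(X) | group), per class."""
    groups = np.asarray(groups)
    gaps = []
    for k in range(n_classes):
        contains_k = np.array([k in s for s in pred_sets], dtype=float)
        rates = [contains_k[groups == g].mean() for g in np.unique(groups)]
        gaps.append(max(rates) - min(rates))
    return np.array(gaps)

# Hypothetical example: 6 samples, 3 classes, binary sensitive attribute.
pred_sets = [{0}, {0, 1}, {2}, {1}, {0, 2}, {1, 2}]
groups = [0, 0, 0, 1, 1, 1]
print(dp_gaps(pred_sets, groups, n_classes=3))   # per-class parity gaps
print(np.mean([len(s) for s in pred_sets]))      # expected-size proxy
```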
arXiv Detail & Related papers (2025-10-06T15:36:45Z)
- Detecting Statistically Significant Fairness Violations in Recidivism Forecasting Algorithms [0.0]
This paper introduces statistical tests that can be used to identify statistically significant violations of fairness metrics. We demonstrate this approach by testing recidivism forecasting algorithms trained on data from the National Institute of Justice.
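The summary does not specify which tests are used, so as a generic illustration, here is a textbook two-proportion z-test for whether positive (e.g., high-risk) prediction rates differ significantly between two groups; the counts are hypothetical.

```python
# Two-sided z-test for H0: positive prediction rates are equal across groups.
# A standard textbook test, not necessarily the paper's procedure.
import numpy as np
from scipy.stats import norm

def two_prop_ztest(pos_a, n_a, pos_b, n_b):
    p_a, p_b = pos_a / n_a, pos_b / n_b
    p_pool = (pos_a + pos_b) / (n_a + n_b)           # pooled rate under H0
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return z, 2 * norm.sf(abs(z))

# Hypothetical counts of positive (high-risk) predictions per group.
z, p = two_prop_ztest(pos_a=420, n_a=1000, pos_b=350, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")  # p < 0.05 flags a significant violation
```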
arXiv Detail & Related papers (2025-09-18T17:15:23Z)
- Balancing Tails when Comparing Distributions: Comprehensive Equity Index (CEI) with Application to Bias Evaluation in Operational Face Biometrics [45.84303673987677]
Comprehensive Equity Index (CEI) is a novel metric designed to detect demographic bias in face recognition systems. Our experiments confirm CEI's superior ability to detect nuanced biases where previous methods fall short. CEI provides a robust and sensitive tool for operational fairness assessment.
arXiv Detail & Related papers (2025-06-12T10:43:31Z)
- Identifying and Mitigating Social Bias Knowledge in Language Models [52.52955281662332]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases. FAST surpasses state-of-the-art baselines with superior debiasing performance. This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z)
- Parametric Fairness with Statistical Guarantees [0.46040036610482665]
We extend the concept of Demographic Parity to incorporate distributional properties in predictions, allowing expert knowledge to be used in the fair solution.
We illustrate the use of this new metric through a practical example of wages, and develop a parametric method that efficiently addresses practical challenges.
arXiv Detail & Related papers (2023-10-31T14:52:39Z)
- Conformal Prediction for Federated Uncertainty Quantification Under Label Shift [57.54977668978613]
Federated Learning (FL) is a machine learning framework where many clients collaboratively train models.
We develop a new conformal prediction method based on quantile regression that takes privacy constraints into account.
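For orientation, a plain split-conformal version of conformalized quantile regression (CQR) is sketched below; the paper's federated, privacy-aware, label-shift-adjusted construction is not reproduced, and the data and model choices are purely illustrative.

```python
# Plain split-conformal CQR: fit two quantile regressors, then widen the band
# by a calibrated correction so that marginal coverage holds. Illustrative only.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(3000, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=3000)
X_tr, y_tr = X[:1500], y[:1500]            # fit the two quantile regressors
X_cal, y_cal = X[1500:2500], y[1500:2500]  # held out for calibration
X_te, y_te = X[2500:], y[2500:]

alpha = 0.1  # target 90% coverage
q_lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_tr, y_tr)
q_hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_tr, y_tr)

# Conformity scores: how far calibration labels fall outside the quantile band.
scores = np.maximum(q_lo.predict(X_cal) - y_cal, y_cal - q_hi.predict(X_cal))
n = len(scores)
q_hat = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

lower, upper = q_lo.predict(X_te) - q_hat, q_hi.predict(X_te) + q_hat
print(f"empirical coverage ~ {np.mean((y_te >= lower) & (y_te <= upper)):.3f}")
```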
arXiv Detail & Related papers (2023-06-08T11:54:58Z)
- Statistical Inference for Fairness Auditing [4.318555434063274]
We frame this task of "fairness auditing" in terms of multiple hypothesis testing.
We show how the bootstrap can be used to simultaneously bound performance disparities over a collection of groups.
Our methods can be used to flag subpopulations affected by model underperformance, and certify subpopulations for which the model performs adequately.
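To contrast with the EL approach above, the sketch below gives a minimal nonparametric bootstrap in the spirit of this entry: resampling the maximum deviation over all group disparities at once yields simultaneous bounds. Names, data, and the exact procedure are illustrative rather than the paper's method.

```python
# Bootstrap simultaneous bounds on per-group performance disparities
# (group error rate minus overall error rate). Illustrative sketch.
import numpy as np

def simultaneous_disparity_bounds(correct, groups, alpha=0.05, B=2000, seed=0):
    rng = np.random.default_rng(seed)
    correct = np.asarray(correct, float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    def disparities(idx):
        err = 1.0 - correct[idx]
        return np.array([err[groups[idx] == g].mean() - err.mean()
                         for g in labels])
    n = len(correct)
    d_hat = disparities(np.arange(n))
    # Bootstrap the maximum absolute deviation over all groups at once,
    # so one margin covers every group simultaneously.
    max_dev = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)
        max_dev[b] = np.max(np.abs(disparities(idx) - d_hat))
    margin = np.quantile(max_dev, 1 - alpha)
    return {g: (d - margin, d + margin) for g, d in zip(labels, d_hat)}

# Hypothetical audit data: per-sample correctness and group membership.
rng = np.random.default_rng(2)
groups = rng.integers(0, 4, size=5000)
correct = rng.random(5000) < (0.9 - 0.05 * (groups == 3))  # group 3 lags
print(simultaneous_disparity_bounds(correct, groups))
```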
arXiv Detail & Related papers (2023-05-05T17:54:22Z)
- D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach to auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
- Fair Densities via Boosting the Sufficient Statistics of Exponential Families [72.34223801798422]
We introduce a boosting algorithm to pre-process data for fairness.
Our approach shifts towards better data fitting while still ensuring a minimal fairness guarantee.
Empirical results are presented to demonstrate the quality of the results on real-world data.
arXiv Detail & Related papers (2020-12-01T00:49:17Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns about whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing ranking fairness and utility in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.