Correct-By-Construction: Certified Individual Fairness through Neural Network Training
- URL: http://arxiv.org/abs/2508.15642v1
- Date: Thu, 21 Aug 2025 15:14:14 GMT
- Title: Correct-By-Construction: Certified Individual Fairness through Neural Network Training
- Authors: Ruihan Zhang, Jun Sun
- Abstract summary: We propose a novel framework that formally guarantees individual fairness throughout training. A key element of our method is the use of randomised response mechanisms. We formally prove that this mechanism sustains individual fairness throughout the training process.
- Score: 3.350980549219263
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fairness in machine learning is more important than ever as ethical concerns continue to grow. Individual fairness demands that individuals differing only in sensitive attributes receive the same outcomes. However, commonly used machine learning algorithms often fail to achieve such fairness. To improve individual fairness, various training methods have been developed, such as incorporating fairness constraints as optimisation objectives. While these methods have demonstrated empirical effectiveness, they lack formal guarantees of fairness. Existing approaches that aim to provide fairness guarantees primarily rely on verification techniques, which can sometimes fail to produce definitive results. Moreover, verification alone does not actively enhance individual fairness during training. To address this limitation, we propose a novel framework that formally guarantees individual fairness throughout training. Our approach consists of two parts, i.e., (1) provably fair initialisation that ensures the model starts in a fair state, and (2) a fairness-preserving training algorithm that maintains fairness as the model learns. A key element of our method is the use of randomised response mechanisms, which protect sensitive attributes while maintaining fairness guarantees. We formally prove that this mechanism sustains individual fairness throughout the training process. Experimental evaluations confirm that our approach is effective, i.e., producing models that are empirically fair and accurate. Furthermore, our approach is much more efficient than the alternative approach based on certified training (which requires neural network verification during training).
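To make the role of randomised response concrete, here is a minimal Python sketch of the classical binary randomised response mechanism. It is not the paper's construction. The function names, the toy model, and the `p_truth` parameter are all illustrative assumptions. The sketch computes the gap in expected output between two individuals who differ only in a binary sensitive attribute when the model sees a randomised copy of that attribute instead of the true value.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical model signature: (non-sensitive features, sensitive bit) -> score.
Model = Callable[[Sequence[float], int], float]


def report_probabilities(true_bit: int, p_truth: float) -> Tuple[float, float]:
    """Return (P[report 0], P[report 1]) under binary randomised response.

    The true bit is reported with probability p_truth and flipped otherwise.
    """
    if true_bit not in (0, 1):
        raise ValueError("true_bit must be 0 or 1")
    if not 0.0 <= p_truth <= 1.0:
        raise ValueError("p_truth must lie in [0, 1]")
    p_one = p_truth if true_bit == 1 else 1.0 - p_truth
    return 1.0 - p_one, p_one


def expected_output(model: Model, features: Sequence[float],
                    true_bit: int, p_truth: float) -> float:
    """Expected model output when the sensitive bit is randomised first."""
    p_zero, p_one = report_probabilities(true_bit, p_truth)
    return p_zero * model(features, 0) + p_one * model(features, 1)


def outcome_gap(model: Model, features: Sequence[float], p_truth: float) -> float:
    """Gap in expected output between individuals differing only in the bit."""
    return abs(expected_output(model, features, 0, p_truth)
               - expected_output(model, features, 1, p_truth))


def toy_model(features: Sequence[float], sensitive_bit: int) -> float:
    """Deliberately unfair toy model (an assumption made for illustration)."""
    return 0.2 + 0.6 * sensitive_bit
```

In this sketch the gap equals |2 * p_truth - 1| times the model's raw gap, so it vanishes at `p_truth = 0.5`, where the reported attribute carries no information about the true one, and grows back to the raw gap as `p_truth` approaches 1.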
Related papers
- Adversarial Bias: Data Poisoning Attacks on Fairness [48.17618627431355]
There is relatively little research on how an AI system's fairness can be intentionally compromised. In this work, we provide a theoretical analysis demonstrating that a simple adversarial poisoning strategy is sufficient to induce maximally unfair behavior. Our attack significantly outperforms existing methods in degrading fairness metrics across multiple models and datasets.
arXiv Detail & Related papers (2025-11-11T15:09:53Z) - Towards Fairness-Aware Adversarial Learning [13.932705960012846]
We propose a novel learning paradigm, named Fairness-Aware Adversarial Learning (FAAL).
Our method aims to find the worst distribution among different categories, and the solution is guaranteed to obtain the upper bound performance with high probability.
In particular, FAAL can fine-tune an unfair robust model to be fair within only two epochs, without compromising the overall clean and robust accuracies.
arXiv Detail & Related papers (2024-02-27T18:01:59Z) - Consistent End-to-End Estimation for Counterfactual Fairness [56.9060492313073]
We propose a novel counterfactual fairness predictor for making predictions under counterfactual fairness. We provide theoretical guarantees that our method is effective in ensuring the notion of counterfactual fairness.
arXiv Detail & Related papers (2023-10-26T17:58:39Z) - DualFair: Fair Representation Learning at Both Group and Individual Levels via Contrastive Self-supervision [73.80009454050858]
This work presents a self-supervised model, called DualFair, that can debias sensitive attributes like gender and race from learned representations.
Our model jointly optimizes for two fairness criteria: group fairness and counterfactual fairness.
arXiv Detail & Related papers (2023-03-15T07:13:54Z) - Fairness in Matching under Uncertainty [78.39459690570531]
Algorithmic two-sided marketplaces have drawn attention to the issue of fairness in such settings.
We axiomatize a notion of individual fairness in the two-sided marketplace setting which respects the uncertainty in the merits.
We design a linear programming framework to find fair utility-maximizing distributions over allocations.
arXiv Detail & Related papers (2023-02-08T00:30:32Z) - Provable Fairness for Neural Network Models using Formal Verification [10.90121002896312]
We propose techniques to prove fairness using recently developed formal methods that verify properties of neural network models.
We show that through proper training, we can reduce unfairness by an average of 65.4% at a cost of less than 1% in AUC score.
arXiv Detail & Related papers (2022-12-16T16:54:37Z) - Practical Approaches for Fair Learning with Multitype and Multivariate Sensitive Attributes [70.6326967720747]
It is important to guarantee that machine learning algorithms deployed in the real world do not result in unfairness or unintended social consequences.
We introduce FairCOCCO, a fairness measure built on cross-covariance operators on reproducing kernel Hilbert Spaces.
We empirically demonstrate consistent improvements against state-of-the-art techniques in balancing predictive power and fairness on real-world datasets.
arXiv Detail & Related papers (2022-11-11T11:28:46Z) - Improving Robust Fairness via Balance Adversarial Training [51.67643171193376]
Adversarial training (AT) methods are effective against adversarial attacks, yet they introduce severe disparity of accuracy and robustness between different classes.
We propose Balance Adversarial Training (BAT) to address the robust fairness problem.
arXiv Detail & Related papers (2022-09-15T14:44:48Z) - Adaptive Fairness Improvement Based on Causality Analysis [5.827653543633839]
Given a discriminating neural network, the problem of fairness improvement is to systematically reduce discrimination without significantly sacrificing its performance.
We propose an approach which adaptively chooses the fairness improving method based on causality analysis.
Our approach is effective (i.e., always identifies the best fairness improving method) and efficient (i.e., with an average time overhead of 5 minutes).
arXiv Detail & Related papers (2022-09-15T10:05:31Z) - FETA: Fairness Enforced Verifying, Training, and Predicting Algorithms for Neural Networks [9.967054059014691]
We study the problem of verifying, training, and guaranteeing individual fairness of neural network models.
A popular approach for enforcing fairness is to translate a fairness notion into constraints over the parameters of the model.
We develop a counterexample-guided post-processing technique to provably enforce fairness constraints at prediction time.
arXiv Detail & Related papers (2022-06-01T15:06:11Z) - Optimising Equal Opportunity Fairness in Model Training [60.0947291284978]
Existing debiasing methods, such as adversarial training and removing protected information from representations, have been shown to reduce bias.
We propose two novel training objectives which directly optimise for the widely-used criterion of equal opportunity, and show that they are effective in reducing bias while maintaining high performance over two classification tasks.
arXiv Detail & Related papers (2022-05-05T01:57:58Z) - Towards Equal Opportunity Fairness through Adversarial Learning [64.45845091719002]
Adversarial training is a common approach for bias mitigation in natural language processing.
We propose an augmented discriminator for adversarial training, which takes the target class as input to create richer features.
arXiv Detail & Related papers (2022-03-12T02:22:58Z)