The Effect of Enforcing Fairness on Reshaping Explanations in Machine Learning Models
- URL: http://arxiv.org/abs/2512.02265v1
- Date: Mon, 01 Dec 2025 23:16:05 GMT
- Title: The Effect of Enforcing Fairness on Reshaping Explanations in Machine Learning Models
- Authors: Joshua Wolff Anderson, Shyam Visweswaran,
- Abstract summary: Trustworthy machine learning in healthcare requires strong predictive performance, fairness, and explanations.<n>Clinicians may hesitate to rely on a model whose explanations shift after fairness constraints are applied.<n>This study examines how enhancing fairness through bias mitigation techniques reshapes Shapley-based feature rankings.
- Score: 4.121376921198563
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Trustworthy machine learning in healthcare requires strong predictive performance, fairness, and explanations. While it is known that improving fairness can affect predictive performance, little is known about how fairness improvements influence explainability, an essential ingredient for clinical trust. Clinicians may hesitate to rely on a model whose explanations shift after fairness constraints are applied. In this study, we examine how enhancing fairness through bias mitigation techniques reshapes Shapley-based feature rankings. We quantify changes in feature importance rankings after applying fairness constraints across three datasets: pediatric urinary tract infection risk, direct anticoagulant bleeding risk, and recidivism risk. We also evaluate multiple model classes on the stability of Shapley-based rankings. We find that increasing model fairness across racial subgroups can significantly alter feature importance rankings, sometimes in different ways across groups. These results highlight the need to jointly consider accuracy, fairness, and explainability in model assessment rather than in isolation.
Related papers
- Fair Feature Importance Scores via Feature Occlusion and Permutation [41.73851747821022]
We propose two model-agnostic approaches to measure fair feature importance.<n>First, we compare model fairness before and after permuting feature values.<n>Second, we evaluate the fairness of models trained with and without a given feature.
arXiv Detail & Related papers (2026-02-09T21:02:52Z) - Equitable Survival Prediction: A Fairness-Aware Survival Modeling (FASM) Approach [13.80791791871853]
We propose a Fairness-Aware Survival Modeling (FASM) to mitigate algorithmic bias regarding both intra-group and cross-group risk rankings.<n>Time-stratified evaluations show that FASM maintains stable fairness over a 10-year horizon, with the greatest improvements observed during the mid-term of follow-up.
arXiv Detail & Related papers (2025-10-23T15:03:27Z) - An ExplainableFair Framework for Prediction of Substance Use Disorder Treatment Completion [2.863968392011842]
The objective of this study was to develop and implement a framework for addressing fairness and explainability.
We propose an explainable fairness framework, first developing a model with optimized performance, and then using an in-processing approach to mitigate model biases.
Our resulting-fairness enhanced models retain high sensitivity with improved fairness and explanations of the fairness-enhancement that may provide helpful insights for healthcare providers to guide clinical decision-making and resource allocation.
arXiv Detail & Related papers (2024-04-04T23:30:01Z) - Learning for Counterfactual Fairness from Observational Data [62.43249746968616]
Fairness-aware machine learning aims to eliminate biases of learning models against certain subgroups described by certain protected (sensitive) attributes such as race, gender, and age.
A prerequisite for existing methods to achieve counterfactual fairness is the prior human knowledge of the causal model for the data.
In this work, we address the problem of counterfactually fair prediction from observational data without given causal models by proposing a novel framework CLAIRE.
arXiv Detail & Related papers (2023-07-17T04:08:29Z) - DualFair: Fair Representation Learning at Both Group and Individual
Levels via Contrastive Self-supervision [73.80009454050858]
This work presents a self-supervised model, called DualFair, that can debias sensitive attributes like gender and race from learned representations.
Our model jointly optimize for two fairness criteria - group fairness and counterfactual fairness.
arXiv Detail & Related papers (2023-03-15T07:13:54Z) - FairAdaBN: Mitigating unfairness with adaptive batch normalization and
its application to dermatological disease classification [14.589159162086926]
We propose FairAdaBN, which makes batch normalization adaptive to sensitive attribute.
We propose a new metric, named Fairness-Accuracy Trade-off Efficiency (FATE), to compute normalized fairness improvement over accuracy drop.
Experiments on two dermatological datasets show that our proposed method outperforms other methods on fairness criteria and FATE.
arXiv Detail & Related papers (2023-03-15T02:22:07Z) - Fairness Increases Adversarial Vulnerability [50.90773979394264]
This paper shows the existence of a dichotomy between fairness and robustness, and analyzes when achieving fairness decreases the model robustness to adversarial samples.
Experiments on non-linear models and different architectures validate the theoretical findings in multiple vision domains.
The paper proposes a simple, yet effective, solution to construct models achieving good tradeoffs between fairness and robustness.
arXiv Detail & Related papers (2022-11-21T19:55:35Z) - FairIF: Boosting Fairness in Deep Learning via Influence Functions with
Validation Set Sensitive Attributes [51.02407217197623]
We propose a two-stage training algorithm named FAIRIF.
It minimizes the loss over the reweighted data set where the sample weights are computed.
We show that FAIRIF yields models with better fairness-utility trade-offs against various types of bias.
arXiv Detail & Related papers (2022-01-15T05:14:48Z) - On the Robustness of Pretraining and Self-Supervision for a Deep
Learning-based Analysis of Diabetic Retinopathy [70.71457102672545]
We compare the impact of different training procedures for diabetic retinopathy grading.
We investigate different aspects such as quantitative performance, statistics of the learned feature representations, interpretability and robustness to image distortions.
Our results indicate that models from ImageNet pretraining report a significant increase in performance, generalization and robustness to image distortions.
arXiv Detail & Related papers (2021-06-25T08:32:45Z) - Estimating and Improving Fairness with Adversarial Learning [65.99330614802388]
We propose an adversarial multi-task training strategy to simultaneously mitigate and detect bias in the deep learning-based medical image analysis system.
Specifically, we propose to add a discrimination module against bias and a critical module that predicts unfairness within the base classification model.
We evaluate our framework on a large-scale public-available skin lesion dataset.
arXiv Detail & Related papers (2021-03-07T03:10:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.