seeBias: A Comprehensive Tool for Assessing and Visualizing AI Fairness
- URL: http://arxiv.org/abs/2504.08418v1
- Date: Fri, 11 Apr 2025 10:23:10 GMT
- Title: seeBias: A Comprehensive Tool for Assessing and Visualizing AI Fairness
- Authors: Yilin Ning, Yian Ma, Mingxuan Liu, Xin Li, Nan Liu
- Abstract summary: seeBias is an R package for comprehensive evaluation of model fairness and predictive performance. We show how seeBias supports fairness evaluations and uncovers disparities that conventional fairness metrics may overlook.
- Score: 14.36364087809195
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fairness in artificial intelligence (AI) prediction models is increasingly emphasized to support responsible adoption in high-stakes domains such as healthcare and criminal justice. Guidelines and implementation frameworks highlight the importance of both predictive accuracy and equitable outcomes. However, current fairness toolkits often evaluate classification performance disparities in isolation, with limited attention to other critical aspects such as calibration. To address these gaps, we present seeBias, an R package for comprehensive evaluation of model fairness and predictive performance. seeBias offers an integrated evaluation across classification, calibration, and other performance domains, providing a more complete view of model behavior. It includes customizable visualizations to support transparent reporting and responsible AI implementation. Using public datasets from criminal justice and healthcare, we demonstrate how seeBias supports fairness evaluations and uncovers disparities that conventional fairness metrics may overlook. The R package is available on GitHub, and a Python version is under development.
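Since the abstract does not document the package's interface, the following is a minimal Python sketch of the kind of integrated, group-stratified evaluation described above: per-group error rates plus a simple calibration gap. All function and variable names are hypothetical, not seeBias's API.

```python
# Hypothetical sketch (not the seeBias API): group-stratified classification
# and calibration metrics of the kind the package integrates into one report.
import numpy as np

def group_report(y_true, y_prob, group, threshold=0.5):
    """Per-group TPR, FPR, and a simple calibration gap."""
    y_pred = (y_prob >= threshold).astype(int)
    report = {}
    for g in np.unique(group):
        m = group == g
        tp = np.sum((y_pred == 1) & (y_true == 1) & m)
        fn = np.sum((y_pred == 0) & (y_true == 1) & m)
        fp = np.sum((y_pred == 1) & (y_true == 0) & m)
        tn = np.sum((y_pred == 0) & (y_true == 0) & m)
        report[g] = {
            "TPR": tp / (tp + fn) if tp + fn else float("nan"),
            "FPR": fp / (fp + tn) if fp + tn else float("nan"),
            # calibration-in-the-large: mean predicted risk minus observed rate
            "cal_gap": float(y_prob[m].mean() - y_true[m].mean()),
        }
    return report

# Synthetic usage: equalized-odds-style disparities are the spreads in
# TPR/FPR across groups; cal_gap flags group-specific miscalibration.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 1000)
p = np.clip(0.6 * y + rng.normal(0.2, 0.2, 1000), 0, 1)
g = rng.choice(["A", "B"], 1000)
print(group_report(y, p, g))
```

Evaluating these quantities jointly, rather than classification disparities alone, is the gap in existing toolkits that the abstract highlights.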
Related papers
- Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing [90.65399476233495]
We introduce RISEBench, the first benchmark for evaluating Reasoning-Informed viSual Editing (RISE). RISEBench focuses on four key reasoning types: Temporal, Causal, Spatial, and Logical Reasoning. We propose an evaluation framework that assesses Instruction Reasoning, Appearance Consistency, and Visual Plausibility with both human judges and an LMM-as-a-judge approach.
arXiv Detail & Related papers (2025-04-03T17:59:56Z)
- Identifying and Mitigating Social Bias Knowledge in Language Models [52.52955281662332]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases.
FAST surpasses state-of-the-art baselines with superior debiasing performance.
This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z)
- Thinking Racial Bias in Fair Forgery Detection: Models, Datasets and Evaluations [63.52709761339949]
We first contribute a dedicated dataset called the Fair Forgery Detection (FairFD) dataset, where we prove the racial bias of public state-of-the-art (SOTA) methods. We design novel metrics including Approach Averaged Metric and Utility Regularized Metric, which can avoid deceptive results. We also present an effective and robust post-processing technique, Bias Pruning with Fair Activations (BPFA), which improves fairness without requiring retraining or weight updates.
arXiv Detail & Related papers (2024-07-19T14:53:18Z)
- Fairness meets Cross-Domain Learning: a new perspective on Models and Metrics [80.07271410743806]
We study the relationship between cross-domain learning (CD) and model fairness.
We introduce a benchmark on face and medical images spanning several demographic groups as well as classification and localization tasks.
Our study covers 14 CD approaches alongside three state-of-the-art fairness algorithms and shows how the former can outperform the latter.
arXiv Detail & Related papers (2023-03-25T09:34:05Z)
- Evaluating Probabilistic Classifiers: The Triptych [62.997667081978825]
We propose and study a triptych of diagnostic graphics that focus on distinct and complementary aspects of forecast performance.
The reliability diagram addresses calibration, the receiver operating characteristic (ROC) curve diagnoses discrimination ability, and the Murphy diagram visualizes overall predictive performance and value.
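Since all three diagnostics are standard constructs, a compact Python sketch can show how the triptych's data are computed (scikit-learn for the first two panels; the Murphy diagram uses one common form of the elementary score for binary outcomes, S_theta(p, y) = (1{y < theta} - 1{p < theta})(theta - y)):

```python
# Sketch: data for the three triptych panels of a binary probability forecast.
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import roc_curve

def triptych(y, p, n_bins=10):
    # Reliability diagram: observed event rate per bin of predicted risk.
    obs_rate, mean_pred = calibration_curve(y, p, n_bins=n_bins)
    # ROC curve: discrimination ability across all decision thresholds.
    fpr, tpr, _ = roc_curve(y, p)
    # Murphy diagram: mean elementary score at each threshold theta.
    thetas = np.linspace(0.01, 0.99, 99)
    murphy = np.array([
        np.mean(((y < t).astype(float) - (p < t).astype(float)) * (t - y))
        for t in thetas
    ])
    return (mean_pred, obs_rate), (fpr, tpr), (thetas, murphy)
```

Plotting the three returned pairs side by side reproduces the triptych layout: calibration, discrimination, and predictive value at every decision threshold.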
arXiv Detail & Related papers (2023-01-25T19:35:23Z)
- ComplAI: Theory of A Unified Framework for Multi-factor Assessment of Black-Box Supervised Machine Learning Models [6.279863832853343]
ComplAI is a unique framework to observe, analyze, and quantify explainability, robustness, performance, fairness, and model behavior.
It evaluates supervised machine learning models not just on their ability to make correct predictions but from an overall responsibility perspective.
arXiv Detail & Related papers (2022-12-30T08:48:19Z)
- fairlib: A Unified Framework for Assessing and Improving Classification Fairness [66.27822109651757]
fairlib is an open-source framework for assessing and improving classification fairness.
We implement 14 debiasing methods, including pre-processing, at-training-time, and post-processing approaches.
The built-in metrics cover the most commonly used fairness criteria and can be further generalized and customized for fairness evaluation.
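For intuition, here is a hedged sketch of one classic post-processing debiaser of the kind such toolkits implement; this is illustrative only, not fairlib's actual API: each group gets its own decision threshold, chosen so that group TPRs meet a common target (an equal-opportunity-style adjustment).

```python
# Hypothetical sketch (not fairlib's API): post-processing debiasing via
# per-group thresholds tuned toward a shared true-positive rate.
import numpy as np

def group_thresholds(y_true, y_prob, group, target_tpr=0.8):
    """For each group, pick the score threshold whose TPR is closest to target."""
    thresholds = {}
    for g in np.unique(group):
        pos = (group == g) & (y_true == 1)
        if not pos.any():
            continue
        candidates = np.unique(y_prob[group == g])
        tprs = np.array([(y_prob[pos] >= t).mean() for t in candidates])
        thresholds[g] = candidates[np.argmin(np.abs(tprs - target_tpr))]
    return thresholds

def predict_fair(y_prob, group, thresholds):
    """Apply each group's own threshold to produce adjusted decisions."""
    return np.array([int(p >= thresholds[g]) for p, g in zip(y_prob, group)])
```

Pre-processing methods instead reweight or transform the training data, and at-training-time methods add fairness terms to the loss; the three families differ in what access to the model and data they require.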
arXiv Detail & Related papers (2022-05-04T03:50:23Z)
- Enriching ImageNet with Human Similarity Judgments and Psychological Embeddings [7.6146285961466]
We introduce a dataset that embodies the task-general capabilities of human perception and reasoning.
The Human Similarity Judgments extension to ImageNet (ImageNet-HSJ) is composed of human similarity judgments.
The new dataset supports a range of task and performance metrics, including the evaluation of unsupervised learning algorithms.
arXiv Detail & Related papers (2020-11-22T13:41:54Z)
- Prune Responsibly [0.913755431537592]
Irrespective of the specific definition of fairness in a machine learning application, pruning the underlying model affects it.
We investigate and document the emergence and exacerbation of undesirable per-class performance imbalances, across tasks and architectures, for almost one million categories considered across over 100K image classification models that undergo a pruning process.
We demonstrate the need for transparent reporting, inclusive of bias, fairness, and inclusion metrics, in real-life engineering decision-making around neural network pruning.
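To make the measurement concrete, here is a small numpy sketch, a toy single-layer model rather than the paper's setup, showing how global magnitude pruning can degrade some classes far more than others even when aggregate accuracy stays high:

```python
# Toy illustration: per-class accuracy before and after global magnitude
# pruning of a linear classifier (synthetic data, not the paper's models).
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of weights."""
    cutoff = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < cutoff, 0.0, w)

def per_class_accuracy(logits, labels, n_classes):
    preds = logits.argmax(axis=1)
    return np.array([(preds[labels == c] == c).mean() for c in range(n_classes)])

rng = np.random.default_rng(0)
n, d, k = 2000, 32, 4
W = rng.normal(size=(d, k))           # dense "model"
X = rng.normal(size=(n, d))
y = (X @ W).argmax(axis=1)            # labels the dense model gets right
W_pruned = magnitude_prune(W, sparsity=0.8)
print("dense :", per_class_accuracy(X @ W, y, k))        # all 1.0 by construction
print("pruned:", per_class_accuracy(X @ W_pruned, y, k)) # uneven degradation
```

Reporting the full per-class vector, rather than a single aggregate accuracy, is the kind of transparent reporting the paper argues for.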
arXiv Detail & Related papers (2020-09-10T04:43:11Z)
- Fairness by Explicability and Adversarial SHAP Learning [0.0]
We propose a new definition of fairness that emphasises the role of an external auditor and model explicability.
We develop a framework for mitigating model bias using regularizations constructed from the SHAP values of an adversarial surrogate model.
We demonstrate our approaches using gradient and adaptive boosting on a synthetic dataset, the UCI Adult (Census) dataset, and a real-world credit scoring dataset.
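A hedged sketch of the auditing ingredient, not the paper's exact training loop: fit an adversarial surrogate to predict the protected attribute from the features plus the model's score, then inspect the SHAP attribution on the score. The paper builds its regularizations from signals of this kind; the `shap` package is assumed, and the data below are synthetic.

```python
# Illustrative only: SHAP attributions of an adversarial surrogate that
# tries to recover the protected attribute from features + model score.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
protected = (X[:, 0] + rng.normal(0, 0.5, 500) > 0).astype(int)
score = 0.8 * X[:, 0] + rng.normal(0, 0.3, 500)   # a deliberately biased score

inputs = np.column_stack([X, score])
surrogate = GradientBoostingClassifier().fit(inputs, protected)
shap_values = shap.TreeExplainer(surrogate).shap_values(inputs)

# Large attribution on the final column (the score) means the model's output
# carries explicable information about the protected attribute.
print("mean |SHAP| on model score:", np.abs(shap_values[:, -1]).mean())
```

Penalizing such attributions during training, via a regularization term, is the mitigation strategy the abstract describes.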
arXiv Detail & Related papers (2020-03-11T14:36:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.