Data Preparation for Fairness-Performance Trade-Offs: A Practitioner-Friendly Alternative?
- URL: http://arxiv.org/abs/2412.15920v1
- Date: Fri, 20 Dec 2024 14:12:39 GMT
- Title: Data Preparation for Fairness-Performance Trade-Offs: A Practitioner-Friendly Alternative?
- Authors: Gianmario Voria, Rebecca Di Matteo, Giammaria Giordano, Gemma Catolino, Fabio Palomba
- Abstract summary: Pre-processing techniques, which mitigate bias before training, are effective but may impact model performance and pose integration difficulties.
This report proposes an empirical evaluation of how optimally selected fairness-aware practices, applied in early ML lifecycle stages, can enhance both fairness and performance.
Using FATE, we will analyze the fairness-performance trade-off, comparing pipelines selected by FATE against those obtained with pre-processing bias mitigation techniques.
- Score: 11.172805305320592
- License:
- Abstract: As machine learning (ML) systems are increasingly adopted across industries, addressing fairness and bias has become essential. While many solutions focus on ethical challenges in ML, recent studies highlight that data itself is a major source of bias. Pre-processing techniques, which mitigate bias before training, are effective but may impact model performance and pose integration difficulties. In contrast, fairness-aware Data Preparation practices are both familiar to practitioners and easier to implement, providing a more accessible approach to reducing bias. Objective. This registered report proposes an empirical evaluation of how optimally selected fairness-aware practices, applied in early ML lifecycle stages, can enhance both fairness and performance, potentially outperforming standard pre-processing bias mitigation methods. Method. To this end, we will introduce FATE, an optimization technique for selecting 'Data Preparation' pipelines that optimize fairness and performance. Using FATE, we will analyze the fairness-performance trade-off, comparing pipelines selected by FATE against those obtained with pre-processing bias mitigation techniques.
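As a rough illustration of the kind of search FATE performs, the sketch below enumerates a tiny space of Data Preparation pipelines, scores each on accuracy and a demographic-parity gap, and keeps the Pareto-optimal ones. The pipeline space, the fairness metric, and the synthetic data are illustrative assumptions, not the authors' implementation.

```python
# Illustrative only: brute-force search over Data Preparation pipelines,
# scoring each on accuracy and a demographic-parity gap.
import numpy as np
from itertools import product
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(0)
n = 2000
s = rng.integers(0, 2, n)                        # hypothetical binary sensitive attribute
X = rng.normal(size=(n, 5)) + s[:, None]         # features correlated with it
y = (X[:, 0] + 0.5 * s + rng.normal(size=n) > 0.5).astype(int)
X[rng.random(X.shape) < 0.05] = np.nan           # inject missing values

X_tr, X_te, y_tr, y_te, s_tr, s_te = train_test_split(
    X, y, s, test_size=0.3, random_state=0)

def parity_gap(y_pred, groups):
    """Absolute difference in positive-prediction rates between groups."""
    return abs(y_pred[groups == 1].mean() - y_pred[groups == 0].mean())

candidates = product([SimpleImputer(strategy="mean"),
                      SimpleImputer(strategy="median")],
                     [StandardScaler(), MinMaxScaler()])

results = []
for imputer, scaler in candidates:
    pipe = Pipeline([("impute", imputer), ("scale", scaler),
                     ("clf", LogisticRegression(max_iter=1000))])
    pred = pipe.fit(X_tr, y_tr).predict(X_te)
    results.append((accuracy_score(y_te, pred), parity_gap(pred, s_te)))

# Keep the Pareto front: pipelines not dominated on both objectives.
pareto = [r for r in results
          if not any((o[0] > r[0] and o[1] <= r[1]) or
                     (o[0] >= r[0] and o[1] < r[1]) for o in results)]
print(pareto)
```

A real search would of course cover a much larger pipeline space and use a proper optimizer rather than exhaustive enumeration; the point here is only the joint scoring and Pareto selection.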
Related papers
- Overcoming Fairness Trade-offs via Pre-processing: A Causal Perspective [0.0]
Training machine learning models for fair decisions faces two key challenges.
The *fairness-accuracy* trade-off results from enforcing fairness, which weakens the model's predictive performance.
The incompatibility of different fairness metrics poses another trade-off, also known as the *impossibility theorem*.
arXiv Detail & Related papers (2025-01-24T18:33:18Z) - ChatGPT Based Data Augmentation for Improved Parameter-Efficient Debiasing of LLMs [65.9625653425636]
Large Language models (LLMs) exhibit harmful social biases.
This work introduces a novel approach utilizing ChatGPT to generate synthetic training data.
arXiv Detail & Related papers (2024-02-19T01:28:48Z) - Learning Fair Ranking Policies via Differentiable Optimization of Ordered Weighted Averages [55.04219793298687]
This paper shows how efficiently-solvable fair ranking models can be integrated into the training loop of Learning to Rank.
In particular, this paper is the first to show how to backpropagate through constrained optimizations of OWA objectives, enabling their use in integrated prediction and decision models.
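For intuition, an Ordered Weighted Average (OWA) applies a fixed weight vector to the *sorted* values of its inputs; decreasing weights emphasize the worst-off items, which is what makes OWA objectives attractive for fair ranking. A minimal sketch with made-up per-group utilities follows (the paper's contribution, backpropagating through constrained optimization of such objectives, goes well beyond this snippet):

```python
import numpy as np

def owa(values, weights):
    """Ordered Weighted Average: apply weights to values sorted ascending,
    so larger leading weights emphasize the worst-off entries."""
    return float(np.sort(values) @ np.asarray(weights))

group_exposure = [0.9, 0.4, 0.7]   # hypothetical per-group ranking exposure
fair_weights = [0.5, 0.3, 0.2]     # decreasing weights favor the minimum

# 0.5 * 0.4 + 0.3 * 0.7 + 0.2 * 0.9 = 0.59
print(owa(group_exposure, fair_weights))
```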
arXiv Detail & Related papers (2024-02-07T20:53:53Z) - Fair Few-shot Learning with Auxiliary Sets [53.30014767684218]
In many machine learning (ML) tasks, only very few labeled data samples can be collected, which can lead to inferior fairness performance.
In this paper, we define the fairness-aware learning task with limited training samples as the *fair few-shot learning* problem.
We devise a novel framework that accumulates fairness-aware knowledge across different meta-training tasks and then generalizes the learned knowledge to meta-test tasks.
arXiv Detail & Related papers (2023-08-28T06:31:37Z) - Towards Accelerated Model Training via Bayesian Data Selection [45.62338106716745]
Recent work has proposed a more reasonable data selection principle by examining the data's impact on the model's generalization loss, but its practical adoption has relied on less principled approximations and additional holdout data.
This work solves these problems by leveraging a lightweight Bayesian treatment and incorporating off-the-shelf zero-shot predictors built on large-scale pre-trained models.
arXiv Detail & Related papers (2023-08-21T07:58:15Z) - FITNESS: A Causal De-correlation Approach for Mitigating Bias in Machine Learning Software [6.4073906779537095]
Biased datasets can lead to unfair and potentially harmful outcomes.
In this paper, we propose a bias mitigation approach via de-correlating the causal effects between sensitive features and the label.
Our key idea is that by de-correlating such effects from a causality perspective, the model would avoid making predictions based on sensitive features.
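The paper's causal analysis is more involved, but the basic de-correlation idea can be sketched as removing from each feature the component that is linearly predictable from the sensitive feature. A minimal sketch on synthetic data, using simple linear residualization rather than the FITNESS method itself:

```python
import numpy as np

def decorrelate(X, s):
    """Subtract from each column of X its least-squares projection onto the
    (centered) sensitive feature s; residuals are linearly uncorrelated with s."""
    s_c = s - s.mean()
    X_c = X - X.mean(axis=0)
    coef = (s_c @ X_c) / (s_c @ s_c)       # per-column regression slope on s
    return X_c - np.outer(s_c, coef)

rng = np.random.default_rng(0)
s = rng.integers(0, 2, 500).astype(float)          # sensitive feature
X = rng.normal(size=(500, 3)) + 2.0 * s[:, None]   # features shifted by group

X_fair = decorrelate(X, s)
print(np.corrcoef(s, X[:, 0])[0, 1])        # strong correlation before
print(np.corrcoef(s, X_fair[:, 0])[0, 1])   # approximately zero after
```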
arXiv Detail & Related papers (2023-05-23T06:24:43Z) - Fairness-Aware Data Valuation for Supervised Learning [4.874780144224057]
We propose Fairness-Aware Data valuatiOn (FADO) to incorporate fairness concerns into a series of ML-related tasks.
We show how FADO can be applied as the basis for unfairness mitigation pre-processing techniques.
Our methods achieve promising results: up to a 40 p.p. improvement in fairness at less than a 1 p.p. loss in performance compared to a baseline.
arXiv Detail & Related papers (2023-03-29T18:51:13Z) - Delving into Identify-Emphasize Paradigm for Combating Unknown Bias [52.76758938921129]
We propose an effective bias-conflicting scoring method (ECS) to boost the identification accuracy.
We also propose gradient alignment (GA) to balance the contributions of the mined bias-aligned and bias-conflicting samples.
Experiments are conducted on multiple datasets in various settings, demonstrating that the proposed solution can mitigate the impact of unknown biases.
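A rough sketch of the balancing idea, assuming the bias-conflicting samples have already been identified (here by a made-up mask standing in for ECS) and approximating gradient alignment with an equal reweighting of the two groups' losses; the paper's actual alignment rule is more refined:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss(reduction="none")

x = torch.randn(64, 10)
y = torch.randint(0, 2, (64,))
conflicting = torch.arange(64) % 10 == 0   # made-up mask standing in for ECS

per_sample = criterion(model(x), y)
aligned_loss = per_sample[~conflicting].mean()
conflict_loss = per_sample[conflicting].mean()

# Weight the two groups equally so the scarce bias-conflicting samples
# are not drowned out by the abundant bias-aligned majority.
loss = 0.5 * aligned_loss + 0.5 * conflict_loss
loss.backward()
```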
arXiv Detail & Related papers (2023-02-22T14:50:24Z) - Preventing Discriminatory Decision-making in Evolving Data Streams [8.952662914331901]
Bias in machine learning has rightly received significant attention over the last decade.
Most fair machine learning (fair-ML) work to address bias in decision-making systems has focused solely on the offline setting.
Despite the wide prevalence of online systems in the real world, work on identifying and correcting bias in the online setting is severely lacking.
arXiv Detail & Related papers (2023-02-16T01:20:08Z) - Stochastic Methods for AUC Optimization subject to AUC-based Fairness Constraints [51.12047280149546]
A direct approach for obtaining a fair predictive model is to train the model through optimizing its prediction performance subject to fairness constraints.
We formulate the training problem of a fairness-aware machine learning model as an AUC optimization problem subject to a class of AUC-based fairness constraints.
We demonstrate the effectiveness of our approach on real-world data under different fairness metrics.
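One standard way to approach such a constrained problem is a penalized surrogate: maximize a smooth pairwise AUC surrogate while penalizing the gap between per-group surrogate AUCs. The toy sketch below takes that route on synthetic data; it is not the paper's stochastic algorithm, and the penalty weight is a made-up hyperparameter:

```python
import torch

def auc_surrogate(pos, neg):
    """Smooth pairwise AUC surrogate: mean sigmoid of positive-negative
    score differences over all pairs."""
    return torch.sigmoid(pos[:, None] - neg[None, :]).mean()

torch.manual_seed(0)
n = 400
X = torch.randn(n, 5)
g = torch.rand(n) < 0.5                              # binary group membership
y = (X[:, 0] + 0.3 * g.float() + 0.5 * torch.randn(n)) > 0

w = torch.zeros(5, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)
lam = 1.0                                            # made-up penalty weight

for _ in range(200):
    scores = X @ w
    overall = auc_surrogate(scores[y], scores[~y])
    auc_a = auc_surrogate(scores[y & g], scores[~y & g])
    auc_b = auc_surrogate(scores[y & ~g], scores[~y & ~g])
    loss = -overall + lam * (auc_a - auc_b).abs()    # maximize AUC, penalize gap
    opt.zero_grad()
    loss.backward()
    opt.step()
```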
arXiv Detail & Related papers (2022-12-23T22:29:08Z) - Learning the Truth From Only One Side of the Story [58.65439277460011]
We focus on generalized linear models and show that without adjusting for the sampling bias induced by observing outcomes on only one side of the decision, the model may converge suboptimally or even fail to converge to the optimal solution.
We propose an adaptive approach that comes with theoretical guarantees and show that it outperforms several existing methods empirically.
arXiv Detail & Related papers (2020-06-08T18:20:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.