Related papers: Data Acquisition for Improving Model Fairness using Reinforcement Learning

Data Acquisition for Improving Model Fairness using Reinforcement Learning

URL: http://arxiv.org/abs/2412.03009v1
Date: Wed, 04 Dec 2024 03:56:54 GMT
Title: Data Acquisition for Improving Model Fairness using Reinforcement Learning
Authors: Jahid Hasan, Romila Pradhan,
Abstract summary: We focus on the task of acquiring additional labeled data points for training the downstream machine learning model to rapidly improve its fairness.<n>We present DataSift, a data acquisition framework based on the idea of data valuation that relies on partitioning and multi-armed bandits to determine the most valuable data points to acquire.<n>We empirically evaluate DataSift on several real-world and synthetic datasets and show that the fairness of a machine learning model can be significantly improved even while acquiring a few data points.
Score: 3.3916160303055563
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Machine learning systems are increasingly being used in critical decision making such as healthcare, finance, and criminal justice. Concerns around their fairness have resulted in several bias mitigation techniques that emphasize the need for high-quality data to ensure fairer decisions. However, the role of earlier stages of machine learning pipelines in mitigating model bias has not been explored well. In this paper, we focus on the task of acquiring additional labeled data points for training the downstream machine learning model to rapidly improve its fairness. Since not all data points in a data pool are equally beneficial to the task of fairness, we generate an ordering in which data points should be acquired. We present DataSift, a data acquisition framework based on the idea of data valuation that relies on partitioning and multi-armed bandits to determine the most valuable data points to acquire. Over several iterations, DataSift selects a partition and randomly samples a batch of data points from the selected partition, evaluates the benefit of acquiring the batch on model fairness, and updates the utility of partitions depending on the benefit. To further improve the effectiveness and efficiency of evaluating batches, we leverage influence functions that estimate the effect of acquiring a batch without retraining the model. We empirically evaluate DataSift on several real-world and synthetic datasets and show that the fairness of a machine learning model can be significantly improved even while acquiring a few data points.

Related papers

DUPRE: Data Utility Prediction for Efficient Data Valuation [49.60564885180563]
Cooperative game theory-based data valuation, such as Data Shapley, requires evaluating the data utility and retraining the ML model for multiple data subsets. Our framework, textttDUPRE, takes an alternative yet complementary approach that reduces the cost per subset evaluation by predicting data utilities instead of evaluating them by model retraining. Specifically, given the evaluated data utilities of some data subsets, textttDUPRE fits a emphGaussian process (GP) regression model to predict the utility of every other data subset.
arXiv Detail & Related papers (2025-02-22T08:53:39Z)
Neural Dynamic Data Valuation [4.286118155737111]
We propose a novel data valuation method from the perspective of optimal control, named the neural dynamic data valuation (NDDV) Our method has solid theoretical interpretations to accurately identify the data valuation via the sensitivity of the data optimal control state. In addition, we implement a data re-weighting strategy to capture the unique features of data points, ensuring fairness through the interaction between data points and the mean-field states.
arXiv Detail & Related papers (2024-04-30T13:39:26Z)
A Bargaining-based Approach for Feature Trading in Vertical Federated Learning [54.51890573369637]
We propose a bargaining-based feature trading approach in Vertical Federated Learning (VFL) to encourage economically efficient transactions. Our model incorporates performance gain-based pricing, taking into account the revenue-based optimization objectives of both parties.
arXiv Detail & Related papers (2024-02-23T10:21:07Z)
Data Valuation and Detections in Federated Learning [4.899818550820576]
Federated Learning (FL) enables collaborative model training while preserving the privacy of raw data. A challenge in this framework is the fair and efficient valuation of data, which is crucial for incentivizing clients to contribute high-quality data in the FL task. This paper introduces a novel privacy-preserving method for evaluating client contributions and selecting relevant datasets without a pre-specified training algorithm in an FL task.
arXiv Detail & Related papers (2023-11-09T12:01:32Z)
Fairness-Aware Data Valuation for Supervised Learning [4.874780144224057]
We propose Fairness-Aware Data vauatiOn (FADO) to incorporate fairness concerns into a series of ML-related tasks. We show how FADO can be applied as the basis for unfairness mitigation pre-processing techniques. Our methods achieve promising results -- up to a 40 p.p. improvement in fairness at a less than 1 p.p. loss in performance compared to a baseline.
arXiv Detail & Related papers (2023-03-29T18:51:13Z)
Striving for data-model efficiency: Identifying data externalities on group performance [75.17591306911015]
Building trustworthy, effective, and responsible machine learning systems hinges on understanding how differences in training data and modeling decisions interact to impact predictive performance. We focus on a particular type of data-model inefficiency, in which adding training data from some sources can actually lower performance evaluated on key sub-groups of the population. Our results indicate that data-efficiency is a key component of both accurate and trustworthy machine learning.
arXiv Detail & Related papers (2022-11-11T16:48:27Z)
Improving Fairness for Data Valuation in Federated Learning [39.61504568047234]
We propose a new measure called completed federated Shapley value to improve the fairness of federated Shapley value. It is shown under mild conditions that this matrix is approximately low-rank by leveraging concepts and tools from optimization.
arXiv Detail & Related papers (2021-09-19T02:39:59Z)
Exploring the Efficacy of Automatically Generated Counterfactuals for Sentiment Analysis [17.811597734603144]
We propose an approach to automatically generating counterfactual data for data augmentation and explanation. A comprehensive evaluation on several different datasets and using a variety of state-of-the-art benchmarks demonstrate how our approach can achieve significant improvements in model performance.
arXiv Detail & Related papers (2021-06-29T10:27:01Z)
Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
dataset bias is one of the prevailing causes of unfairness in machine learning. We study whether models trained with uncertainty-based ALs are fairer in their decisions with respect to a protected class. We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
A Principled Approach to Data Valuation for Federated Learning [73.19984041333599]
Federated learning (FL) is a popular technique to train machine learning (ML) models on decentralized data sources. The Shapley value (SV) defines a unique payoff scheme that satisfies many desiderata for a data value notion. This paper proposes a variant of the SV amenable to FL, which we call the federated Shapley value.
arXiv Detail & Related papers (2020-09-14T04:37:54Z)
Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management. We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training. We experimentally verify that the new dataset can significantly improve the ability of the learned FER model. To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
Slice Tuner: A Selective Data Acquisition Framework for Accurate and Fair Machine Learning Models [10.501265073049447]
We propose Slice Tuner to selectively acquire data to ensure model accuracy and fairness. At its core, Slice Tuner maintains learning curves of slices that estimate the model accuracies given more data. We show that Slice Tuner significantly outperforms baselines in terms of model accuracy and fairness.
arXiv Detail & Related papers (2020-03-10T06:34:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.