Incentivizing Time-Aware Fairness in Data Sharing
- URL: http://arxiv.org/abs/2510.09240v2
- Date: Wed, 22 Oct 2025 14:04:34 GMT
- Title: Incentivizing Time-Aware Fairness in Data Sharing
- Authors: Jiangwei Chen, Kieu Thao Nguyen Pham, Rachael Hwee Ling Sim, Arun Verma, Zhaoxuan Wu, Chuan-Sheng Foo, Bryan Kian Hsiang Low
- Abstract summary: In collaborative data sharing and machine learning, multiple parties aggregate their data resources to train a machine learning model with better performance. Existing frameworks assume that all parties join the collaboration simultaneously, which does not hold in many real-world scenarios. We propose a fair and time-aware data sharing framework, including novel time-aware incentives.
- Score: 73.83854445472149
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In collaborative data sharing and machine learning, multiple parties aggregate their data resources to train a machine learning model with better model performance. However, as the parties incur data collection costs, they are only willing to do so when guaranteed incentives, such as fairness and individual rationality. Existing frameworks assume that all parties join the collaboration simultaneously, which does not hold in many real-world scenarios. Due to the long processing time for data cleaning, difficulty in overcoming legal barriers, or unawareness, the parties may join the collaboration at different times. In this work, we propose the following perspective: As a party who joins earlier incurs higher risk and encourages the contribution from other wait-and-see parties, that party should receive a reward of higher value for sharing data earlier. To this end, we propose a fair and time-aware data sharing framework, including novel time-aware incentives. We develop new methods for deciding reward values to satisfy these incentives. We further illustrate how to generate model rewards that realize the reward values and empirically demonstrate the properties of our methods on synthetic and real-world datasets.
Related papers
- Sharing Knowledge without Sharing Data: Stitches can improve ensembles of disjointly trained models [0.9851812512860351]
In some settings, like in the medical domain, data is often fragmented across parties, and cannot be readily shared. We investigate how asynchronous collaboration affects performance, and propose to use stitching as a method for combining models. We find that combining intermediate representations in individually trained models with a well placed pair of stitching layers allows this performance to recover to a competitive degree.
arXiv Detail & Related papers (2025-12-19T13:59:46Z)
- Mechanisms for Data Sharing in Collaborative Causal Inference (Extended Version) [2.709511652792003]
This paper devises an evaluation scheme to measure the value of each party's data contribution to the common learning task.
It can be leveraged to reward agents fairly, according to the quality of their data, or to maximize all agents' data contributions.
arXiv Detail & Related papers (2024-07-04T14:32:32Z)
- Data Shapley in One Training Run [88.59484417202454]
Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts. Existing approaches require re-training models on different data subsets, which is computationally intensive. This paper introduces In-Run Data Shapley, which addresses these limitations by offering scalable data attribution for a target model of interest.
arXiv Detail & Related papers (2024-06-16T17:09:24Z)
- Incentives in Private Collaborative Machine Learning [56.84263918489519]
Collaborative machine learning involves training models on data from multiple parties.
We introduce differential privacy (DP) as an incentive.
We empirically demonstrate the effectiveness and practicality of our approach on synthetic and real-world datasets.
arXiv Detail & Related papers (2024-04-02T06:28:22Z)
- A Bargaining-based Approach for Feature Trading in Vertical Federated Learning [54.51890573369637]
We propose a bargaining-based feature trading approach in Vertical Federated Learning (VFL) to encourage economically efficient transactions.
Our model incorporates performance gain-based pricing, taking into account the revenue-based optimization objectives of both parties.
arXiv Detail & Related papers (2024-02-23T10:21:07Z)
- Mechanisms that Incentivize Data Sharing in Federated Learning [90.74337749137432]
We show how a naive scheme leads to catastrophic levels of free-riding where the benefits of data sharing are completely eroded.
We then introduce accuracy shaping based mechanisms to maximize the amount of data generated by each agent.
arXiv Detail & Related papers (2022-07-10T22:36:52Z)
- Incentivizing Federated Learning [2.420324724613074]
This paper presents an incentive mechanism that encourages clients to contribute as much data as they can obtain.
Unlike previous incentive mechanisms, our approach does not monetize data.
We theoretically prove that clients will use as much data as they can possibly possess to participate in federated learning under certain conditions.
arXiv Detail & Related papers (2022-05-22T23:02:43Z)
- Incentivizing Collaboration in Machine Learning via Synthetic Data Rewards [26.850070556844628]
This paper presents a novel collaborative generative modeling (CGM) framework that incentivizes collaboration among self-interested parties to contribute data.
Distributing synthetic data as rewards offers task- and model-agnostic benefits for downstream learning tasks.
arXiv Detail & Related papers (2021-12-17T05:15:30Z)
- Data Sharing Markets [95.13209326119153]
We study a setup where each agent can be both buyer and seller of data.
We consider two cases: bilateral data exchange (trading data with data) and unilateral data exchange (trading data with money).
arXiv Detail & Related papers (2021-07-19T06:00:34Z)
- Representative & Fair Synthetic Data [68.8204255655161]
We present a framework to incorporate fairness constraints into the self-supervised learning process.
We generate a representative as well as fair version of the UCI Adult census data set.
We consider representative & fair synthetic data a promising future building block to teach algorithms not on historic worlds, but rather on the worlds that we strive to live in.
arXiv Detail & Related papers (2021-04-07T09:19:46Z)
- Collaborative Machine Learning with Incentive-Aware Model Rewards [32.43927226170119]
Collaborative machine learning (ML) is an appealing paradigm to build high-quality ML models by training on the aggregated data from many parties.
These parties are only willing to share their data when given enough incentives, such as a guaranteed fair reward based on their contributions.
This paper proposes to value a party's reward based on Shapley value and information gain on model parameters given its data.
arXiv Detail & Related papers (2020-10-24T06:20:55Z)
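Several entries above refer to the Shapley value as a basis for fair rewards. As a rough illustration only, the sketch below computes exact Shapley values for a toy three-party game by averaging each party's marginal contribution over all join orders. The party names, the data sizes, and the valuation function `toy_value` are hypothetical and are not taken from any of the listed papers; this is the standard Shapley value, not the time-aware method of the main paper.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(parties, value):
    """Exact Shapley values: average marginal contribution over all join orders."""
    totals = {p: Fraction(0) for p in parties}
    orders = list(permutations(parties))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += Fraction(value(with_p)) - Fraction(value(coalition))
            coalition = with_p
    return {p: total / len(orders) for p, total in totals.items()}


# Hypothetical data sizes for three parties (assumption for illustration).
sizes = {"A": 1, "B": 2, "C": 3}


def toy_value(coalition):
    """Hypothetical valuation: squared total data size of the coalition."""
    return sum(sizes[p] for p in coalition) ** 2


phi = shapley_values(list(sizes), toy_value)

# Efficiency: the rewards sum to the value of the grand coalition.
grand_value = toy_value(frozenset(sizes))
# Individual rationality: no party gets less than it could achieve alone.
individually_rational = all(phi[p] >= toy_value(frozenset({p})) for p in sizes)

print(phi, grand_value, individually_rational)
```

Enumerating all join orders costs factorial time in the number of parties, so this exact form is only practical for a handful of parties; larger settings typically rely on sampling or other approximations.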
This list is automatically generated from the titles and abstracts of the papers in this site.