Mechanisms for Data Sharing in Collaborative Causal Inference (Extended Version)
- URL: http://arxiv.org/abs/2407.11032v1
- Date: Thu, 4 Jul 2024 14:32:32 GMT
- Title: Mechanisms for Data Sharing in Collaborative Causal Inference (Extended Version)
- Authors: Björn Filter, Ralf Möller, Özgür Lütfü Özçep,
- Abstract summary: This paper devises an evaluation scheme to measure the value of each party's data contribution to the common learning task.
It can be leveraged to reward agents fairly, according to the quality of their data, or to maximize all agents' data contributions.
- Score: 2.709511652792003
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Collaborative causal inference (CCI) is a federated learning method for pooling data from multiple, often self-interested, parties, to achieve a common learning goal over causal structures, e.g. estimation and optimization of treatment variables in a medical setting. Since obtaining data can be costly for the participants and sharing unique data poses the risk of losing competitive advantages, motivating the participation of all parties through equitable rewards and incentives is necessary. This paper devises an evaluation scheme to measure the value of each party's data contribution to the common learning task, tailored to causal inference's statistical demands, by comparing completed partially directed acyclic graphs (CPDAGs) inferred from observational data contributed by the participants. The Data Valuation Scheme thus obtained can then be used to introduce mechanisms that incentivize the agents to contribute data. It can be leveraged to reward agents fairly, according to the quality of their data, or to maximize all agents' data contributions.
Related papers
- Federated Prediction-Powered Inference from Decentralized Data [40.84399531998246]
Prediction-Powered Inference (PPI) has been proposed to ensure statistical validity despite the unreliability.
The Fed-PPI framework involves training local models on private data, aggregating them through Federated Learning (FL), and deriving confidence intervals using PPI.
arXiv Detail & Related papers (2024-09-03T09:14:18Z) - Incentives in Private Collaborative Machine Learning [56.84263918489519]
Collaborative machine learning involves training models on data from multiple parties.
We introduce differential privacy (DP) as an incentive.
We empirically demonstrate the effectiveness and practicality of our approach on synthetic and real-world datasets.
arXiv Detail & Related papers (2024-04-02T06:28:22Z) - Efficient Core-selecting Incentive Mechanism for Data Sharing in
Federated Learning [0.12289361708127873]
Federated learning is a distributed machine learning system that uses participants' data to train an improved global model.
How to establish an incentive mechanism that both incentivizes inputting data truthfully and promotes stable cooperation has become an important issue to consider.
We propose an efficient core-selecting mechanism based on sampling approximation that only aggregates models on sampled coalitions to approximate the exact result.
arXiv Detail & Related papers (2023-09-21T01:47:39Z) - Evaluating and Incentivizing Diverse Data Contributions in Collaborative
Learning [89.21177894013225]
For a federated learning model to perform well, it is crucial to have a diverse and representative dataset.
We show that the statistical criterion used to quantify the diversity of the data, as well as the choice of the federated learning algorithm used, has a significant effect on the resulting equilibrium.
We leverage this to design simple optimal federated learning mechanisms that encourage data collectors to contribute data representative of the global population.
arXiv Detail & Related papers (2023-06-08T23:38:25Z) - Mechanisms that Incentivize Data Sharing in Federated Learning [90.74337749137432]
We show how a naive scheme leads to catastrophic levels of free-riding where the benefits of data sharing are completely eroded.
We then introduce accuracy shaping based mechanisms to maximize the amount of data generated by each agent.
arXiv Detail & Related papers (2022-07-10T22:36:52Z) - Incentivizing Federated Learning [2.420324724613074]
This paper presents an incentive mechanism that encourages clients to contribute as much data as they can obtain.
Unlike previous incentive mechanisms, our approach does not monetize data.
We theoretically prove that clients will use as much data as they can possibly possess to participate in federated learning under certain conditions.
arXiv Detail & Related papers (2022-05-22T23:02:43Z) - Incentivizing Collaboration in Machine Learning via Synthetic Data
Rewards [26.850070556844628]
This paper presents a novel collaborative generative modeling (CGM) framework that incentivizes collaboration among self-interested parties to contribute data.
Distributing synthetic data as rewards offers task- and model-agnostic benefits for downstream learning tasks.
arXiv Detail & Related papers (2021-12-17T05:15:30Z) - Learning Bias-Invariant Representation by Cross-Sample Mutual
Information Minimization [77.8735802150511]
We propose a cross-sample adversarial debiasing (CSAD) method to remove the bias information misused by the target task.
The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator.
We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-11T21:17:02Z) - Data Sharing Markets [95.13209326119153]
We study a setup where each agent can be both buyer and seller of data.
We consider two cases: bilateral data exchange (trading data with data) and unilateral data exchange (trading data with money)
arXiv Detail & Related papers (2021-07-19T06:00:34Z) - A Principled Approach to Data Valuation for Federated Learning [73.19984041333599]
Federated learning (FL) is a popular technique to train machine learning (ML) models on decentralized data sources.
The Shapley value (SV) defines a unique payoff scheme that satisfies many desiderata for a data value notion.
This paper proposes a variant of the SV amenable to FL, which we call the federated Shapley value.
arXiv Detail & Related papers (2020-09-14T04:37:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.