Data Sharing Markets
- URL: http://arxiv.org/abs/2107.08630v2
- Date: Tue, 20 Jul 2021 06:31:23 GMT
- Title: Data Sharing Markets
- Authors: Mohammad Rasouli, Michael I. Jordan
- Abstract summary: We study a setup where each agent can be both a buyer and a seller of data.
We consider two cases: bilateral data exchange (trading data with data) and unilateral data exchange (trading data with money).
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: With the growing use of distributed machine learning techniques, there is a
growing need for data markets that allow agents to share data with each other.
Nevertheless, data has unique features that separate it from other commodities,
including replicability, cost of sharing, and the ability to distort it. We study a
setup where each agent can be both a buyer and a seller of data. For this setup, we
consider two cases: bilateral data exchange (trading data with data) and
unilateral data exchange (trading data with money). We model bilateral sharing
as a network formation game and show the existence of a strongly stable outcome
under the top agents property by allowing limited complementarity. We propose
the ordered match algorithm, which finds the stable outcome in O(N^2) time, where N
is the number of agents. For unilateral sharing, under the assumption of an additive
cost structure, we construct competitive prices that can implement any social
welfare-maximizing outcome. Finally, for this setup when agents have private
information, we propose the mixed-VCG mechanism, which uses zero-cost data
distortion of data sharing with its isolated impact to achieve budget balance
while truthfully implementing socially optimal outcomes to the exact level of
budget imbalance of standard VCG mechanisms. Mixed-VCG uses data distortions as
data money for this purpose. We further relax the zero-cost data distortion
assumption by proposing distorted-mixed-VCG. We also extend our model and
results to data sharing via incremental inquiries and differential privacy
costs.
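To make the bilateral (data-for-data) case concrete, here is a minimal sketch of link formation in a much simpler setting than the paper's. It is not the ordered match algorithm. It assumes that each agent's utility is separable across links (no complementarity at all). It also uses Jackson-Wolinsky pairwise stability rather than the paper's strong stability. The names `separable_exchange_network`, `is_pairwise_stable`, and the `gain` matrix are hypothetical. Under separability, each pair can be decided on its own, so checking all N(N-1)/2 pairs takes O(N^2) time.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]  # undirected exchange link, stored as (i, j) with i < j


def separable_exchange_network(gain: List[List[float]]) -> Set[Link]:
    """Form link (i, j) iff both endpoints weakly gain from exchanging data.

    gain[i][j] is agent i's hypothetical net gain from the link: its value for
    j's data minus its own cost of sharing with j. Utilities are ASSUMED
    separable across links (no complementarity), unlike the paper's model.
    """
    n = len(gain)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if gain[i][j] >= 0 and gain[j][i] >= 0
    }


def is_pairwise_stable(gain: List[List[float]], links: Set[Link]) -> bool:
    """Check Jackson-Wolinsky pairwise stability under separable utilities."""
    n = len(gain)
    for i in range(n):
        for j in range(i + 1, n):
            gi, gj = gain[i][j], gain[j][i]
            if (i, j) in links:
                # No agent may strictly prefer to cut an existing link.
                if gi < 0 or gj < 0:
                    return False
            elif (gi > 0 and gj >= 0) or (gj > 0 and gi >= 0):
                # A missing link that one agent strictly wants and the other accepts.
                return False
    return True


gain = [[0, 2, -1], [1, 0, 3], [4, 0.5, 0]]
links = separable_exchange_network(gain)
print(sorted(links), is_pairwise_stable(gain, links))
```

Adding the excluded link (0, 2) to `links` makes `is_pairwise_stable` return `False`, because agent 0 would rather cut it. With complementarity, which the paper allows in limited form, links can no longer be decided pair by pair. That is where an algorithm like ordered match becomes necessary.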
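For the unilateral (data-for-money) case, the abstract states that additive costs allow competitive prices to implement any welfare-maximizing outcome. The following is a rough sketch of why this works in the simplest case. It assumes that values are also additive per (seller, buyer) pair, which goes beyond the abstract's assumption. Under that assumption, total welfare is a sum of per-pair terms, so sharing is efficient exactly when value covers cost. Any per-pair price strictly between cost and value then makes buyer demand and seller supply agree with that decision. The function names and the `alpha` parameter are hypothetical, and this is not claimed to be the paper's price construction.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]  # (seller, buyer)
Matrix = List[List[float]]


def efficient_sharing(value: Matrix, cost: Matrix) -> Set[Pair]:
    """Welfare-maximizing sharing when values and costs are additive across pairs.

    value[s][b]: buyer b's value for seller s's data (ASSUMED additive).
    cost[s][b]: seller s's cost of sharing with buyer b (additive, as in the abstract).
    """
    n = len(value)
    return {(s, b) for s in range(n) for b in range(n)
            if s != b and value[s][b] >= cost[s][b]}


def supporting_prices(value: Matrix, cost: Matrix, alpha: float = 0.5) -> Dict[Pair, float]:
    """Per-pair price on the segment between cost and value; keep alpha in (0, 1)."""
    n = len(value)
    return {(s, b): (1 - alpha) * cost[s][b] + alpha * value[s][b]
            for s in range(n) for b in range(n) if s != b}


def market_clears(value: Matrix, cost: Matrix, prices: Dict[Pair, float]) -> bool:
    """True iff, at every price, both demand and supply match the efficient decision."""
    for (s, b), p in prices.items():
        efficient = value[s][b] >= cost[s][b]
        demand = value[s][b] >= p  # buyer buys if its value covers the price
        supply = p >= cost[s][b]   # seller sells if the price covers its cost
        if demand != efficient or supply != efficient:
            return False
    return True


value = [[0, 5, 1], [2, 0, 4], [3, 3, 0]]
cost = [[0, 2, 2], [2, 0, 1], [4, 1, 0]]
prices = supporting_prices(value, cost)
print(sorted(efficient_sharing(value, cost)), market_clears(value, cost, prices))
```

Keeping `alpha` strictly inside (0, 1) matters. At `alpha = 1`, a pair with value below cost gets a price equal to the value, so the buyer weakly demands data that should not be shared.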
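The abstract measures mixed-VCG against "the budget imbalance of standard VCG mechanisms." As an illustration of that baseline only (not of mixed-VCG), here is the textbook Clarke-pivot VCG mechanism for a single data trade. One buyer has a private value `v`, and one seller has a private sharing cost `c`. The single-pair setting and the function name `vcg_bilateral_trade` are assumptions made for the example. Each agent pays the externality it imposes on the other. The buyer therefore pays `c`, but the seller receives `v`, so whenever trade happens the mechanism runs a deficit of `v - c`.

```python
def vcg_bilateral_trade(v: float, c: float) -> dict:
    """Clarke-pivot VCG for one buyer (private value v) and one seller (private cost c)."""
    trade = v >= c  # allocation that maximizes reported welfare v - c
    if not trade:
        return {"trade": False, "buyer_pays": 0.0, "seller_receives": 0.0, "deficit": 0.0}
    # Buyer's externality on the seller: the seller's welfare is 0 without the
    # buyer (no trade) and -c with trade, so the buyer pays 0 - (-c) = c.
    buyer_pays = 0.0 - (-c)
    # Seller's externality on the buyer: without the seller no trade is feasible
    # (buyer gets 0); with trade the buyer gets v, so the seller is paid v.
    seller_receives = v - 0.0
    return {
        "trade": True,
        "buyer_pays": buyer_pays,
        "seller_receives": seller_receives,
        "deficit": seller_receives - buyer_pays,  # money the designer must inject
    }


print(vcg_bilateral_trade(10.0, 4.0))
```

Here the buyer pays 4.0, the seller receives 10.0, and the designer must cover a deficit of 6.0. Closing this kind of gap is the role the abstract assigns to data distortions used "as data money."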
Related papers
- Incentives in Private Collaborative Machine Learning (2024-04-02)
Collaborative machine learning involves training models on data from multiple parties.
We introduce differential privacy (DP) as an incentive.
We empirically demonstrate the effectiveness and practicality of our approach on synthetic and real-world datasets.
- CaPS: Collaborative and Private Synthetic Data Generation from Distributed Sources (2024-02-13)
We propose a framework for the collaborative and private generation of synthetic data from distributed data holders.
We replace the trusted aggregator with secure multi-party computation protocols and provide output privacy via differential privacy (DP).
We demonstrate the applicability and scalability of our approach for the state-of-the-art select-measure-generate algorithms MWEM+PGM and AIM.
- DP2-Pub: Differentially Private High-Dimensional Data Publication with Invariant Post Randomization (2022-08-24)
We propose a differentially private high-dimensional data publication mechanism (DP2-Pub) that runs in two phases.
Splitting attributes into several low-dimensional clusters with high intra-cluster cohesion and low inter-cluster coupling helps obtain a reasonable privacy budget.
We also extend our DP2-Pub mechanism to the scenario with a semi-honest server, which satisfies local differential privacy.
- Mechanisms that Incentivize Data Sharing in Federated Learning (2022-07-10)
We show how a naive scheme leads to catastrophic levels of free-riding, where the benefits of data sharing are completely eroded.
We then introduce accuracy-shaping-based mechanisms to maximize the amount of data generated by each agent.
- Improving Correlation Capture in Generating Imbalanced Data using Differentially Private Conditional GANs (2022-06-28)
We propose DP-CGANS, a differentially private conditional GAN framework consisting of data transformation, sampling, conditioning, and network training to generate realistic and privacy-preserving data.
We extensively evaluate our model against state-of-the-art generative models on three public datasets and two real-world personal health datasets in terms of statistical similarity, machine learning performance, and privacy measurement.
- Strategic Coalition for Data Pricing in IoT Data Markets (2022-06-15)
This paper considers a market for trading Internet of Things (IoT) data that is used to train machine learning models.
The data is supplied to the market platform through a network, and the price of such data is controlled based on the value it brings to the machine learning model.
- VFed-SSD: Towards Practical Vertical Federated Advertising (2022-05-31)
We propose a semi-supervised split distillation framework, VFed-SSD, to alleviate the two limitations.
Specifically, we develop a self-supervised task, MatchedPair Detection (MPD), to exploit the vertically partitioned unlabeled data.
Our framework provides an efficient federation-enhanced solution for real-time display advertising with minimal deployment cost and significant performance lift.
- Spending Privacy Budget Fairly and Wisely (2022-04-27)
Differentially private (DP) synthetic data generation is a practical method for improving access to data.
One issue inherent to DP is that the "privacy budget" is generally "spent" evenly across features in the data set.
We develop ensemble methods that distribute the privacy budget "wisely" to maximize predictive accuracy of models trained on DP data.
- Representative & Fair Synthetic Data (2021-04-07)
We present a framework to incorporate fairness constraints into the self-supervised learning process.
We generate a representative as well as fair version of the UCI Adult census data set.
We consider representative & fair synthetic data a promising future building block to teach algorithms not on historic worlds, but rather on the worlds that we strive to live in.
This list is automatically generated from the titles and abstracts of the papers in this site.