From Fairness to Truthfulness: Rethinking Data Valuation Design
- URL: http://arxiv.org/abs/2504.05563v1
- Date: Mon, 07 Apr 2025 23:34:11 GMT
- Title: From Fairness to Truthfulness: Rethinking Data Valuation Design
- Authors: Dongyang Fan, Tyler J. Rotello, Sai Praneeth Karimireddy
- Abstract summary: We revisit the design of data markets through a game-theoretic lens, where data owners face private, heterogeneous costs for data sharing. We show that commonly used valuation methods fail to ensure truthful reporting of these costs, leading to inefficient market outcomes. We adapt well-established payment rules from mechanism design, namely Myerson and Vickrey-Clarke-Groves, to the data market setting.
- Score: 12.067958128148112
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As large language models increasingly rely on external data sources, fairly compensating data contributors has become a central concern. In this paper, we revisit the design of data markets through a game-theoretic lens, where data owners face private, heterogeneous costs for data sharing. We show that commonly used valuation methods--such as Leave-One-Out and Data Shapley--fail to ensure truthful reporting of these costs, leading to inefficient market outcomes. To address this, we adapt well-established payment rules from mechanism design, namely Myerson and Vickrey-Clarke-Groves (VCG), to the data market setting. We demonstrate that the Myerson payment is the minimal truthful payment mechanism, optimal from the buyer's perspective, and that VCG and Myerson payments coincide in unconstrained allocation settings. Our findings highlight the importance of incorporating incentive compatibility into data valuation, paving the way for more robust and efficient data markets.
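The claim that Myerson and VCG payments coincide under unconstrained allocation can be illustrated with a toy procurement example. The sketch below is not the paper's model: it assumes additive data values known to the buyer and a hypothetical rule that buys from every seller whose reported cost does not exceed the value of their data. All function and variable names are illustrative.

```python
from typing import Dict, List


def select(costs: Dict[str, float], values: Dict[str, float]) -> List[str]:
    # Assumed unconstrained allocation rule: buy from every seller whose
    # reported cost is at most the (assumed additive) value of their data.
    return [s for s in costs if costs[s] <= values[s]]


def myerson_payment(seller: str, costs: Dict[str, float],
                    values: Dict[str, float]) -> float:
    # Threshold payment: the highest cost the seller could have reported
    # while still being selected. Under the rule above this is values[seller].
    if seller not in select(costs, values):
        return 0.0
    return values[seller]


def vcg_payment(seller: str, costs: Dict[str, float],
                values: Dict[str, float]) -> float:
    # VCG payment: welfare of the chosen allocation, not counting the
    # seller's own cost, minus the best welfare achievable without the seller.
    chosen = select(costs, values)
    if seller not in chosen:
        return 0.0
    welfare_with = (sum(values[s] for s in chosen)
                    - sum(costs[s] for s in chosen if s != seller))
    others = {s: c for s, c in costs.items() if s != seller}
    welfare_without = sum(values[s] - others[s] for s in select(others, values))
    return welfare_with - welfare_without


def utility(seller: str, true_cost: float, report: float,
            costs: Dict[str, float], values: Dict[str, float]) -> float:
    # Seller's utility under the threshold payment when reporting `report`
    # while actually incurring `true_cost`.
    reported = {**costs, seller: report}
    if seller not in select(reported, values):
        return 0.0
    return myerson_payment(seller, reported, values) - true_cost


values = {"a": 5.0, "b": 3.0, "c": 2.0}   # hypothetical data values
costs = {"a": 2.0, "b": 4.0, "c": 1.0}    # hypothetical true costs

for s in costs:
    print(s, myerson_payment(s, costs, values), vcg_payment(s, costs, values))
```

In this toy setting a seller's payment does not depend on their own report, only on whether they are selected, so misreporting a cost can never raise utility: for example, seller `b` (true cost 4, value 3) would be selected by under-reporting but would then be paid less than their cost.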
Related papers
- Data Pricing for Graph Neural Networks without Pre-purchased Inspection [15.556650640576311]
Model marketplaces leverage model trading mechanisms to properly incentivize data owners to contribute their data.
We propose a novel mechanism, named Structural Importance based Model Trading (SIMT) mechanism, that assesses the data importance and compensates data owners accordingly.
SIMT consistently outperforms vanilla baselines by up to $40\%$ in both MacroF1 and MicroF1.
arXiv Detail & Related papers (2025-02-12T10:42:04Z) - An Instrumental Value for Data Production and its Application to Data Pricing [107.98697414652479]
This paper develops an approach for capturing the instrumental value of data production processes. We show how they connect to classic notions of information design and signals in information economics.
arXiv Detail & Related papers (2024-12-24T03:53:57Z) - Wasserstein Markets for Differentially-Private Data [1.4266656344673316]
Data markets provide a means to enable wider access as well as determine the appropriate privacy-utility trade-off.
Existing data market frameworks either require a trusted third party to perform expensive valuations or are unable to capture the nature of data value.
This paper proposes a valuation mechanism based on the Wasserstein distance for differentially-private data, and corresponding procurement mechanisms.
arXiv Detail & Related papers (2024-12-03T17:40:26Z) - Pricing Strategies for Different Accuracy Models from the Same Dataset Based on Generalized Hotelling's Law [9.353146025394372]
We consider a scenario where a seller possesses a dataset $D$ and trains it into models of varying accuracies for sale in the market. The dataset can be reused to train models with different accuracies, and the training cost is independent of the sales volume.
arXiv Detail & Related papers (2024-04-08T08:02:18Z) - DAVED: Data Acquisition via Experimental Design for Data Markets [25.300193837833426]
We propose a federated approach to the data acquisition problem that is inspired by linear experimental design.
Our proposed data acquisition method achieves lower prediction error without requiring labeled validation data.
The key insight of our work is that a method that directly estimates the benefit of acquiring data for test set prediction is particularly compatible with a decentralized market setting.
arXiv Detail & Related papers (2024-03-20T18:05:52Z) - A Bargaining-based Approach for Feature Trading in Vertical Federated Learning [54.51890573369637]
We propose a bargaining-based feature trading approach in Vertical Federated Learning (VFL) to encourage economically efficient transactions.
Our model incorporates performance gain-based pricing, taking into account the revenue-based optimization objectives of both parties.
arXiv Detail & Related papers (2024-02-23T10:21:07Z) - Privacy-Aware Data Acquisition under Data Similarity in Regression Markets [29.64195175524365]
We show that data similarity and privacy preferences are integral to market design.
We numerically evaluate how data similarity affects market participation and traded data value.
arXiv Detail & Related papers (2023-12-05T09:39:04Z) - Mechanisms that Incentivize Data Sharing in Federated Learning [90.74337749137432]
We show how a naive scheme leads to catastrophic levels of free-riding where the benefits of data sharing are completely eroded.
We then introduce accuracy shaping based mechanisms to maximize the amount of data generated by each agent.
arXiv Detail & Related papers (2022-07-10T22:36:52Z) - VFed-SSD: Towards Practical Vertical Federated Advertising [53.08038962443853]
We propose a semi-supervised split distillation framework VFed-SSD to alleviate the two limitations.
Specifically, we develop a self-supervised task MatchedPair Detection (MPD) to exploit the vertically partitioned unlabeled data.
Our framework provides an efficient federation-enhanced solution for real-time display advertising with minimal deploying cost and significant performance lift.
arXiv Detail & Related papers (2022-05-31T17:45:30Z) - Data Sharing Markets [95.13209326119153]
We study a setup where each agent can be both buyer and seller of data.
We consider two cases: bilateral data exchange (trading data with data) and unilateral data exchange (trading data with money).
arXiv Detail & Related papers (2021-07-19T06:00:34Z) - A Principled Approach to Data Valuation for Federated Learning [73.19984041333599]
Federated learning (FL) is a popular technique to train machine learning (ML) models on decentralized data sources.
The Shapley value (SV) defines a unique payoff scheme that satisfies many desiderata for a data value notion.
This paper proposes a variant of the SV amenable to FL, which we call the federated Shapley value.
arXiv Detail & Related papers (2020-09-14T04:37:54Z)