Data Sharing with a Generative AI Competitor
- URL: http://arxiv.org/abs/2505.12386v1
- Date: Sun, 18 May 2025 12:22:37 GMT
- Title: Data Sharing with a Generative AI Competitor
- Authors: Boaz Taitler, Omer Madmon, Moshe Tennenholtz, Omer Ben-Porat
- Abstract summary: We provide a model of data sharing between a content creation firm and a GenAI platform that can also acquire content from third-party experts. The interaction is modeled as a Stackelberg game: the firm first decides how much of its proprietary dataset to share with GenAI, and GenAI subsequently determines how much additional data to acquire from external experts. Our results shed light on the economic forces shaping data-sharing partnerships in the age of GenAI.
- Score: 14.181796250900907
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: As GenAI platforms grow, their dependence on content from competing providers, combined with access to alternative data sources, creates new challenges for data-sharing decisions. In this paper, we provide a model of data sharing between a content creation firm and a GenAI platform that can also acquire content from third-party experts. The interaction is modeled as a Stackelberg game: the firm first decides how much of its proprietary dataset to share with GenAI, and GenAI subsequently determines how much additional data to acquire from external experts. Their utilities depend on user traffic, monetary transfers, and the cost of acquiring additional data from external experts. We characterize the unique subgame perfect equilibrium of the game and uncover a surprising phenomenon: The firm may be willing to pay GenAI to share the firm's own data, leading to a costly data-sharing equilibrium. We further characterize the set of Pareto improving data prices, and show that such improvements occur only when the firm pays to share data. Finally, we study how the price can be set to optimize different design objectives, such as promoting firm data sharing, expert data acquisition, or a balance of both. Our results shed light on the economic forces shaping data-sharing partnerships in the age of GenAI, and provide guidance for platforms, regulators and policymakers seeking to design effective data exchange mechanisms.
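The leader-follower structure described in the abstract can be illustrated with backward induction on a discretized action grid. This is only a minimal sketch: the listing does not state the paper's utility functions, so `firm_utility` and `genai_utility` below, together with the `price` and `cost` parameters and the traffic split, are hypothetical placeholders chosen for illustration and are not the authors' model.

```python
from typing import Callable, Sequence, Tuple

Utility = Callable[[float, float], float]


def best_response(x: float, follower_actions: Sequence[float],
                  follower_utility: Utility) -> float:
    """Follower's utility-maximizing action given the leader's action x."""
    return max(follower_actions, key=lambda y: follower_utility(x, y))


def solve_stackelberg(leader_actions: Sequence[float],
                      follower_actions: Sequence[float],
                      leader_utility: Utility,
                      follower_utility: Utility) -> Tuple[float, float]:
    """Backward induction: the leader anticipates the follower's best response."""
    def leader_value(x: float) -> float:
        y = best_response(x, follower_actions, follower_utility)
        return leader_utility(x, y)

    x_star = max(leader_actions, key=leader_value)
    y_star = best_response(x_star, follower_actions, follower_utility)
    return x_star, y_star


# Hypothetical toy utilities (assumptions, not taken from the paper).
# x: fraction of the firm's data shared; y: amount of expert data acquired.
def genai_utility(x: float, y: float, price: float = 0.1, cost: float = 0.5) -> float:
    quality = x + y
    traffic = quality / (1.0 + quality)       # assumed GenAI traffic share
    return traffic - price * x - cost * y     # minus transfer and expert cost


def firm_utility(x: float, y: float, price: float = 0.1) -> float:
    quality = x + y
    traffic = 1.0 / (1.0 + quality)           # assumed firm traffic share
    return traffic + price * x                # plus transfer from GenAI


grid = [i / 10 for i in range(11)]            # actions 0.0, 0.1, ..., 1.0
x_star, y_star = solve_stackelberg(grid, grid, firm_utility, genai_utility)
print("firm shares:", x_star, "GenAI acquires:", y_star)
```

A negative `price` in this toy setup would correspond to the firm paying GenAI to accept its data, the situation the abstract calls a costly data-sharing equilibrium. Whether such an outcome arises here depends entirely on the assumed utilities.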
Related papers
- What's the next frontier for Data-centric AI? Data Savvy Agents [71.76058707995398]
We argue that data-savvy capabilities should be a top priority in the design of agentic systems. We propose four key capabilities to realize this vision: Proactive data acquisition, Sophisticated data processing, Interactive test data synthesis, and Continual adaptation.
arXiv Detail & Related papers (2025-11-02T17:09:29Z)
- When Assurance Undermines Intelligence: The Efficiency Costs of Data Governance in AI-Enabled Labor Markets [5.3700224653806865]
We show that restricting data use significantly reduced GenAI efficiency, leading to lower matching rates, higher employee turnover, and heightened labor market frictions. Our findings reveal the unintended efficiency costs of well-intentioned data governance and highlight that information assurance, while essential for trust, can undermine intelligence-driven efficiency when misaligned with AI system design.
arXiv Detail & Related papers (2025-11-02T05:35:37Z)
- Incentivizing Time-Aware Fairness in Data Sharing [73.83854445472149]
In collaborative data sharing and machine learning, multiple parties aggregate their data resources to train a machine learning model with better performance. Existing frameworks assume that all parties join the collaboration simultaneously, which does not hold in many real-world scenarios. We propose a fair and time-aware data sharing framework, including novel time-aware incentives.
arXiv Detail & Related papers (2025-10-10T10:29:32Z)
- A Survey on Data Markets [73.07800441775814]
The growing trend of trading data for greater welfare has led to the emergence of data markets.
A data market is any mechanism whereby the exchange of data products including datasets and data derivatives takes place.
It serves as a coordinating mechanism by which several functions, including the pricing and the distribution of data, interact.
arXiv Detail & Related papers (2024-11-09T15:09:24Z)
- Data Shapley in One Training Run [88.59484417202454]
Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts.
Existing approaches require re-training models on different data subsets, which is computationally intensive.
This paper introduces In-Run Data Shapley, which addresses these limitations by offering scalable data attribution for a target model of interest.
arXiv Detail & Related papers (2024-06-16T17:09:24Z)
- An Economic Solution to Copyright Challenges of Generative AI [35.37023083413299]
Generative artificial intelligence systems are trained to generate new pieces of text, images, videos, and other media.
There is growing concern that such systems may infringe on the copyright interests of training data contributors.
We propose a framework that compensates copyright owners proportionally to their contributions to the creation of AI-generated content.
arXiv Detail & Related papers (2024-04-22T08:10:38Z)
- CaPS: Collaborative and Private Synthetic Data Generation from Distributed Sources [5.898893619901382]
We propose a framework for the collaborative and private generation of synthetic data from distributed data holders.
We replace the trusted aggregator with secure multi-party computation protocols and output privacy via differential privacy (DP).
We demonstrate the applicability and scalability of our approach for the state-of-the-art select-measure-generate algorithms MWEM+PGM and AIM.
arXiv Detail & Related papers (2024-02-13T17:26:32Z)
- Personalized Federated Learning with Attention-based Client Selection [57.71009302168411]
We propose FedACS, a new PFL algorithm with an Attention-based Client Selection mechanism.
FedACS integrates an attention mechanism to enhance collaboration among clients with similar data distributions.
Experiments on CIFAR10 and FMNIST validate FedACS's superiority.
arXiv Detail & Related papers (2023-12-23T03:31:46Z)
- Data Acquisition: A New Frontier in Data-centric AI [65.90972015426274]
We first present an investigation of current data marketplaces, revealing a lack of platforms offering detailed information about datasets.
We then introduce the DAM challenge, a benchmark to model the interaction between the data providers and acquirers.
Our evaluation of the submitted strategies underlines the need for effective data acquisition strategies in Machine Learning.
arXiv Detail & Related papers (2023-11-22T22:15:17Z)
- DECORAIT -- DECentralized Opt-in/out Registry for AI Training [20.683704089165406]
We present DECORAIT, a decentralized registry through which content creators may assert their right to opt in or out of AI training.
GenAI enables images to be synthesized using AI models trained on vast amounts of data scraped from public sources.
arXiv Detail & Related papers (2023-09-25T16:19:35Z)
- Data-centric AI: Perspectives and Challenges [51.70828802140165]
Data-centric AI (DCAI) advocates a fundamental shift from model advancements to ensuring data quality and reliability.
We bring together three general missions: training data development, inference data development, and data maintenance.
arXiv Detail & Related papers (2023-01-12T05:28:59Z)
- Mechanisms that Incentivize Data Sharing in Federated Learning [90.74337749137432]
We show how a naive scheme leads to catastrophic levels of free-riding where the benefits of data sharing are completely eroded.
We then introduce accuracy shaping based mechanisms to maximize the amount of data generated by each agent.
arXiv Detail & Related papers (2022-07-10T22:36:52Z)
- Data Sharing Markets [95.13209326119153]
We study a setup where each agent can be both buyer and seller of data.
We consider two cases: bilateral data exchange (trading data with data) and unilateral data exchange (trading data with money).
arXiv Detail & Related papers (2021-07-19T06:00:34Z)
- Modeling Stakeholder-centric Value Chain of Data to Understand Data Exchange Ecosystem [0.12891210250935145]
We propose a model describing the stakeholder-centric value chain (SVC) of data by focusing on the relationships among stakeholders in data businesses.
The SVC model enables the analysis and understanding of the structural characteristics of the data exchange ecosystem.
arXiv Detail & Related papers (2020-05-22T05:04:08Z)