Fairshare Data Pricing for Large Language Models
- URL: http://arxiv.org/abs/2502.00198v1
- Date: Fri, 31 Jan 2025 22:27:34 GMT
- Title: Fairshare Data Pricing for Large Language Models
- Authors: Luyang Zhang, Cathy Jiao, Beibei Li, Chenyan Xiong
- Abstract summary: We propose a fairshare pricing framework that sets training data prices using data valuation methods to quantify the data's contribution to large language models (LLMs).
We theoretically show that pricing derived from our framework is tightly linked to data valuation and buyers' budgets, and is optimal for both buyers and sellers.
Our framework lays the foundation for future research on equitable and sustainable data markets for large-scale AI.
- Score: 15.79368596445939
- Abstract: Training data is a pivotal resource for building large language models (LLMs), but unfair pricing in data markets poses a serious challenge for both data buyers (e.g., LLM builders) and sellers (e.g., human annotators): it discourages market participation, which reduces data quantity and quality. In this paper, we propose a fairshare pricing framework that sets training data prices using data valuation methods to quantify the data's contribution to LLMs. In our framework, buyers make purchasing decisions using data valuation, and sellers set prices to maximize their profits based on the anticipated buyer purchases. We theoretically show that pricing derived from our framework is tightly linked to data valuation and buyers' budgets, and is optimal for both buyers and sellers. Through market simulations using current LLMs and datasets (math problems, medical diagnosis, and physical reasoning), we show that our framework is fairshare for buyers, by ensuring their purchased data reflects its model training value and thus yields higher LLM task performance per dollar spent on data, and fairshare for sellers, by ensuring they sell their data at optimal prices. Our framework lays the foundation for future research on equitable and sustainable data markets for large-scale AI.
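The abstract describes the buyer-seller interaction only at a high level: the buyer decides what to purchase from data valuations under a budget, and each seller picks the price that maximizes its profit given the purchases it anticipates. The following is a minimal, hypothetical sketch of that kind of interaction, not the paper's algorithm. It assumes each dataset's value to the buyer is already available as a single number (standing in for a data valuation score), that the buyer uses a simple greedy value-per-dollar rule rather than an exact optimizer, that sellers choose from a discrete price grid and have a made-up production cost, and that prices are found by iterated best responses. The names `Seller`, `buyer_purchases`, `seller_best_price`, and `simulate`, and all numbers in the demo, are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Seller:
    name: str
    value: float  # buyer's valuation of this seller's data (assumed given, e.g. a data valuation score)
    cost: float   # hypothetical per-dataset production cost for the seller


def buyer_purchases(values, prices, budget):
    """Greedy buyer (a heuristic, not an exact budget optimizer): buy datasets in
    decreasing value-per-dollar order while the budget allows, skipping any dataset
    priced above its value. Assumes all prices are positive.
    Returns the set of purchased indices."""
    candidates = [i for i in range(len(values)) if values[i] >= prices[i]]
    # list.sort is stable, so ties in value-per-dollar keep index order
    candidates.sort(key=lambda i: values[i] / prices[i], reverse=True)
    bought, spent = set(), 0.0
    for i in candidates:
        if spent + prices[i] <= budget:
            bought.add(i)
            spent += prices[i]
    return bought


def seller_best_price(k, sellers, prices, budget, grid):
    """Seller k tries every price on the grid, predicts the buyer's purchases with
    the other sellers' prices held fixed, and keeps the most profitable price."""
    values = [s.value for s in sellers]
    best_price, best_profit = prices[k], float("-inf")
    for p in grid:
        trial = list(prices)
        trial[k] = p
        sold = k in buyer_purchases(values, trial, budget)
        profit = p - sellers[k].cost if sold else 0.0
        if profit > best_profit:  # strict: ties keep the earlier grid price
            best_price, best_profit = p, profit
    return best_price, best_profit


def simulate(sellers, budget, grid, rounds=10):
    """Iterated best responses: sellers update their prices in turn until no
    price changes or `rounds` is exhausted."""
    prices = [grid[-1]] * len(sellers)  # everyone starts at the highest grid price
    for _ in range(rounds):
        changed = False
        for k in range(len(sellers)):
            p, _ = seller_best_price(k, sellers, prices, budget, grid)
            if p != prices[k]:
                prices[k], changed = p, True
        if not changed:
            break
    return prices, buyer_purchases([s.value for s in sellers], prices, budget)


if __name__ == "__main__":
    grid = [float(p) for p in range(1, 11)]
    market = [Seller("math", 8.0, 1.0), Seller("medical", 5.0, 1.0)]
    prices, bought = simulate(market, budget=10.0, grid=grid)
    print(prices, bought)  # → [6.0, 4.0] {0, 1}
```

In this toy setup, a lone seller's best price settles at the lower of the buyer's valuation and budget, which loosely echoes the abstract's claim that prices are tied to valuation and budget. In the two-seller demo, the iteration settles at prices of 6 and 4, so the buyer buys both datasets and spends exactly its budget of 10.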
Related papers
- Data Measurements for Decentralized Data Markets [18.99870296998749]
Decentralized data markets can provide more equitable forms of data acquisition for machine learning.
We propose and benchmark federated data measurements to allow a data buyer to find sellers with relevant and diverse datasets.
Date: 2024-06-06T17:03:51Z
- A Bargaining-based Approach for Feature Trading in Vertical Federated Learning [54.51890573369637]
We propose a bargaining-based feature trading approach in Vertical Federated Learning (VFL) to encourage economically efficient transactions.
Our model incorporates performance gain-based pricing, taking into account the revenue-based optimization objectives of both parties.
Date: 2024-02-23T10:21:07Z
- An Auction-based Marketplace for Model Trading in Federated Learning [54.79736037670377]
Federated learning (FL) is increasingly recognized for its efficacy in training models using locally distributed data.
We frame FL as a marketplace of models, where clients act as both buyers and sellers.
We propose an auction-based solution to ensure proper pricing based on performance gain.
Date: 2024-02-02T07:25:53Z
- Data Acquisition: A New Frontier in Data-centric AI [65.90972015426274]
We first present an investigation of current data marketplaces, revealing a lack of platforms that offer detailed information about datasets.
We then introduce the DAM challenge, a benchmark to model the interaction between the data providers and acquirers.
Our evaluation of the submitted strategies underlines the need for effective data acquisition strategies in Machine Learning.
Date: 2023-11-22T22:15:17Z
- Dynamic Datasets and Market Environments for Financial Reinforcement Learning [68.11692837240756]
FinRL-Meta is a library that processes dynamic datasets from real-world markets into gym-style market environments.
We provide examples and reproduce popular research papers as stepping stones for users to design new trading strategies.
We also deploy the library on cloud platforms so that users can visualize their own results and assess the relative performance.
Date: 2023-04-25T22:17:31Z
- A Survey of Data Pricing for Data Marketplaces [77.3189288320768]
This paper attempts to comprehensively review the state-of-the-art on existing data pricing studies.
Our key contribution lies in a new taxonomy of data pricing studies that unifies different attributes determining data prices.
Date: 2023-03-07T04:35:56Z
- A Marketplace for Trading AI Models based on Blockchain and Incentives for IoT Data [24.847898465750667]
An emerging paradigm in Machine Learning (ML) is a federated approach in which the learning model is partially delivered to a group of heterogeneous agents, allowing them to train it locally on their own data.
The problem of valuation of models, as well as the questions of incentives for collaborative training and trading of data/models, have received limited treatment in the literature.
In this paper, a new ecosystem of ML model trading over a trusted ML-based network is proposed. The buyer can acquire the model of interest from the ML market, and interested sellers spend local computations on their data to enhance that model's quality.
Date: 2021-12-06T08:52:42Z
- What Is the Price of Data? A Measurement Study of Commercial Data Marketplaces [0.0]
We present a first-of-its-kind measurement study of the growing Data Marketplace ecosystem.
We show that the median price of live data products sold under a subscription model is around US$1,400 per month.
For one-off purchases of static data, the median price is around US$2,200.
Date: 2021-10-25T10:39:47Z
- OSOUM Framework for Trading Data Research [79.0383470835073]
We supply, to the best of our knowledge, the first open-source simulation platform, Open SOUrce Market Simulator (OSOUM), to analyze trading markets and specifically data markets.
We describe and implement a specific data market model, consisting of two types of agents: sellers who own various datasets available for acquisition, and buyers searching for relevant and beneficial datasets for purchase.
Although commercial frameworks, intended for handling data markets, already exist, we provide a free and extensive end-to-end research tool for simulating possible behavior for both buyers and sellers participating in (data) markets.
Date: 2021-02-18T09:20:26Z