Related papers: Fundamentals of Task-Agnostic Data Valuation

URL: http://arxiv.org/abs/2208.12354v1
Date: Thu, 25 Aug 2022 22:07:07 GMT
Title: Fundamentals of Task-Agnostic Data Valuation
Authors: Mohammad Mohammadi Amiri, Frederic Berdoz, Ramesh Raskar
Abstract summary: We study valuing the data of a data owner/seller for a data seeker/buyer. We focus on task-agnostic data valuation without any validation requirements.
Score: 21.78555506720078
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study valuing the data of a data owner/seller for a data seeker/buyer. Data valuation is often carried out for a specific task assuming a particular utility metric, such as test accuracy on a validation set, that may not exist in practice. In this work, we focus on task-agnostic data valuation without any validation requirements. The data buyer has access to a limited amount of data (which could be publicly available) and seeks more data samples from a data seller. We formulate the problem as estimating the differences in the statistical properties of the data at the seller with respect to the baseline data available at the buyer. We capture these statistical differences through the second moment by measuring diversity and relevance of the seller's data for the buyer; we estimate these measures through queries to the seller without requesting raw data. We design the queries with the proposed approach so that the seller is blind to the buyer's raw data and has no knowledge to fabricate responses to queries to obtain a desired outcome of the diversity and relevance trade-off. We will show through extensive experiments on real tabular and image datasets that the proposed estimates capture the diversity and relevance of the seller's data for the buyer.

Related papers

An Instrumental Value for Data Production and its Application to Data Pricing [107.98697414652479]
This paper develops an approach for capturing the instrumental value of data production processes. We show how they connect to classic notions of information design and signals in information economics.
arXiv Detail & Related papers (2024-12-24T03:53:57Z)
A Survey on Data Markets [73.07800441775814]
The growing trend of trading data for greater welfare has led to the emergence of data markets. A data market is any mechanism whereby the exchange of data products, including datasets and data derivatives, takes place. It serves as a coordinating mechanism by which several functions, including the pricing and the distribution of data, interact.
arXiv Detail & Related papers (2024-11-09T15:09:24Z)
Private, Augmentation-Robust and Task-Agnostic Data Valuation Approach for Data Marketplace [56.78396861508909]
PriArTa is an approach for computing the distance between the distribution of the buyer's existing dataset and the seller's dataset. PriArTa is communication-efficient, enabling the buyer to evaluate datasets without needing access to the entire dataset from each seller.
arXiv Detail & Related papers (2024-11-01T17:13:14Z)
Data Distribution Valuation [56.71023681599737]
Existing data valuation methods define a value for a discrete dataset. In many use cases, users are interested in not only the value of the dataset, but that of the distribution from which the dataset was sampled. We propose a maximum mean discrepancy (MMD)-based valuation method which enables theoretically principled and actionable policies.
arXiv Detail & Related papers (2024-10-06T07:56:53Z)
Data Measurements for Decentralized Data Markets [18.99870296998749]
Decentralized data markets can provide more equitable forms of data acquisition for machine learning. We propose and benchmark federated data measurements to allow a data buyer to find sellers with relevant and diverse datasets.
arXiv Detail & Related papers (2024-06-06T17:03:51Z)
Preventive Audits for Data Applications Before Data Sharing in the Power IoT [4.899053698192078]
Data owners should conduct preventive audits for data applications before data sharing. Data sharing in the power IoT is regarded as the background. Preventive audits should be conducted based on changes in the data feature parameters before and after data sharing.
arXiv Detail & Related papers (2024-05-05T15:07:56Z)
DAVED: Data Acquisition via Experimental Design for Data Markets [25.300193837833426]
We propose a federated approach to the data acquisition problem that is inspired by linear experimental design. Our proposed data acquisition method achieves lower prediction error without requiring labeled validation data. The key insight of our work is that a method that directly estimates the benefit of acquiring data for test set prediction is particularly compatible with a decentralized market setting.
arXiv Detail & Related papers (2024-03-20T18:05:52Z)
A Survey of Data Pricing for Data Marketplaces [77.3189288320768]
This paper attempts to comprehensively review the state-of-the-art on existing data pricing studies. Our key contribution lies in a new taxonomy of data pricing studies that unifies different attributes determining data prices.
arXiv Detail & Related papers (2023-03-07T04:35:56Z)
Investigating Data Variance in Evaluations of Automatic Machine Translation Metrics [58.50754318846996]
In this paper, we show that the performances of metrics are sensitive to data. The ranking of metrics varies when the evaluation is conducted on different datasets.
arXiv Detail & Related papers (2022-03-29T18:58:28Z)
Data-SUITE: Data-centric identification of in-distribution incongruous examples [81.21462458089142]
Data-SUITE is a data-centric framework to identify incongruous regions of in-distribution (ID) data. We empirically validate Data-SUITE's performance and coverage guarantees.
arXiv Detail & Related papers (2022-02-17T18:58:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.