Fundamentals of Task-Agnostic Data Valuation
- URL: http://arxiv.org/abs/2208.12354v1
- Date: Thu, 25 Aug 2022 22:07:07 GMT
- Title: Fundamentals of Task-Agnostic Data Valuation
- Authors: Mohammad Mohammadi Amiri, Frederic Berdoz, Ramesh Raskar
- Abstract summary: We study valuing the data of a data owner/seller for a data seeker/buyer.
We focus on task-agnostic data valuation without any validation requirements.
- Score: 21.78555506720078
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study valuing the data of a data owner/seller for a data seeker/buyer.
Data valuation is often carried out for a specific task assuming a particular
utility metric, such as test accuracy on a validation set, that may not exist
in practice. In this work, we focus on task-agnostic data valuation without any
validation requirements. The data buyer has access to a limited amount of data
(which could be publicly available) and seeks more data samples from a data
seller. We formulate the problem as estimating the differences in the
statistical properties of the data at the seller with respect to the baseline
data available at the buyer. We capture these statistical differences through
second moment by measuring diversity and relevance of the seller's data for the
buyer; we estimate these measures through queries to the seller without
requesting raw data. We design the queries with the proposed approach so that
the seller is blind to the buyer's raw data and has no knowledge to fabricate
responses to queries to obtain a desired outcome of the diversity and relevance
trade-off. We will show through extensive experiments on real tabular and image
datasets that the proposed estimates capture the diversity and relevance of the
seller's data for the buyer.
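The second-moment idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the abstract: the buyer queries the seller with the top principal directions of its own data, the seller returns only the variance of its data along those directions, and the two sets of variances are compared with hypothetical ratio-based scores. The function names (`buyer_queries`, `seller_response`, `diversity_relevance`) and the score formulas are placeholders, and the sketch does not attempt the query design that prevents the seller from fabricating responses.

```python
import numpy as np


def buyer_queries(buyer_data, k):
    """Return the buyer's top-k principal directions and their variances."""
    centered = buyer_data - buyer_data.mean(axis=0)
    cov = centered.T @ centered / len(buyer_data)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]         # indices of the k largest
    return eigvecs[:, top], np.clip(eigvals[top], 0.0, None)


def seller_response(seller_data, directions):
    """Seller side: report only the variance along each queried direction."""
    centered = seller_data - seller_data.mean(axis=0)
    return (centered @ directions).var(axis=0)


def diversity_relevance(buyer_vars, seller_vars, eps=1e-12):
    """Illustrative scores, roughly in [0, 1] (assumed formulas)."""
    low = np.minimum(buyer_vars, seller_vars)
    high = np.maximum(buyer_vars, seller_vars) + eps
    gap = np.abs(buyer_vars - seller_vars)
    # Geometric means over the k queried directions.
    relevance = float(np.exp(np.mean(np.log(low / high + eps))))
    diversity = float(np.exp(np.mean(np.log(gap / high + eps))))
    return diversity, relevance


rng = np.random.default_rng(0)
buyer = rng.normal(size=(200, 5))
seller = 2.0 * buyer                            # same directions, 4x the variance
directions, buyer_vars = buyer_queries(buyer, k=3)
seller_vars = seller_response(seller, directions)
print(diversity_relevance(buyer_vars, seller_vars))
```

In this toy setup the seller's raw samples never leave the seller; only k variances are returned. Under the assumed scores, a seller whose variances match the buyer's along the queried directions gets relevance near 1 and diversity near 0, and larger mismatches shift weight from relevance to diversity.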
Related papers
- Data Measurements for Decentralized Data Markets [18.99870296998749]
Decentralized data markets can provide more equitable forms of data acquisition for machine learning.
We propose and benchmark federated data measurements to allow a data buyer to find sellers with relevant and diverse datasets.
arXiv Detail & Related papers (2024-06-06T17:03:51Z)
- Preventive Audits for Data Applications Before Data Sharing in the Power IoT [4.899053698192078]
Data owners should conduct preventive audits for data applications before data sharing.
Data sharing in the power IoT is regarded as the background.
Preventive audits should be conducted based on changes in the data feature parameters before and after data sharing.
arXiv Detail & Related papers (2024-05-05T15:07:56Z)
- Lazy Data Practices Harm Fairness Research [49.02318458244464]
We present a comprehensive analysis of fair ML datasets, demonstrating how unreflective practices hinder the reach and reliability of algorithmic fairness findings.
Our analyses identify three main areas of concern: (1) a lack of representation for certain protected attributes in both data and evaluations; (2) the widespread exclusion of minorities during data preprocessing; and (3) opaque data processing threatening the generalization of fairness research.
This study underscores the need for a critical reevaluation of data practices in fair ML and offers directions to improve both the sourcing and usage of datasets.
arXiv Detail & Related papers (2024-04-26T09:51:24Z)
- Data Acquisition via Experimental Design for Decentralized Data Markets [25.300193837833426]
Data markets provide a way to increase the supply of data, particularly in data-scarce domains such as healthcare.
A major challenge for a data buyer in such a market is selecting the most valuable data points from a data seller.
We propose a federated approach to the data selection problem that is inspired by linear experimental design.
arXiv Detail & Related papers (2024-03-20T18:05:52Z)
- A Bargaining-based Approach for Feature Trading in Vertical Federated Learning [54.51890573369637]
We propose a bargaining-based feature trading approach in Vertical Federated Learning (VFL) to encourage economically efficient transactions.
Our model incorporates performance gain-based pricing, taking into account the revenue-based optimization objectives of both parties.
arXiv Detail & Related papers (2024-02-23T10:21:07Z)
- Data Acquisition: A New Frontier in Data-centric AI [65.90972015426274]
We first present an investigation of current data marketplaces, revealing a lack of platforms offering detailed information about datasets.
We then introduce the DAM challenge, a benchmark to model the interaction between the data providers and acquirers.
Our evaluation of the submitted strategies underlines the need for effective data acquisition strategies in Machine Learning.
arXiv Detail & Related papers (2023-11-22T22:15:17Z)
- A Survey of Data Pricing for Data Marketplaces [77.3189288320768]
This paper attempts to comprehensively review the state-of-the-art on existing data pricing studies.
Our key contribution lies in a new taxonomy of data pricing studies that unifies different attributes determining data prices.
arXiv Detail & Related papers (2023-03-07T04:35:56Z)
- Investigating Data Variance in Evaluations of Automatic Machine Translation Metrics [58.50754318846996]
In this paper, we show that the performances of metrics are sensitive to data.
The ranking of metrics varies when the evaluation is conducted on different datasets.
arXiv Detail & Related papers (2022-03-29T18:58:28Z)
- Data-SUITE: Data-centric identification of in-distribution incongruous examples [81.21462458089142]
Data-SUITE is a data-centric framework to identify incongruous regions of in-distribution (ID) data.
We empirically validate Data-SUITE's performance and coverage guarantees.
arXiv Detail & Related papers (2022-02-17T18:58:31Z)
- Data Appraisal Without Data Sharing [28.41079503636652]
We develop methods that do not require data sharing by using secure multi-party computation.
Our experiments show that influence functions provide an appealing trade-off between high-quality appraisal and required computation.
arXiv Detail & Related papers (2020-12-11T15:45:19Z)