Reclaiming the Digital Commons: A Public Data Trust for Training Data
- URL: http://arxiv.org/abs/2303.09001v2
- Date: Sun, 21 May 2023 23:17:19 GMT
- Title: Reclaiming the Digital Commons: A Public Data Trust for Training Data
- Authors: Alan Chan, Herbie Bradley, Nitarshan Rajkumar
- Abstract summary: We propose that a public data trust assert control over training data for foundation models.
This trust should scrape the internet as a digital commons and license the data to commercial model developers for a percentage cut of revenues from deployment.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Democratization of AI means not only that people can freely use AI, but also
that people can collectively decide how AI is to be used. In particular,
collective decision-making power is required to redress the negative
externalities from the development of increasingly advanced AI systems,
including degradation of the digital commons and unemployment from automation.
The rapid pace of AI development and deployment currently leaves little room
for this power. Monopolized in the hands of private corporations, the
development of the most capable foundation models has proceeded largely without
public input. There is currently no implemented mechanism for ensuring that the
economic value generated by such models is redistributed to account for their
negative externalities. The citizens that have generated the data necessary to
train models do not have input on how their data are to be used. In this work,
we propose that a public data trust assert control over training data for
foundation models. In particular, this trust should scrape the internet as a
digital commons and license the data to commercial model developers for a
percentage cut of revenues from deployment. First, we argue in detail for the
existence of
such a trust. We also discuss feasibility and potential risks. Second, we
detail a number of ways for a data trust to incentivize model developers to use
training data only from the trust. We propose a mix of verification mechanisms,
potential regulatory action, and positive incentives. We conclude by
highlighting other potential benefits of our proposed data trust and connecting
our work to ongoing efforts in data and compute governance.
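The licensing arrangement in the abstract can be illustrated with a toy calculation. A minimal sketch, assuming a flat percentage cut; the 10% default rate and dollar figures here are hypothetical placeholders, as the paper does not specify concrete terms:

```python
# Hypothetical sketch of the proposed revenue-sharing license: the data
# trust licenses training data to a developer in exchange for a
# percentage cut of deployment revenues. The 10% rate is an assumption.

def trust_royalty(deployment_revenue: float, cut: float = 0.10) -> float:
    """Return the share of deployment revenue owed to the data trust."""
    if not 0.0 <= cut <= 1.0:
        raise ValueError("cut must be a fraction between 0 and 1")
    return deployment_revenue * cut

# A developer earning $5M from a deployed model would owe the trust
# roughly $500,000 under a hypothetical 10% cut.
owed = trust_royalty(5_000_000)
```

In practice the paper contemplates richer terms (verification mechanisms, regulatory backstops, positive incentives), so a flat cut is only the simplest instance of the scheme.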
Related papers
- Decentralized Intelligence Network (DIN)
Decentralized Intelligence Network (DIN) addresses the challenges of data sovereignty and AI utilization caused by the fragmentation and siloing of data across providers and institutions.
This comprehensive framework overcomes access barriers to scalable data sources.
It supports effective AI training, allowing participants to maintain control over their data, benefit financially, and contribute to a decentralized, scalable ecosystem.
arXiv Detail & Related papers (2024-07-02T17:40:06Z)
- An Economic Solution to Copyright Challenges of Generative AI
Generative artificial intelligence systems are trained to generate new pieces of text, images, videos, and other media.
There is growing concern that such systems may infringe on the copyright interests of training data contributors.
We propose a framework that compensates copyright owners proportionally to their contributions to the creation of AI-generated content.
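The compensation idea above can be sketched as a simple proportional split. This is a simplification under stated assumptions: the contribution scores, owner names, and royalty pool are hypothetical, and the paper's actual framework for measuring contributions is more involved:

```python
# Hypothetical sketch: split a royalty pool among copyright owners in
# proportion to already-measured contribution scores. How contributions
# are measured is the hard part and is not modeled here.

def proportional_payouts(contributions: dict[str, float],
                         pool: float) -> dict[str, float]:
    """Return each owner's share of the pool, proportional to their score."""
    total = sum(contributions.values())
    if total <= 0:
        raise ValueError("total contribution must be positive")
    return {owner: pool * score / total
            for owner, score in contributions.items()}

# With hypothetical scores 3:1, a 100-unit pool splits 75 / 25.
payouts = proportional_payouts({"alice": 3.0, "bob": 1.0}, pool=100.0)
```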
arXiv Detail & Related papers (2024-04-22T08:10:38Z)
- Trustless Audits without Revealing Data or Models
We show that it is possible to allow model providers to keep their model weights (but not architecture) and data secret while allowing other parties to trustlessly audit model and data properties.
We do this by designing a protocol called ZkAudit in which model providers publish cryptographic commitments of datasets and model weights.
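The commitment step can be illustrated with a plain hash commitment. This is a deliberate simplification: ZkAudit builds zero-knowledge proofs over such commitments, which this sketch does not implement; the variable names are illustrative only:

```python
# Simplified commit/reveal over SHA-256: publish H(nonce || data) now,
# reveal (nonce, data) later so anyone can check the commitment.
# ZkAudit's actual protocol proves properties of the committed data
# in zero knowledge, without ever revealing it.
import hashlib
import os

def commit(data: bytes) -> tuple[bytes, bytes]:
    """Return (digest, nonce); publish digest, keep nonce and data secret."""
    nonce = os.urandom(32)
    digest = hashlib.sha256(nonce + data).digest()
    return digest, nonce

def verify(digest: bytes, nonce: bytes, data: bytes) -> bool:
    """Check a revealed (nonce, data) pair against a published digest."""
    return hashlib.sha256(nonce + data).digest() == digest

weights = b"model-weights-bytes"  # stand-in for serialized weights
digest, nonce = commit(weights)
assert verify(digest, nonce, weights)
assert not verify(digest, nonce, b"tampered-weights")
```

The random nonce hides the data (so identical datasets do not produce recognizable digests), while the hash binds the committer to exactly one dataset.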
arXiv Detail & Related papers (2024-04-06T04:43:06Z)
- Computing Power and the Governance of Artificial Intelligence
Governments and companies have started to leverage compute as a means to govern AI.
Compute-based policies and technologies have the potential to assist in these areas, but there is significant variation in their readiness for implementation.
Naive or poorly scoped approaches to compute governance carry significant risks in areas like privacy, economic impacts, and centralization of power.
arXiv Detail & Related papers (2024-02-13T21:10:21Z)
- Mechanisms that Incentivize Data Sharing in Federated Learning
We show how a naive scheme leads to catastrophic levels of free-riding where the benefits of data sharing are completely eroded.
We then introduce accuracy-shaping mechanisms to maximize the amount of data generated by each agent.
arXiv Detail & Related papers (2022-07-10T22:36:52Z)
- APPFLChain: A Privacy Protection Distributed Artificial-Intelligence Architecture Based on Federated Learning and Consortium Blockchain
We propose a new system architecture called APPFLChain.
It is an integrated architecture of a Hyperledger Fabric-based blockchain and a federated-learning paradigm.
Our new system can maintain a high degree of security and privacy, as users do not need to share sensitive personal information with the server.
arXiv Detail & Related papers (2022-06-26T05:30:07Z)
- Representative & Fair Synthetic Data
We present a framework to incorporate fairness constraints into the self-supervised learning process.
We generate a representative as well as fair version of the UCI Adult census data set.
We consider representative & fair synthetic data a promising future building block to teach algorithms not on historic worlds, but rather on the worlds that we strive to live in.
arXiv Detail & Related papers (2021-04-07T09:19:46Z)
- Decentralized Federated Learning Preserves Model and Data Privacy
We propose a fully decentralized approach that allows knowledge to be shared between trained models.
Students are trained on the output of their teachers via synthetically generated input data.
The results show that an untrained student model, trained on the teacher's output, reaches F1-scores comparable to the teacher's.
arXiv Detail & Related papers (2021-02-01T14:38:54Z)
- Trustworthy AI
Brittleness to minor adversarial changes in the input data, limited ability to explain decisions, and bias inherited from training data are some of the most prominent limitations.
We propose the tutorial on Trustworthy AI to address six critical issues in enhancing user and public trust in AI systems.
arXiv Detail & Related papers (2020-11-02T20:04:18Z)
- A Distributed Trust Framework for Privacy-Preserving Machine Learning
This paper outlines a distributed infrastructure which is used to facilitate peer-to-peer trust between distributed agents.
We detail a proof of concept using Hyperledger Aries, Decentralised Identifiers (DIDs), and Verifiable Credentials (VCs).
arXiv Detail & Related papers (2020-06-03T18:06:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.