Generative AI and the Digital Commons
- URL: http://arxiv.org/abs/2303.11074v1
- Date: Mon, 20 Mar 2023 13:01:48 GMT
- Title: Generative AI and the Digital Commons
- Authors: Saffron Huang and Divya Siddarth
- Abstract summary: GFMs are trained on publicly available data and use public infrastructure.
We outline the risks posed by GFMs and why they are relevant to the digital commons.
We propose numerous governance-based solutions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Many generative foundation models (or GFMs) are trained on publicly available
data and use public infrastructure, but 1) may degrade the "digital commons"
that they depend on, and 2) do not have processes in place to return value
captured to data producers and stakeholders. Existing conceptions of data
rights and protection (focusing largely on individually-owned data and
associated privacy concerns) and copyright or licensing-based models offer some
instructive priors, but are ill-suited for the issues that may arise from
models trained on commons-based data. We outline the risks posed by GFMs and
why they are relevant to the digital commons, and propose numerous
governance-based solutions that include investments in standardized
dataset/model disclosure and other kinds of transparency when it comes to
generative models' training and capabilities, consortia-based funding for
monitoring/standards/auditing organizations, requirements or norms for GFM
companies to contribute high quality data to the commons, and structures for
shared ownership based on individual or community provision of fine-tuning
data.
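One of the proposed solutions, standardized dataset/model disclosure, can be sketched as a machine-readable record. This is a hypothetical illustration only: the field names below are invented for the example and do not correspond to any published disclosure standard.

```python
import json

# Hypothetical disclosure record for a generative foundation model;
# all field names are illustrative, not a published schema.
disclosure = {
    "model_name": "example-gfm",
    "training_data": {
        "sources": ["public web crawl", "licensed corpora"],
        "licenses": ["CC-BY-SA-4.0"],
        "collection_period": "2020-2022",
    },
    "capabilities": ["text generation"],
    "known_limitations": ["may reproduce training data verbatim"],
}

# Serialize deterministically so records can be diffed and audited.
record = json.dumps(disclosure, indent=2, sort_keys=True)
print(record)
```

A consortium or auditing body could require such records to be published alongside model releases, making training provenance checkable by commons stakeholders.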
Related papers
- Data Authenticity, Consent, & Provenance for AI are all broken: what will it take to fix them? [11.040101172803727]
New capabilities in foundation models are owed in large part to massive, widely-sourced, and under-documented training data collections.
Existing practices in data collection have led to challenges in tracing authenticity, verifying consent, preserving privacy, addressing representation and bias, respecting copyright, and overall developing ethical and trustworthy foundation models.
arXiv Detail & Related papers (2024-04-19T07:42:35Z)
- CaPS: Collaborative and Private Synthetic Data Generation from Distributed Sources [5.898893619901382]
We propose a framework for the collaborative and private generation of synthetic data from distributed data holders.
We replace the trusted aggregator with secure multi-party computation protocols and achieve output privacy via differential privacy (DP).
We demonstrate the applicability and scalability of our approach for the state-of-the-art select-measure-generate algorithms MWEM+PGM and AIM.
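The differential privacy mentioned above can be illustrated with a minimal sketch. This is not the CaPS protocol itself, just the standard Laplace mechanism applied to a counting query, where noise with scale 1/ε is the usual choice for a sensitivity-1 query.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Inverse-CDF sampling from Laplace(0, scale).
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def dp_count(values, predicate, epsilon: float) -> float:
    """Release a count with epsilon-differential privacy.

    Counting queries have sensitivity 1 (adding or removing one record
    changes the count by at most 1), so Laplace(1/epsilon) noise suffices.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon)

# Example: noisy count of even numbers in 0..99 (true count is 50).
random.seed(0)
noisy = dp_count(range(100), lambda x: x % 2 == 0, epsilon=1.0)
print(noisy)
```

The smaller ε is, the larger the noise scale and the stronger the privacy guarantee; protocols like the one above combine such mechanisms with secure computation so no single party ever sees the raw inputs.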
arXiv Detail & Related papers (2024-02-13T17:26:32Z)
- Auditing and Generating Synthetic Data with Controllable Trust Trade-offs [54.262044436203965]
We introduce a holistic auditing framework that comprehensively evaluates synthetic datasets and AI models.
It focuses on preventing bias and discrimination, ensuring fidelity to the source data, and assessing utility, robustness, and privacy preservation.
We demonstrate the framework's effectiveness by auditing various generative models across diverse use cases.
arXiv Detail & Related papers (2023-04-21T09:03:18Z)
- Sotto Voce: Federated Speech Recognition with Differential Privacy Guarantees [0.761963751158349]
Speech data is expensive to collect and highly sensitive with respect to its sources.
Organizations often independently collect small datasets for their own use, but these are frequently insufficient for the demands of machine learning.
Organizations could pool these datasets together and jointly build a strong ASR system; sharing data in the clear, however, comes with tremendous risk, in terms of intellectual property loss as well as loss of privacy of the individuals who exist in the dataset.
arXiv Detail & Related papers (2022-07-16T02:48:54Z)
- Distributed Machine Learning and the Semblance of Trust [66.1227776348216]
Federated Learning (FL) allows the data owner to maintain data governance and perform model training locally without having to share their data.
FL and related techniques are often described as privacy-preserving.
We explain why this term is not appropriate and outline the risks associated with over-reliance on protocols that were not designed with formal definitions of privacy in mind.
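The federated learning setup the paper critiques can be sketched as a FedAvg-style aggregation step: clients train locally and send only model parameters, which the server averages. This is a minimal illustration with assumed flat parameter vectors, and, as the abstract argues, sharing parameters instead of data carries no formal privacy guarantee by itself.

```python
from typing import List

def fedavg(client_weights: List[List[float]],
           client_sizes: List[int]) -> List[float]:
    """Weighted average of client model parameters (FedAvg-style).

    Each client trains on its own data and transmits only its parameter
    vector; raw data never leaves the client. Note that the parameters
    themselves can still leak information about the training data.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Example: two equally sized clients.
print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 1]))
```

Weighting by dataset size keeps the global model consistent with pooled training; it does nothing, however, to bound what an adversary can infer from the submitted updates, which is the paper's point.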
arXiv Detail & Related papers (2021-12-21T08:44:05Z)
- Data Sharing Markets [95.13209326119153]
We study a setup where each agent can be both buyer and seller of data.
We consider two cases: bilateral data exchange (trading data for data) and unilateral data exchange (trading data for money).
arXiv Detail & Related papers (2021-07-19T06:00:34Z)
- Representative & Fair Synthetic Data [68.8204255655161]
We present a framework to incorporate fairness constraints into the self-supervised learning process.
We generate a representative as well as fair version of the UCI Adult census data set.
We consider representative & fair synthetic data a promising future building block to teach algorithms not on historic worlds, but rather on the worlds that we strive to live in.
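A common way to express a fairness constraint of this kind is reweighting in the style of Kamiran and Calders, assigning each record the weight P(group)·P(label)/P(group, label) so that group and label become statistically independent under the weighted distribution. This is a generic sketch of that idea, not the paper's self-supervised method.

```python
from collections import Counter
from typing import List

def reweigh(records: List[dict], group_key: str, label_key: str) -> List[float]:
    """Per-record fairness weights: P(group) * P(label) / P(group, label).

    Under the reweighted distribution, group membership and the outcome
    label are independent, which removes the group/label correlation a
    model would otherwise learn from the data.
    """
    n = len(records)
    g_count = Counter(r[group_key] for r in records)
    y_count = Counter(r[label_key] for r in records)
    gy_count = Counter((r[group_key], r[label_key]) for r in records)
    return [
        (g_count[r[group_key]] / n) * (y_count[r[label_key]] / n)
        / (gy_count[(r[group_key], r[label_key])] / n)
        for r in records
    ]

# Example: group A is over-represented among positives.
records = [
    {"g": "A", "y": 1}, {"g": "A", "y": 1}, {"g": "A", "y": 0},
    {"g": "B", "y": 1}, {"g": "B", "y": 0},
]
print(reweigh(records, "g", "y"))
```

Records from over-represented (group, label) combinations get weights below 1 and under-represented ones get weights above 1; a synthetic-data generator can sample according to these weights to produce a fairer dataset.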
arXiv Detail & Related papers (2021-04-07T09:19:46Z)
- Preventing Unauthorized Use of Proprietary Data: Poisoning for Secure Dataset Release [52.504589728136615]
We develop a data poisoning method by which publicly released data can be minimally modified to prevent others from training models on it.
We demonstrate the success of our approach on ImageNet classification and on facial recognition.
arXiv Detail & Related papers (2021-02-16T19:12:34Z)
- Second layer data governance for permissioned blockchains: the privacy management challenge [58.720142291102135]
In pandemic situations, such as the COVID-19 and Ebola outbreaks, sharing health data is crucial to containing mass infection and reducing deaths.
In this context, permissioned blockchain technology emerges to empower users to exercise their rights, providing data ownership, transparency, and security through an immutable, unified, and distributed database ruled by smart contracts.
arXiv Detail & Related papers (2020-10-22T13:19:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.