The Dimensions of Data Labor: A Road Map for Researchers, Activists, and
Policymakers to Empower Data Producers
- URL: http://arxiv.org/abs/2305.13238v1
- Date: Mon, 22 May 2023 17:11:22 GMT
- Authors: Hanlin Li, Nicholas Vincent, Stevie Chancellor, Brent Hecht
- Abstract summary: Data producers have little say in what data is captured, how it is used, or who it benefits.
Organizations with the ability to access and process this data, e.g. OpenAI and Google, possess immense power in shaping the technology landscape.
By synthesizing related literature that reconceptualizes the production of data for computing as "data labor", we outline opportunities for researchers, policymakers, and activists to empower data producers.
- Score: 14.392208044851976
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Many recent technological advances (e.g. ChatGPT and search engines) are
possible only because of massive amounts of user-generated data produced
through user interactions with computing systems or scraped from the web (e.g.
behavior logs, user-generated content, and artwork). However, data producers
have little say in what data is captured, how it is used, or who it benefits.
Organizations with the ability to access and process this data, e.g. OpenAI and
Google, possess immense power in shaping the technology landscape. By
synthesizing related literature that reconceptualizes the production of data
for computing as "data labor", we outline opportunities for researchers,
policymakers, and activists to empower data producers in their relationship
with tech companies, e.g., advocating for transparency about data reuse, creating
feedback channels between data producers and companies, and potentially
developing mechanisms to share data's revenue more broadly. In doing so, we
characterize data labor with six important dimensions - legibility, end-use
awareness, collaboration requirement, openness, replaceability, and livelihood
overlap - based on the parallels between data labor and various other types of
labor in the computing literature.
Related papers
- Efficient Data Collection for Robotic Manipulation via Compositional Generalization [70.76782930312746]
  We show that policies can compose environmental factors from their data to succeed when encountering unseen factor combinations.
  We propose better in-domain data collection strategies that exploit composition.
  We provide videos at http://iliad.stanford.edu/robot-data-comp/.
  (2024-03-08T07:15:38Z)
- From Data Creator to Data Reuser: Distance Matters [0.847136673632881]
  Open science policies focus more heavily on data sharing than on reuse.
  The value of data reuse lies in relationships between creators and reusers.
  We develop the theoretical construct of distance between data creator and data reuser.
  (2024-02-05T18:16:04Z)
- Privacy-Preserving Graph Machine Learning from Data to Computation: A Survey [67.7834898542701]
  We focus on reviewing privacy-preserving techniques of graph machine learning.
  We first review methods for generating privacy-preserving graph data.
  Then we describe methods for transmitting privacy-preserved information.
  (2023-07-10T04:30:23Z)
- Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic Data [91.52783572568214]
  Synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs.
  We discuss which fundamental challenges the community needs to overcome for wider relevance and application of synthetic data.
  (2023-04-07T16:38:40Z)
- Synthetic-to-Real Domain Adaptation for Action Recognition: A Dataset and Baseline Performances [76.34037366117234]
  We introduce a new dataset called Robot Control Gestures (RoCoG-v2).
  The dataset is composed of both real and synthetic videos from seven gesture classes.
  We present results using state-of-the-art action recognition and domain adaptation algorithms.
  (2023-03-17T23:23:55Z)
- Machine Learning for Synthetic Data Generation: A Review [23.073056971997715]
  This paper reviews existing studies that employ machine learning models for the purpose of generating synthetic data.
  The review encompasses various perspectives, starting with the applications of synthetic data generation, spanning computer vision, speech, natural language processing, healthcare, and business domains.
  The paper also addresses the crucial aspects of privacy and fairness concerns related to synthetic data generation.
  (2023-02-08T13:59:31Z)
- Privacy-Preserving Machine Learning for Collaborative Data Sharing via Auto-encoder Latent Space Embeddings [57.45332961252628]
  Privacy-preserving machine learning in data-sharing processes is an ever-critical task.
  This paper presents an innovative framework that uses Representation Learning via autoencoders to generate privacy-preserving embedded data.
  (2022-11-10T17:36:58Z)
- The Role of Cross-Silo Federated Learning in Facilitating Data Sharing in the Agri-Food Sector [5.219568203653523]
  Data sharing remains a major hindering factor when it comes to adopting emerging AI technologies in the agri-food sector.
  We propose a technical solution based on federated learning that uses decentralized data.
  Our results demonstrate that our approach performs better than each of the models trained on an individual data source.
  (2021-04-14T16:00:28Z)
- Data Leverage: A Framework for Empowering the Public in its Relationship with Technology Companies [13.174512123890015]
  Many powerful computing technologies rely on implicit and explicit data contributions from the public.
  This dependency suggests a potential source of leverage for the public in its relationship with technology companies.
  We present a framework for understanding data leverage that highlights new opportunities to change technology company behavior.
  (2020-12-18T00:46:26Z)
- Synthetic Data: Opening the data floodgates to enable faster, more directed development of machine learning methods [96.92041573661407]
  Many ground-breaking advancements in machine learning can be attributed to the availability of a large volume of rich data.
  Many large-scale datasets are highly sensitive, such as healthcare data, and are not widely available to the machine learning community.
  Generating synthetic data with privacy guarantees provides one such solution.
  (2020-12-08T17:26:10Z)
- From Data to Knowledge to Action: A Global Enabler for the 21st Century [26.32590947516587]
  A confluence of advances in the computer and mathematical sciences has unleashed unprecedented capabilities for enabling true evidence-based decision making.
  These capabilities are making possible the large-scale capture of data and the transformation of that data into insights and recommendations.
  The shift of commerce, science, education, art, and entertainment to the web makes available unprecedented quantities of structured and unstructured databases about human activities.
  (2020-07-31T19:19:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.