The Data-Production Dispositif
- URL: http://arxiv.org/abs/2205.11963v1
- Date: Tue, 24 May 2022 10:51:05 GMT
- Title: The Data-Production Dispositif
- Authors: Milagros Miceli and Julian Posada
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning (ML) depends on data to train and verify models. Very often,
organizations outsource processes related to data work (i.e., generating and
annotating data and evaluating outputs) through business process outsourcing
(BPO) companies and crowdsourcing platforms. This paper investigates outsourced
ML data work in Latin America by studying three platforms in Venezuela and a
BPO in Argentina. We lean on the Foucauldian notion of dispositif to define the
data-production dispositif as an ensemble of discourses, actions, and objects
strategically disposed to (re)produce power/knowledge relations in data and
labor. Our dispositif analysis comprises the examination of 210 data work
instruction documents, 55 interviews with data workers, managers, and
requesters, and participant observation. Our findings show that discourses
encoded in instructions reproduce and normalize the worldviews of requesters.
Precarious working conditions and economic dependency alienate workers, making
them obedient to instructions. Furthermore, discourses and social contexts
materialize in artifacts, such as interfaces and performance metrics, limiting
workers' agency and normalizing specific ways of interpreting data. We conclude
by stressing the importance of counteracting the data-production dispositif by
fighting alienation and precarization, and empowering data workers to become
assets in the quest for high-quality data.