Resolving the Human Subjects Status of Machine Learning's Crowdworkers
- URL: http://arxiv.org/abs/2206.04039v2
- Date: Thu, 15 Jun 2023 20:10:08 GMT
- Title: Resolving the Human Subjects Status of Machine Learning's Crowdworkers
- Authors: Divyansh Kaushik, Zachary C. Lipton, Alex John London
- Abstract summary: We investigate the appropriate designation of ML crowdsourcing studies.
We highlight two challenges posed by ML: the same set of workers can serve multiple roles and provide many sorts of information.
Our analysis exposes a potential loophole in the Common Rule, where researchers can elude research ethics oversight by splitting data collection and analysis into distinct studies.
- Score: 29.008050084395958
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, machine learning (ML) has relied heavily on crowdworkers
both for building datasets and for addressing research questions requiring
human interaction or judgment. The diverse tasks performed and uses of the data
produced render it difficult to determine when crowdworkers are best thought of
as workers (versus human subjects). These difficulties are compounded by
conflicting policies, with some institutions and researchers regarding all ML
crowdworkers as human subjects and others holding that they rarely constitute
human subjects. Notably few ML papers involving crowdwork mention IRB
oversight, raising the prospect of non-compliance with ethical and regulatory
requirements. We investigate the appropriate designation of ML crowdsourcing
studies, focusing our inquiry on natural language processing to expose unique
challenges for research oversight. Crucially, under the U.S. Common Rule, these
judgments hinge on determinations of aboutness, concerning both whom (or what)
the collected data is about and whom (or what) the analysis is about. We
highlight two challenges posed by ML: the same set of workers can serve
multiple roles and provide many sorts of information; and ML research tends to
embrace a dynamic workflow, where research questions are seldom stated ex ante
and data sharing opens the door for future studies to aim questions at
different targets. Our analysis exposes a potential loophole in the Common
Rule, where researchers can elude research ethics oversight by splitting data
collection and analysis into distinct studies. Finally, we offer several policy
recommendations to address these concerns.
Related papers
- The LLM Effect: Are Humans Truly Using LLMs, or Are They Being Influenced By Them Instead? [60.01746782465275]
Large Language Models (LLMs) have shown capabilities close to human performance in various analytical tasks.
This paper investigates the efficiency and accuracy of LLMs in specialized tasks through a structured user study focusing on Human-LLM partnership.
arXiv Detail & Related papers (2024-10-07T02:30:18Z)
- Self-assessment, Exhibition, and Recognition: a Review of Personality in Large Language Models [29.086329448754412]
We present a comprehensive review by categorizing current studies into three research problems: self-assessment, exhibition, and recognition.
Our paper is the first comprehensive survey of up-to-date literature on personality in large language models.
arXiv Detail & Related papers (2024-06-25T15:08:44Z)
- LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing [106.45895712717612]
Large language models (LLMs) have shown remarkable versatility in various generative tasks.
This study focuses on how LLMs can assist NLP researchers.
To our knowledge, this is the first work to provide such a comprehensive analysis.
arXiv Detail & Related papers (2024-06-24T01:30:22Z)
- Analyzing Human Questioning Behavior and Causal Curiosity through Natural Queries [91.70689724416698]
We present NatQuest, a collection of 13,500 naturally occurring questions from three diverse sources.
Our analysis reveals a significant presence of causal questions (up to 42%) within the dataset.
arXiv Detail & Related papers (2024-05-30T17:55:28Z)
- Automating Thematic Analysis: How LLMs Analyse Controversial Topics [5.025737475817937]
Large Language Models (LLMs) are promising analytical tools.
This paper explores how LLMs can support thematic analysis of controversial topics.
Our findings highlight intriguing overlaps and variances in thematic categorisation between human and machine agents.
arXiv Detail & Related papers (2024-05-11T05:28:25Z)
- Factuality of Large Language Models: A Survey [29.557596701431827]
We critically analyze existing work with the aim to identify the major challenges and their associated causes.
We analyze the obstacles to automated factuality evaluation for open-ended text generation.
arXiv Detail & Related papers (2024-02-04T09:36:31Z)
- Responsible AI Considerations in Text Summarization Research: A Review of Current Practices [89.85174013619883]
We focus on text summarization, a common NLP task largely overlooked by the responsible AI community.
We conduct a multi-round qualitative analysis of 333 summarization papers from the ACL Anthology published between 2020 and 2022.
We focus on how, which, and when responsible AI issues are covered, which relevant stakeholders are considered, and mismatches between stated and realized research goals.
arXiv Detail & Related papers (2023-11-18T15:35:36Z)
- MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks [49.60689355674541]
A rich literature in cognitive science has studied people's causal and moral intuitions.
This work has revealed a number of factors that systematically influence people's judgments.
We test whether large language models (LLMs) make causal and moral judgments about text-based scenarios that align with human participants.
arXiv Detail & Related papers (2023-10-30T15:57:32Z)
- The ethical ambiguity of AI data enrichment: Measuring gaps in research ethics norms and practices [2.28438857884398]
This study explores how, and to what extent, comparable research ethics requirements and norms have developed for AI research and data enrichment.
Leading AI venues have begun to establish protocols for human data collection, but these are inconsistently followed by authors.
arXiv Detail & Related papers (2023-06-01T16:12:55Z)
- Perspectives on Large Language Models for Relevance Judgment [56.935731584323996]
It has been claimed that large language models (LLMs) can assist with relevance judgments.
It is not clear whether automated judgments can reliably be used in evaluations of retrieval systems.
arXiv Detail & Related papers (2023-04-13T13:08:38Z)
- The COVID-19 Infodemic: Can the Crowd Judge Recent Misinformation Objectively? [17.288917654501265]
We study whether crowdsourcing is an effective and reliable method to assess the truthfulness of statements during a pandemic.
We specifically target statements related to the COVID-19 health emergency, which was still ongoing at the time of the study.
In our experiment, crowd workers are asked to assess the truthfulness of statements, as well as to provide evidence for the assessments as a URL and a text justification.
arXiv Detail & Related papers (2020-08-13T05:53:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.