Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use
Large Language Models for Text Production Tasks
- URL: http://arxiv.org/abs/2306.07899v1
- Date: Tue, 13 Jun 2023 16:46:24 GMT
- Title: Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use
Large Language Models for Text Production Tasks
- Authors: Veniamin Veselovsky, Manoel Horta Ribeiro, Robert West
- Abstract summary: Large language models (LLMs) are remarkable data annotators.
Crowdsourcing, an important, inexpensive way to obtain human annotations, may itself be impacted by LLMs.
We estimate that 33-46% of crowd workers used LLMs when completing a task.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are remarkable data annotators. They can be used
to generate high-fidelity supervised training data, as well as survey and
experimental data. With the widespread adoption of LLMs, human gold-standard
annotations are key to understanding the capabilities of LLMs and the validity
of their results. However, crowdsourcing, an important, inexpensive way to
obtain human annotations, may itself be impacted by LLMs, as crowd workers have
financial incentives to use LLMs to increase their productivity and income. To
investigate this concern, we conducted a case study on the prevalence of LLM
usage by crowd workers. We reran an abstract summarization task from the
literature on Amazon Mechanical Turk and, through a combination of keystroke
detection and synthetic text classification, estimate that 33-46% of crowd
workers used LLMs when completing the task. Although generalization to other,
less LLM-friendly tasks is unclear, our results call for platforms,
researchers, and crowd workers to find new ways to ensure that human data
remain human, perhaps using the methodology proposed here as a stepping stone.
Code/data: https://github.com/epfl-dlab/GPTurk
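To make the detection pipeline concrete: the first signal, keystroke detection, roughly amounts to logging typing and paste events in the answer box and flagging responses that were pasted in wholesale rather than typed. The second signal can be sketched as below. This is a minimal illustration of a synthetic-text classifier under assumed inputs, not the authors' implementation (their code is in the repository above), and the file names are placeholders.

```python
# Minimal sketch of the second signal (synthetic-text classification): a
# bag-of-words classifier separating human-written from LLM-generated
# summaries. Illustrative only; the authors' actual pipeline is in the
# repository above, and the file names here are placeholders.
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

def load_lines(path: str) -> list[str]:
    return Path(path).read_text(encoding="utf-8").splitlines()

# Assumed inputs: summaries known to be human-written (e.g., collected before
# LLMs were widely available) and summaries produced by prompting an LLM on
# the same source texts.
human = load_lines("human_summaries.txt")       # placeholder file
synthetic = load_lines("llm_summaries.txt")     # placeholder file

texts = human + synthetic
labels = [0] * len(human) + [1] * len(synthetic)  # 1 = LLM-generated

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=0, stratify=labels
)

# Word uni/bigram TF-IDF is a standard baseline for this kind of task.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=2)
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(X_train), y_train)

print(classification_report(y_test, clf.predict(vectorizer.transform(X_test))))
```

In the paper, estimates from the two signals are combined, which is how the 33-46% range above arises.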
Related papers
- Can LLMs Replace Manual Annotation of Software Engineering Artifacts? (arXiv 2024-08-10)
Large language models (LLMs) have recently started to demonstrate human-level performance in several areas.
This paper explores the possibility of substituting costly human subjects with much cheaper LLM queries in evaluations of code and code-related artifacts.
Our results show that replacing some human annotation effort with LLMs can produce inter-rater agreement equal to or close to human-rater agreement.
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning (arXiv 2024-07-16)
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report absolute improvements of approximately 15% for classification tasks and 18% for generation tasks on the benchmark's metrics.
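A rough sketch of the self-synthesis idea, under stated assumptions: `generate` stands in for any completion call against the student model, the prompt wording is invented, and the actual pipeline adds stronger filtering plus a final finetuning step on the synthesized pairs.

```python
# Rough sketch of the self-synthesis loop behind SELF-GUIDE. Assumptions:
# `generate` stands in for any completion call against the student model,
# the prompt wording is invented, and the real pipeline curates the pairs
# far more carefully before finetuning.
from typing import Callable

def synthesize_pairs(
    generate: Callable[[str], str],   # assumed: prompt -> completion
    task_instruction: str,
    seed_examples: list[dict],        # few human demos: {"input": ..., "output": ...}
    n_attempts: int = 100,
) -> list[dict]:
    demos = "\n".join(
        f"Input: {e['input']}\nOutput: {e['output']}" for e in seed_examples
    )
    pairs = []
    for _ in range(n_attempts):
        # Stage 1: the student model invents a new task input.
        new_input = generate(
            f"{task_instruction}\n{demos}\nWrite one new input for this task.\nInput:"
        ).strip()
        # Stage 2: the same model answers its own input.
        new_output = generate(
            f"{task_instruction}\nInput: {new_input}\nOutput:"
        ).strip()
        # Stage 3: crude quality filter; the paper applies stronger filtering.
        if new_input and new_output:
            pairs.append({"input": new_input, "output": new_output})
    return pairs  # the student model is then finetuned on these pairs
```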
- Detecting Hallucinations in Large Language Model Generation: A Token Probability Approach (arXiv 2024-05-30)
Large Language Models (LLMs) sometimes produce inaccurate outputs, also known as hallucinations.
This paper introduces a supervised learning approach that uses only four numerical features derived from token and vocabulary probabilities obtained from other evaluator models.
The method yields promising results, surpassing state-of-the-art outcomes in multiple tasks across three different benchmarks.
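A sketch of how lightweight this approach is: the four feature definitions below are plausible stand-ins rather than the paper's exact features, and the data is synthetic toy data standing in for evaluator logprobs paired with human hallucination labels.

```python
# Sketch of the token-probability idea: compress one generation into four
# scalar features and train an ordinary supervised classifier on them. The
# feature definitions are plausible stand-ins, not necessarily the paper's
# exact four, and the data below is synthetic toy data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def probability_features(token_probs: list[float]) -> list[float]:
    """Four scalars from the per-token probabilities of one generation."""
    p = np.asarray(token_probs)
    return [
        float(p.min()),                   # weakest token, a classic hallucination cue
        float(p.mean()),                  # average confidence
        float(np.exp(np.log(p).mean())),  # geometric mean (length-normalized likelihood)
        float(p.std()),                   # spread of confidence across the output
    ]

# Toy data; in practice the probabilities come from an evaluator LLM's
# logprobs and the labels (1 = hallucinated) from human annotation.
rng = np.random.default_rng(0)
faithful = [rng.uniform(0.6, 1.0, size=30).tolist() for _ in range(200)]
hallucinated = [rng.uniform(0.1, 0.9, size=30).tolist() for _ in range(200)]

X = np.array([probability_features(p) for p in faithful + hallucinated])
y = np.array([0] * len(faithful) + [1] * len(hallucinated))

clf = RandomForestClassifier(random_state=0).fit(X, y)
print(clf.score(X, y))  # training accuracy on the toy data
```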
- Large Language Models: A Survey (arXiv 2024-02-09)
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks.
LLMs acquire their general-purpose language understanding and generation abilities by training billions of model parameters on massive amounts of text data.
- Supervised Knowledge Makes Large Language Models Better In-context Learners (arXiv 2023-12-26)
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes to out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
- Prevalence and prevention of large language model use in crowd work (arXiv 2023-10-24)
We show that the use of large language models (LLMs) is prevalent among crowd workers.
We show that targeted mitigation strategies can significantly reduce, but not eliminate, LLM use.
- CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation (arXiv 2023-10-24)
We propose CoAnnotating, a novel paradigm for Human-LLM co-annotation of unstructured texts at scale.
Our empirical study shows CoAnnotating to be an effective means of allocating work across different datasets, with up to 21% performance improvement over a random baseline.
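A minimal sketch of the uncertainty-guided idea, assuming disagreement across repeated LLM annotations as the uncertainty measure (the paper considers several); the `annotate` call and the entropy threshold are placeholders.

```python
# Sketch of uncertainty-guided allocation in the spirit of CoAnnotating:
# sample several LLM annotations per item (e.g., paraphrased prompts or
# nonzero temperature), measure disagreement as entropy, and route uncertain
# items to humans. `annotate` and the threshold are placeholder assumptions.
import math
from collections import Counter
from typing import Callable

def label_entropy(labels: list[str]) -> float:
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def allocate(
    items: list[str],
    annotate: Callable[[str], str],  # assumed: item text -> label
    n_samples: int = 5,
    threshold: float = 0.8,          # bits; tune against the annotation budget
) -> tuple[dict[str, str], list[str]]:
    auto, to_humans = {}, []
    for item in items:
        samples = [annotate(item) for _ in range(n_samples)]
        if label_entropy(samples) <= threshold:
            auto[item] = Counter(samples).most_common(1)[0][0]  # majority label
        else:
            to_humans.append(item)  # LLM is uncertain: send to crowd workers
    return auto, to_humans
```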
- LLM-in-the-loop: Leveraging Large Language Model for Thematic Analysis (arXiv 2023-10-23)
Thematic analysis (TA) has been widely used for analyzing qualitative data in many disciplines and fields.
Human coders develop and deepen their data interpretation and coding over multiple iterations, making TA labor-intensive and time-consuming.
We propose a human-LLM collaboration framework (i.e., LLM-in-the-loop) to conduct TA with in-context learning (ICL).
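A rough sketch of what the ICL step could look like, assuming a generic `complete` call and an invented prompt format; in the framework itself, the human coder stays in the loop to review and revise each suggested code.

```python
# Rough sketch of the in-context-learning step: ask an LLM to propose a code
# (theme label) for a new quote, given the codebook and a few human-coded
# examples. The prompt format and `complete` call are invented assumptions.
from typing import Callable

def code_quote(
    complete: Callable[[str], str],   # assumed: prompt -> completion
    codebook: dict[str, str],         # code name -> definition
    examples: list[tuple[str, str]],  # (quote, code) pairs coded by humans
    quote: str,
) -> str:
    lines = ["You are assisting with thematic analysis of interview quotes.",
             "Codebook:"]
    lines += [f"- {name}: {definition}" for name, definition in codebook.items()]
    lines.append("Examples:")
    lines += [f'Quote: "{q}"\nCode: {c}' for q, c in examples]
    lines.append(f'Quote: "{quote}"\nCode:')
    return complete("\n".join(lines)).strip()
```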
- Aligning Large Language Models with Human: A Survey (arXiv 2023-07-24)
Large Language Models (LLMs) trained on extensive textual corpora have emerged as leading solutions for a broad array of Natural Language Processing (NLP) tasks.
Despite their notable performance, these models are prone to limitations such as misunderstanding human instructions, generating potentially biased content, or producing factually incorrect information.
This survey presents a comprehensive overview of these alignment technologies.
- Can Large Language Models Transform Computational Social Science? (arXiv 2023-04-12)
Large Language Models (LLMs) are capable of performing many language processing tasks zero-shot (without training data).
This work provides a road map for using LLMs as Computational Social Science tools.