Prevalence and prevention of large language model use in crowd work
- URL: http://arxiv.org/abs/2310.15683v1
- Date: Tue, 24 Oct 2023 09:52:09 GMT
- Title: Prevalence and prevention of large language model use in crowd work
- Authors: Veniamin Veselovsky, Manoel Horta Ribeiro, Philip Cozzolino, Andrew Gordon, David Rothschild, Robert West
- Abstract summary: We show that the use of large language models (LLMs) is prevalent among crowd workers.
We show that targeted mitigation strategies can significantly reduce, but not eliminate, LLM use.
- Score: 11.554258761785512
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We show that the use of large language models (LLMs) is prevalent among crowd
workers, and that targeted mitigation strategies can significantly reduce, but
not eliminate, LLM use. On a text summarization task where workers were not
directed in any way regarding their LLM use, the estimated prevalence of LLM
use was around 30%, but was reduced by about half by asking workers to not use
LLMs and by raising the cost of using them, e.g., by disabling copy-pasting.
Secondary analyses give further insight into LLM use and its prevention: LLM
use yields high-quality but homogeneous responses, which may harm research
concerned with human (rather than model) behavior and degrade future models
trained with crowdsourced data. At the same time, preventing LLM use may be at
odds with obtaining high-quality responses; e.g., when requesting workers not
to use LLMs, summaries contained fewer keywords carrying essential information.
Our estimates will likely change as LLMs increase in popularity or
capabilities, and as norms around their usage change. Yet, understanding the
co-evolution of LLM-based tools and users is key to maintaining the validity of
research done using crowdsourcing, and we provide a critical baseline before
widespread adoption ensues.
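The abstract reports an *estimated* prevalence of LLM use, not a directly observed one, and it does not say how the estimate was produced. As an assumption, not the authors' procedure, one standard way to turn the flag rate of an imperfect detector into a prevalence estimate is the Rogan-Gladen correction, sketched below. The function name, the detector's sensitivity and specificity, and the flag rates are all hypothetical. The flag rates are made up and picked only so that the outputs echo the abstract's rounded "around 30%" and "about half".

```python
def rogan_gladen(flag_rate: float, sensitivity: float, specificity: float) -> float:
    """Correct an observed flag rate for detector error (Rogan-Gladen estimator).

    flag_rate:   fraction of responses a (hypothetical) detector flags as LLM-written
    sensitivity: P(flagged | LLM-written)
    specificity: P(not flagged | human-written)
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        # The correction is only defined for a detector better than chance.
        raise ValueError("need sensitivity + specificity > 1")
    estimate = (flag_rate + specificity - 1.0) / denom
    # Sampling noise can push the raw estimate outside [0, 1]; clip it.
    return min(1.0, max(0.0, estimate))


# Hypothetical detector (90% sensitivity, 95% specificity) and invented flag rates.
baseline = rogan_gladen(flag_rate=0.305, sensitivity=0.90, specificity=0.95)
mitigated = rogan_gladen(flag_rate=0.1775, sensitivity=0.90, specificity=0.95)
print(f"baseline: {baseline:.2f}, with mitigation: {mitigated:.2f}")
```

With these invented inputs the script prints `baseline: 0.30, with mitigation: 0.15`. The point of the correction is that a raw flag rate overstates prevalence when the detector produces false positives and understates it when it misses LLM-written text.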
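Two secondary findings in the abstract concern fewer essential keywords in summaries and more homogeneous responses. The paper's exact metrics are not given here, so the sketch below uses simple stand-in proxies chosen for illustration. Keyword coverage is the fraction of a hypothetical list of essential keywords that appear in a summary. Homogeneity is the mean pairwise Jaccard similarity of the responses' lowercase word sets, where higher means more alike. The function names and the crude regex tokenizer are assumptions.

```python
import re
from itertools import combinations


def tokens(text: str) -> set[str]:
    """Lowercase word set of a text (a crude tokenizer, for illustration only)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def keyword_coverage(summary: str, keywords: list[str]) -> float:
    """Fraction of the (hypothetical) essential keywords that occur as words in the summary."""
    if not keywords:
        return 0.0
    words = tokens(summary)
    return sum(1 for k in keywords if k.lower() in words) / len(keywords)


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two word sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def homogeneity(responses: list[str]) -> float:
    """Mean pairwise Jaccard similarity across responses (1.0 = identical vocabularies)."""
    pairs = list(combinations([tokens(r) for r in responses], 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Usage with made-up text: 2 of the 3 hypothetical keywords appear in the summary.
print(keyword_coverage("Workers often used LLMs; mitigation helped.",
                       ["prevalence", "workers", "mitigation"]))
print(homogeneity(["a b c", "a b d"]))  # shared {a, b} out of {a, b, c, d}
```

Word-set Jaccard ignores word order and synonyms. A study that needs to detect paraphrased sameness would more likely compare sentence embeddings, but the overlap proxy keeps the idea visible without extra dependencies.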
Related papers
- The LLM Effect: Are Humans Truly Using LLMs, or Are They Being Influenced By Them Instead? [60.01746782465275]
  Large Language Models (LLMs) have shown capabilities close to human performance in various analytical tasks.
  This paper investigates the efficiency and accuracy of LLMs in specialized tasks through a structured user study focusing on Human-LLM partnership.
  Date: 2024-10-07T02:30:18Z
- SNAP: Unlearning Selective Knowledge in Large Language Models with Negative Instructions [37.172662930947446]
  Instruction-following large language models (LLMs) inadvertently disclose personal or copyrighted information.
  We propose SNAP, an innovative framework designed to selectively unlearn information.
  We evaluate our framework on various NLP benchmarks and demonstrate that our approach retains the original LLM capabilities.
  Date: 2024-06-18T06:54:05Z
- CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models [60.59638232596912]
  We introduce CLAMBER, a benchmark for evaluating large language models (LLMs).
  Building upon the taxonomy, we construct 12K high-quality data to assess the strengths, weaknesses, and potential risks of various off-the-shelf LLMs.
  Our findings indicate the limited practical utility of current LLMs in identifying and clarifying ambiguous user queries.
  Date: 2024-05-20T14:34:01Z
- Locally Differentially Private In-Context Learning [8.659575019965152]
  Large pretrained language models (LLMs) have shown surprising In-Context Learning (ICL) ability.
  This paper proposes a locally differentially private framework of in-context learning (LDP-ICL).
  Considering the mechanisms of in-context learning in Transformers by gradient descent, we provide an analysis of the trade-off between privacy and utility in such LDP-ICL.
  Date: 2024-05-07T06:05:43Z
- Purifying Large Language Models by Ensembling a Small Language Model [39.57304668057076]
  We propose a simple and easily implementable method for purifying LLMs from the negative effects caused by uncurated data.
  We empirically confirm the efficacy of ensembling LLMs with benign and small language models (SLMs).
  Date: 2024-02-19T14:00:39Z
- Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs [60.40396361115776]
  This paper introduces a novel collaborative approach, namely SlimPLM, that detects missing knowledge in large language models (LLMs) with a slim proxy model.
  We employ a proxy model which has far fewer parameters, and take its answers as heuristic answers.
  Heuristic answers are then utilized to predict the knowledge required to answer the user question, as well as the known and unknown knowledge within the LLM.
  Date: 2024-02-19T11:11:08Z
- Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
  Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
  The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
  We propose a framework that enhances the reliability of LLMs as it: 1) generalizes to out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
  Date: 2023-12-26T07:24:46Z
- Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
  We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
  Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions.
  Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
  Date: 2023-09-20T09:23:46Z
- Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks [12.723777984461693]
  Large language models (LLMs) are remarkable data annotators.
  Crowdsourcing, an important, inexpensive way to obtain human annotations, may itself be impacted by LLMs.
  We estimate that 33-46% of crowd workers used LLMs when completing a task.
  Date: 2023-06-13T16:46:24Z
- On Learning to Summarize with Large Language Models as References [101.79795027550959]
  Large language models (LLMs) are favored by human annotators over the original reference summaries in commonly used summarization datasets.
  We study an LLM-as-reference learning setting for smaller text summarization models to investigate whether their performance can be substantially improved.
  Date: 2023-05-23T16:56:04Z
- Do LLMs Understand User Preferences? Evaluating LLMs On User Rating Prediction [15.793007223588672]
  Large Language Models (LLMs) have demonstrated exceptional capabilities in generalizing to new tasks in a zero-shot or few-shot manner.
  We investigate various LLMs in different sizes, ranging from 250M to 540B parameters, and evaluate their performance in zero-shot, few-shot, and fine-tuning scenarios.
  Date: 2023-05-10T21:43:42Z
This list is automatically generated from the titles and abstracts of the papers in this site.