Best Practices for Text Annotation with Large Language Models
- URL: http://arxiv.org/abs/2402.05129v1
- Date: Mon, 5 Feb 2024 15:43:50 GMT
- Title: Best Practices for Text Annotation with Large Language Models
- Authors: Petter Törnberg
- Abstract summary: Large Language Models (LLMs) have ushered in a new era of text annotation.
This paper proposes a comprehensive set of standards and best practices for their reliable, reproducible, and ethical use.
- Score: 11.421942894219901
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have ushered in a new era of text annotation, as their ease of use, high accuracy, and relatively low cost have meant that their use has exploded in recent months. However, the rapid growth of the field has meant that LLM-based annotation has become something of an academic Wild West: the lack of established practices and standards has led to concerns about the quality and validity of research. Researchers have warned that the ostensible simplicity of LLMs can be misleading, as they are prone to bias, misunderstandings, and unreliable results. Recognizing the transformative potential of LLMs, this paper proposes a comprehensive set of standards and best practices for their reliable, reproducible, and ethical use. These guidelines span critical areas such as model selection, prompt engineering, structured prompting, prompt stability analysis, rigorous model validation, and the consideration of ethical and legal implications. The paper emphasizes the need for a structured, directed, and formalized approach to using LLMs, aiming to ensure the integrity and robustness of text annotation practices, and advocates for a nuanced and critical engagement with LLMs in social scientific research.
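As a concrete illustration of the workflow the abstract enumerates (structured prompting, prompt stability analysis, and validation against human coders), the following Python sketch shows one minimal way to wire these steps together. The `call_llm` stand-in, the codebook, and the gold labels are hypothetical placeholders for illustration, not the paper's own procedure or data.

```python
# A minimal sketch, assuming a generic LLM client: structured prompting with a
# JSON output schema, repeated runs as a rough probe of prompt stability, and
# validation against a small human-coded gold set before annotating at scale.
import json
from collections import Counter

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM API call; swap in your model client."""
    return json.dumps({"label": "neutral", "justification": "placeholder"})

def build_prompt(text: str) -> str:
    # Structured prompt: explicit codebook plus a strict JSON output schema.
    return (
        "You are annotating short texts for stance toward a policy.\n"
        "Codebook: label each text 'positive', 'negative', or 'neutral'.\n"
        f"Text: {text}\n"
        'Respond only with JSON: {"label": "<code>", "justification": "<one sentence>"}'
    )

def annotate(text: str, runs: int = 3) -> tuple[str, float]:
    """Annotate one text several times; return the majority label and its stability."""
    labels = [json.loads(call_llm(build_prompt(text)))["label"] for _ in range(runs)]
    label, count = Counter(labels).most_common(1)[0]
    return label, count / runs  # share of runs agreeing with the majority label

# Validation: compare LLM labels with a human-coded subset (toy example data).
gold = [("example text a", "neutral"), ("example text b", "neutral")]
predictions = [annotate(text)[0] for text, _ in gold]
accuracy = sum(p == g for p, (_, g) in zip(predictions, gold)) / len(gold)
print(f"Agreement with human coders: {accuracy:.2%}")
```

In practice one would also vary the prompt wording and sampling temperature when probing stability, and report chance-corrected agreement rather than raw accuracy.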
Related papers
- Testing and Evaluation of Large Language Models: Correctness, Non-Toxicity, and Fairness [30.632260870411177]
Large language models (LLMs) have rapidly penetrated into people's work and daily lives over the past few years.
This thesis focuses on the correctness, non-toxicity, and fairness of LLMs from both software testing and natural language processing perspectives.
arXiv Detail & Related papers (2024-08-31T22:21:04Z)
- To Know or Not To Know? Analyzing Self-Consistency of Large Language Models under Ambiguity [27.10502683001428]
This paper focuses on entity type ambiguity, analyzing the proficiency and consistency of state-of-the-art LLMs in applying factual knowledge when prompted with ambiguous entities.
Experiments reveal that LLMs struggle with choosing the correct entity reading, achieving an average accuracy of only 85%, and as low as 75% with underspecified prompts.
arXiv Detail & Related papers (2024-07-24T09:48:48Z)
- CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models [60.59638232596912]
We introduce CLAMBER, a benchmark for evaluating large language models (LLMs) on identifying and clarifying ambiguous information needs, organized around a taxonomy of ambiguity.
Building upon this taxonomy, we construct 12K high-quality data samples to assess the strengths, weaknesses, and potential risks of various off-the-shelf LLMs.
Our findings indicate the limited practical utility of current LLMs in identifying and clarifying ambiguous user queries.
arXiv Detail & Related papers (2024-05-20T14:34:01Z)
- Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in the common belief that base LLMs, which lack safety alignment, pose little risk of such misuse.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z)
- From Understanding to Utilization: A Survey on Explainability for Large Language Models [27.295767173801426]
This survey underscores the imperative for increased explainability in Large Language Models (LLMs).
Our focus is primarily on pre-trained Transformer-based LLMs, which pose distinctive interpretability challenges due to their scale and complexity.
When considering the utilization of explainability, we explore several compelling methods that concentrate on model editing, control generation, and model enhancement.
arXiv Detail & Related papers (2024-01-23T16:09:53Z)
- Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes to out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
- Alignment for Honesty [105.72465407518325]
Recent research has made significant strides in aligning large language models (LLMs) with helpfulness and harmlessness.
In this paper, we argue for the importance of alignment for honesty, ensuring that LLMs proactively refuse to answer questions when they lack knowledge.
We address these challenges by first establishing a precise problem definition and defining "honesty" inspired by the Analects of Confucius.
arXiv Detail & Related papers (2023-12-12T06:10:42Z)
- FFT: Towards Harmlessness Evaluation and Analysis for LLMs with Factuality, Fairness, Toxicity [21.539026782010573]
The widespread use of generative artificial intelligence has heightened concerns about the potential harms posed by AI-generated texts.
Previous researchers have invested much effort in assessing the harmlessness of generative language models.
arXiv Detail & Related papers (2023-11-30T14:18:47Z)
- Are Large Language Models Reliable Judges? A Study on the Factuality Evaluation Capabilities of LLMs [8.526956860672698]
Large Language Models (LLMs) have gained immense attention due to their notable emergent capabilities.
This study investigates the potential of LLMs as reliable assessors of factual consistency in summaries generated by text-generation models.
arXiv Detail & Related papers (2023-11-01T17:42:45Z)
- Large Language Models Cannot Self-Correct Reasoning Yet [78.16697476530994]
Large Language Models (LLMs) have emerged as a groundbreaking technology with their unparalleled text generation capabilities.
Concerns persist regarding the accuracy and appropriateness of their generated content.
A contemporary methodology, self-correction, has been proposed as a remedy to these issues.
arXiv Detail & Related papers (2023-10-03T04:56:12Z)
- Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations better reveal the comprehensive grasp of language models, particularly their proficiency in understanding questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.