Best Practices for Text Annotation with Large Language Models
- URL: http://arxiv.org/abs/2402.05129v1
- Date: Mon, 5 Feb 2024 15:43:50 GMT
- Title: Best Practices for Text Annotation with Large Language Models
- Authors: Petter Törnberg
- Abstract summary: Large Language Models (LLMs) have ushered in a new era of text annotation.
This paper proposes a comprehensive set of standards and best practices for their reliable, reproducible, and ethical use.
- Score: 11.421942894219901
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have ushered in a new era of text annotation, as
their ease-of-use, high accuracy, and relatively low costs have meant that
their use has exploded in recent months. However, the rapid growth of the field
has meant that LLM-based annotation has become something of an academic Wild
West: the lack of established practices and standards has led to concerns about
the quality and validity of research. Researchers have warned that the
ostensible simplicity of LLMs can be misleading, as they are prone to bias,
misunderstandings, and unreliable results. Recognizing the transformative
potential of LLMs, this paper proposes a comprehensive set of standards and
best practices for their reliable, reproducible, and ethical use. These
guidelines span critical areas such as model selection, prompt engineering,
structured prompting, prompt stability analysis, rigorous model validation, and
the consideration of ethical and legal implications. The paper emphasizes the
need for a structured, directed, and formalized approach to using LLMs, aiming
to ensure the integrity and robustness of text annotation practices, and
advocates for a nuanced and critical engagement with LLMs in social scientific
research.
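As a rough illustration of what "prompt stability analysis" and "model validation" can mean in practice, the sketch below compares label sets with Cohen's kappa. It is not taken from the paper: the helper names (`cohen_kappa`, `prompt_stability`), the choice of kappa as the agreement metric, and the toy labels are all assumptions, and no LLM is actually called. It only shows one plausible way to quantify agreement between annotation runs produced by paraphrased prompts, and between each run and human-coded reference labels.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Share of items on which the two annotators agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def prompt_stability(runs):
    """Mean pairwise kappa across label lists from paraphrased prompts."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("need at least two runs to compare")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical data: human reference labels and three LLM runs,
# each run standing in for the output of a differently worded prompt.
human = ["pos", "neg", "neg", "pos", "neu", "neg"]
runs = [
    ["pos", "neg", "neg", "pos", "neu", "neg"],
    ["pos", "neg", "pos", "pos", "neu", "neg"],
    ["pos", "neg", "neg", "pos", "neg", "neg"],
]

stability = prompt_stability(runs)                  # agreement between prompts
validity = [cohen_kappa(human, r) for r in runs]    # agreement with humans

print(f"prompt stability (mean pairwise kappa): {stability:.3f}")
for i, k in enumerate(validity):
    print(f"run {i} vs human labels: kappa = {k:.3f}")
```

Low stability would suggest the labels depend on prompt wording rather than on the text, while low agreement with the human labels would suggest the model is not measuring the intended concept. Which thresholds count as acceptable is a judgment the sketch does not attempt to settle.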
Related papers
- Behavioral Testing: Can Large Language Models Implicitly Resolve Ambiguous Entities? [27.10502683001428]
We analyze current state-of-the-art large language models (LLMs) for their proficiency and consistency in applying their factual knowledge when prompted for entities under ambiguity.
Experiments reveal that LLMs perform poorly with ambiguous prompts, achieving only 80% accuracy.
arXiv Detail & Related papers (2024-07-24T09:48:48Z)
- CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models [60.59638232596912]
We introduce CLAMBER, a benchmark for evaluating large language models (LLMs).
Building upon the taxonomy, we construct 12K high-quality data to assess the strengths, weaknesses, and potential risks of various off-the-shelf LLMs.
Our findings indicate the limited practical utility of current LLMs in identifying and clarifying ambiguous user queries.
arXiv Detail & Related papers (2024-05-20T14:34:01Z)
- Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in this belief.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z)
- Can Large Language Models Identify Authorship? [18.378744138365537]
Large Language Models (LLMs) have demonstrated exceptional capacity for reasoning and problem-solving.
This paper conducts a comprehensive evaluation of LLMs in authorship analysis.
arXiv Detail & Related papers (2024-03-13T03:22:02Z)
- From Understanding to Utilization: A Survey on Explainability for Large Language Models [27.295767173801426]
This survey underscores the imperative for increased explainability in Large Language Models (LLMs).
Our focus is primarily on pre-trained Transformer-based LLMs, which pose distinctive interpretability challenges due to their scale and complexity.
When considering the utilization of explainability, we explore several compelling methods that concentrate on model editing, control generation, and model enhancement.
arXiv Detail & Related papers (2024-01-23T16:09:53Z)
- Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
- Alignment for Honesty [113.42626737461129]
We argue for the importance of alignment for honesty, ensuring that language models proactively refuse to answer questions when they lack knowledge.
This challenge demands comprehensive solutions in terms of metric development, benchmark creation, and training.
We introduce a flexible training framework which is further instantiated by several efficient fine-tuning techniques that emphasize honesty.
arXiv Detail & Related papers (2023-12-12T06:10:42Z)
- FFT: Towards Harmlessness Evaluation and Analysis for LLMs with Factuality, Fairness, Toxicity [21.539026782010573]
The widespread use of generative artificial intelligence has heightened concerns about the potential harms posed by AI-generated texts.
Previous researchers have invested much effort in assessing the harmlessness of generative language models.
arXiv Detail & Related papers (2023-11-30T14:18:47Z)
- Are Large Language Models Reliable Judges? A Study on the Factuality Evaluation Capabilities of LLMs [8.526956860672698]
Large Language Models (LLMs) have gained immense attention due to their notable emergent capabilities.
This study investigates the potential of LLMs as reliable assessors of factual consistency in summaries generated by text-generation models.
arXiv Detail & Related papers (2023-11-01T17:42:45Z)
- Large Language Models Cannot Self-Correct Reasoning Yet [78.16697476530994]
Large Language Models (LLMs) have emerged as a groundbreaking technology with their unparalleled text generation capabilities.
Concerns persist regarding the accuracy and appropriateness of their generated content.
A contemporary methodology, self-correction, has been proposed as a remedy to these issues.
arXiv Detail & Related papers (2023-10-03T04:56:12Z)
- Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.