Automated Annotation with Generative AI Requires Validation
- URL: http://arxiv.org/abs/2306.00176v1
- Date: Wed, 31 May 2023 20:50:45 GMT
- Title: Automated Annotation with Generative AI Requires Validation
- Authors: Nicholas Pangakis, Samuel Wolken, and Neil Fasching
- Abstract summary: Generative large language models (LLMs) can be a powerful tool for augmenting text annotation procedures.
We outline a workflow to harness the annotation potential of LLMs in a principled, efficient way.
We find that LLM performance for text annotation is promising but highly contingent on both the dataset and the type of annotation task.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative large language models (LLMs) can be a powerful tool for augmenting
text annotation procedures, but their performance varies across annotation
tasks due to prompt quality, text data idiosyncrasies, and conceptual
difficulty. Because these challenges will persist even as LLM technology
improves, we argue that any automated annotation process using an LLM must
validate the LLM's performance against labels generated by humans. To this end,
we outline a workflow to harness the annotation potential of LLMs in a
principled, efficient way. Using GPT-4, we validate this approach by
replicating 27 annotation tasks across 11 datasets from recent social science
articles in high-impact journals. We find that LLM performance for text
annotation is promising but highly contingent on both the dataset and the type
of annotation task, which reinforces the necessity to validate on a
task-by-task basis. We make available easy-to-use software designed to
implement our workflow and streamline the deployment of LLMs for automated
annotation.
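The validation step described above can be sketched in a few lines of code. The following is a minimal illustration, not the authors' released software: it assumes a hypothetical llm_annotate helper that wraps the LLM call, and uses scikit-learn to score LLM labels against a human-annotated sample on a per-task basis.
```python
# Minimal sketch of the validation workflow: score LLM annotations against
# human-generated labels before trusting the LLM on the full corpus.
# `llm_annotate` is a hypothetical stand-in for a real GPT-4 API call.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def llm_annotate(text: str) -> int:
    """Hypothetical wrapper that returns a binary label for one document.
    Replace with a real LLM client and a task-specific prompt."""
    raise NotImplementedError

def validate_llm_annotations(texts, human_labels):
    """Compare LLM labels with a human-annotated sample for a single task."""
    llm_labels = [llm_annotate(t) for t in texts]
    precision, recall, f1, _ = precision_recall_fscore_support(
        human_labels, llm_labels, average="binary", zero_division=0
    )
    return {
        "accuracy": accuracy_score(human_labels, llm_labels),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Because performance varies by dataset and task, this check should be
# repeated for every new annotation task rather than assumed to transfer.
```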
Related papers
- Augmenting In-Context-Learning in LLMs via Automatic Data Labeling and Refinement [7.108002571622824]
We propose Automatic Data Labeling and Refinement (ADLR) to automatically generate and filter demonstrations.
We demonstrate the advantage of ADLR in code-based table QA and mathematical reasoning, achieving up to a 5.5% gain.
arXiv Detail & Related papers (2024-10-14T10:06:58Z)
- Keeping Humans in the Loop: Human-Centered Automated Annotation with Generative AI [0.0]
We use GPT-4 to replicate 27 annotation tasks across 11 password-protected datasets.
For each task, we compare GPT-4 annotations against human-annotated ground-truth labels and against annotations from separate supervised classification models fine-tuned on human-generated labels (see the code sketch after this list).
Our findings underscore the importance of a human-centered workflow and careful evaluation standards.
arXiv Detail & Related papers (2024-09-14T15:27:43Z)
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning [93.96463520716759]
Large language model (LLM) agents have demonstrated impressive capabilities in utilizing external tools and knowledge to boost accuracy and mitigate hallucinations.
Here, we introduce AvaTaR, a novel and automated framework that optimizes an LLM agent to effectively leverage provided tools, improving performance on a given task.
arXiv Detail & Related papers (2024-06-17T04:20:02Z)
- One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs).
We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
arXiv Detail & Related papers (2024-05-30T03:44:54Z)
- Tokenization Matters! Degrading Large Language Models through Challenging Their Tokenization [12.885866125783618]
Large Language Models (LLMs) tend to produce inaccurate responses to specific queries.
We construct an adversarial dataset, named ADT (Adversarial Dataset for Tokenizer), to challenge LLMs' tokenization.
Our empirical results reveal that ADT is highly effective at challenging the tokenization of leading LLMs, including GPT-4o, Llama-3, and Qwen2.5-max, among others.
arXiv Detail & Related papers (2024-05-27T11:39:59Z)
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing [56.75702900542643]
We introduce AlphaLLM for the self-improvement of Large Language Models.
It integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop.
Our experimental results show that AlphaLLM significantly enhances the performance of LLMs without additional annotations.
arXiv Detail & Related papers (2024-04-18T15:21:34Z)
- Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation [128.01050030936028]
We propose an information refinement training method named InFO-RAG.
InFO-RAG is low-cost and general across various tasks.
It improves the performance of LLaMA2 by an average of 9.39% (relative improvement).
arXiv Detail & Related papers (2024-02-28T08:24:38Z)
- Large Language Models for Data Annotation: A Survey [49.8318827245266]
The emergence of advanced Large Language Models (LLMs) presents an unprecedented opportunity to automate the complicated process of data annotation.
This survey includes an in-depth taxonomy of data types that LLMs can annotate, a review of learning strategies for models utilizing LLM-generated annotations, and a detailed discussion of the primary challenges and limitations associated with using LLMs for data annotation.
arXiv Detail & Related papers (2024-02-21T00:44:04Z)
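As a companion to the earlier sketch, the comparison described in the "Keeping Humans in the Loop" entry above can also be illustrated briefly: LLM annotations and a supervised classifier trained on human labels are both scored against held-out human ground truth. This is an assumption-laden sketch, not that paper's actual pipeline; TF-IDF with logistic regression stands in for its fine-tuned classifiers, and llm_labels is assumed to come from an annotation pass like the one sketched earlier.
```python
# Sketch: compare LLM annotations with a supervised baseline trained on
# human labels, both evaluated on a held-out human-labeled split.
# TF-IDF + logistic regression is an illustrative stand-in for the
# fine-tuned classifiers used in the paper; labels are assumed binary.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

def compare_annotators(texts, human_labels, llm_labels):
    """Score an LLM and a supervised baseline against held-out human labels."""
    (train_x, test_x,
     train_y, test_y,
     _, test_llm) = train_test_split(
        texts, human_labels, llm_labels, test_size=0.3, random_state=0
    )
    baseline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    baseline.fit(train_x, train_y)
    return {
        "supervised_f1": f1_score(test_y, baseline.predict(test_x)),
        "llm_f1": f1_score(test_y, test_llm),
    }
```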