Automated Annotation with Generative AI Requires Validation
- URL: http://arxiv.org/abs/2306.00176v1
- Date: Wed, 31 May 2023 20:50:45 GMT
- Title: Automated Annotation with Generative AI Requires Validation
- Authors: Nicholas Pangakis, Samuel Wolken, and Neil Fasching
- Abstract summary: Generative large language models (LLMs) can be a powerful tool for augmenting text annotation procedures.
We outline a workflow to harness the annotation potential of LLMs in a principled, efficient way.
We find that LLM performance for text annotation is promising but highly contingent on both the dataset and the type of annotation task.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative large language models (LLMs) can be a powerful tool for augmenting
text annotation procedures, but their performance varies across annotation
tasks due to prompt quality, text data idiosyncrasies, and conceptual
difficulty. Because these challenges will persist even as LLM technology
improves, we argue that any automated annotation process using an LLM must
validate the LLM's performance against labels generated by humans. To this end,
we outline a workflow to harness the annotation potential of LLMs in a
principled, efficient way. Using GPT-4, we validate this approach by
replicating 27 annotation tasks across 11 datasets from recent social science
articles in high-impact journals. We find that LLM performance for text
annotation is promising but highly contingent on both the dataset and the type
of annotation task, which reinforces the necessity to validate on a
task-by-task basis. We make available easy-to-use software designed to
implement our workflow and streamline the deployment of LLMs for automated
annotation.
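The validation step described above can be sketched in a few lines of code. The following is a minimal illustration, not the authors' released software: it assumes a hypothetical llm_annotate helper that wraps the LLM call, and uses scikit-learn to score LLM labels against a human-annotated sample on a per-task basis.
```python
# Minimal sketch of the validation workflow: score LLM annotations against
# human-generated labels before trusting the LLM on the full corpus.
# `llm_annotate` is a hypothetical stand-in for a real GPT-4 API call.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def llm_annotate(text: str) -> int:
    """Hypothetical wrapper that returns a binary label for one document.
    Replace with a real LLM client and a task-specific prompt."""
    raise NotImplementedError

def validate_llm_annotations(texts, human_labels):
    """Compare LLM labels with a human-annotated sample for a single task."""
    llm_labels = [llm_annotate(t) for t in texts]
    precision, recall, f1, _ = precision_recall_fscore_support(
        human_labels, llm_labels, average="binary", zero_division=0
    )
    return {
        "accuracy": accuracy_score(human_labels, llm_labels),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Because performance varies by dataset and task, this check should be
# repeated for every new annotation task rather than assumed to transfer.
```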
Related papers
- Augmenting In-Context-Learning in LLMs via Automatic Data Labeling and Refinement [7.108002571622824]
We propose Automatic Data Labeling and Refinement (ADLR) to automatically generate and filter demonstrations.
We demonstrate the advantage of ADLR in code-based table QA and mathematical reasoning, achieving up to a 5.5% gain.
arXiv Detail & Related papers (2024-10-14T10:06:58Z)
- Keeping Humans in the Loop: Human-Centered Automated Annotation with Generative AI [0.0]
We use GPT-4 to replicate 27 annotation tasks across 11 password-protected datasets.
For each task, we compare GPT-4 annotations against human-annotated ground-truth labels and against annotations from separate supervised classification models fine-tuned on human-generated labels (see the code sketch after this list).
Our findings underscore the importance of a human-centered workflow and careful evaluation standards.
arXiv Detail & Related papers (2024-09-14T15:27:43Z)
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning [93.96463520716759]
Large language model (LLM) agents have demonstrated impressive capabilities in utilizing external tools and knowledge to boost accuracy and mitigate hallucinations.
Here, we introduce AvaTaR, a novel and automated framework that optimizes an LLM agent to effectively leverage provided tools, improving performance on a given task.
arXiv Detail & Related papers (2024-06-17T04:20:02Z)
- One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs).
We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
arXiv Detail & Related papers (2024-05-30T03:44:54Z)
- Tokenization Matters! Degrading Large Language Models through Challenging Their Tokenization [12.885866125783618]
Large Language Models (LLMs) tend to produce inaccurate responses to specific queries.
We construct an adversarial dataset, named ADT (Adversarial Dataset for Tokenizer), to challenge LLMs' tokenization.
Our empirical results reveal that ADT is highly effective at challenging the tokenization of leading LLMs, including GPT-4o, Llama-3, and Qwen2.5-max, among others.
arXiv Detail & Related papers (2024-05-27T11:39:59Z)
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing [56.75702900542643]
We introduce AlphaLLM for the self-improvement of Large Language Models.
It integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop.
Our experimental results show that AlphaLLM significantly enhances the performance of LLMs without additional annotations.
arXiv Detail & Related papers (2024-04-18T15:21:34Z)
- Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation [128.01050030936028]
We propose an information refinement training method named InFO-RAG.
InFO-RAG is low-cost and general across various tasks.
It improves the performance of LLaMA2 by an average of 9.39% (relative improvement).
arXiv Detail & Related papers (2024-02-28T08:24:38Z)
- Large Language Models for Data Annotation: A Survey [49.8318827245266]
The emergence of advanced Large Language Models (LLMs) presents an unprecedented opportunity to automate the complicated process of data annotation.
This survey includes an in-depth taxonomy of data types that LLMs can annotate, a review of learning strategies for models utilizing LLM-generated annotations, and a detailed discussion of the primary challenges and limitations associated with using LLMs for data annotation.
arXiv Detail & Related papers (2024-02-21T00:44:04Z)
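As a companion to the earlier sketch, the comparison described in the "Keeping Humans in the Loop" entry above can also be illustrated briefly: LLM annotations and a supervised classifier trained on human labels are both scored against held-out human ground truth. This is an assumption-laden sketch, not that paper's actual pipeline; TF-IDF with logistic regression stands in for its fine-tuned classifiers, and llm_labels is assumed to come from an annotation pass like the one sketched earlier.
```python
# Sketch: compare LLM annotations with a supervised baseline trained on
# human labels, both evaluated on a held-out human-labeled split.
# TF-IDF + logistic regression is an illustrative stand-in for the
# fine-tuned classifiers used in the paper; labels are assumed binary.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

def compare_annotators(texts, human_labels, llm_labels):
    """Score an LLM and a supervised baseline against held-out human labels."""
    (train_x, test_x,
     train_y, test_y,
     _, test_llm) = train_test_split(
        texts, human_labels, llm_labels, test_size=0.3, random_state=0
    )
    baseline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    baseline.fit(train_x, train_y)
    return {
        "supervised_f1": f1_score(test_y, baseline.predict(test_x)),
        "llm_f1": f1_score(test_y, test_llm),
    }
```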