Related papers: Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation

Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation

URL: http://arxiv.org/abs/2506.03857v1
Date: Wed, 04 Jun 2025 11:42:37 GMT
Title: Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation
Authors: Mingxuan Xia, Haobo Wang, Yixuan Li, Zewei Yu, Jindong Wang, Junbo Zhao, Runze Wu,
Abstract summary: We propose a novel candidate annotation paradigm wherein large language models are encouraged to output all possible labels when incurring uncertainty.<n>To ensure unique labels are provided for downstream tasks, we develop a teacher-student framework CanDist that distills candidate annotations with a Small Language Model.
Score: 35.1208076670736
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, Large Language Models (LLMs) have demonstrated significant potential for data annotation, markedly reducing the labor costs associated with downstream applications. However, existing methods mostly adopt an aggressive strategy by prompting LLM to determine a single gold label for each unlabeled sample. Due to the inherent uncertainty within LLMs, they often produce incorrect labels for difficult samples, severely compromising the data quality for downstream applications. Motivated by ambiguity aversion in human behaviors, we propose a novel candidate annotation paradigm wherein large language models are encouraged to output all possible labels when incurring uncertainty. To ensure unique labels are provided for downstream tasks, we develop a teacher-student framework CanDist that distills candidate annotations with a Small Language Model (SLM). We further provide a rigorous justification demonstrating that distilling candidate annotations from the teacher LLM offers superior theoretical guarantees compared to directly using single annotations. Extensive experiments across six text classification tasks validate the effectiveness of our proposed method. The source code is available at https://github.com/MingxuanXia/CanDist.

Related papers

Are Multimodal Large Language Models Good Annotators for Image Tagging? [62.01475514488922]
This paper aims to analyze the gap between MLLM-generated and human annotations.<n>We propose TagLLM, a novel framework for image tagging, which aims to narrow the gap between MLLM-generated and human annotations.
arXiv Detail & Related papers (2026-02-24T14:53:16Z)
Enhancing LLM-Based Data Annotation with Error Decomposition [6.6544828402388445]
Large language models offer a scalable alternative to human coding for data annotation tasks.<n>Their performance on subjective annotation tasks is less consistent and more prone to errors.<n>We propose a diagnostic evaluation paradigm that incorporates a human-in-the-loop step to separate task-inherent ambiguity from model-driven inaccuracies.
arXiv Detail & Related papers (2026-01-17T05:43:17Z)
Refining Sentence Embedding Model through Ranking Sentences Generation with Large Language Models [60.00178316095646]
Sentence embedding is essential for many NLP tasks, with contrastive learning methods achieving strong performance using datasets like NLI.<n>Recent studies leverage large language models (LLMs) to generate sentence pairs, reducing annotation dependency.<n>We propose a method for controlling the generation direction of LLMs in the latent space. Unlike unconstrained generation, the controlled approach ensures meaningful semantic divergence.<n> Experiments on multiple benchmarks demonstrate that our method achieves new SOTA performance with a modest cost in ranking sentence synthesis.
arXiv Detail & Related papers (2025-02-19T12:07:53Z)
Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance [21.926934384262594]
Large language models (LLMs) offer new opportunities to enhance the annotation process. We compare expert, crowd-sourced, and our LLM-based annotations in terms of agreement, label quality, and efficiency. Our findings reveal a substantial number of label errors, which, when corrected, induce a significant upward shift in reported model performance.
arXiv Detail & Related papers (2024-10-24T16:27:03Z)
Mitigating Biases to Embrace Diversity: A Comprehensive Annotation Benchmark for Toxic Language [0.0]
This study introduces a prescriptive annotation benchmark grounded in humanities research to ensure consistent, unbiased labeling of offensive language. We contribute two newly annotated datasets that achieve higher inter-annotator agreement between human and language model (LLM) annotations.
arXiv Detail & Related papers (2024-10-17T08:10:24Z)
Zero-to-Strong Generalization: Eliciting Strong Capabilities of Large Language Models Iteratively without Gold Labels [75.77877889764073]
Large Language Models (LLMs) have demonstrated remarkable performance through supervised fine-tuning or in-context learning using gold labels. This study explores whether solely utilizing unlabeled data can elicit strong model capabilities. We propose a new paradigm termed zero-to-strong generalization.
arXiv Detail & Related papers (2024-09-19T02:59:44Z)
Aligning Language Models with Demonstrated Feedback [58.834937450242975]
Demonstration ITerated Task Optimization (DITTO) directly aligns language model outputs to a user's demonstrated behaviors.<n>We evaluate DITTO's ability to learn fine-grained style and task alignment across domains such as news articles, emails, and blog posts.
arXiv Detail & Related papers (2024-06-02T23:13:56Z)
Leveraging Weakly Annotated Data for Hate Speech Detection in Code-Mixed Hinglish: A Feasibility-Driven Transfer Learning Approach with Large Language Models [0.0]
Hate speech detection in mix-code low-resource languages is an active problem area where the use of Large Language Models has proven beneficial. In this study, we have compiled a dataset of 100 YouTube comments, and weakly labelled them for coarse and fine-grained misogyny classification in mix-code Hinglish. Out of all the approaches, zero-shot classification using the Bidirectional Auto-Regressive Transformers (BART) large model and few-shot prompting using Generative Pre-trained Transformer- 3 (ChatGPT-3) achieve the best results.
arXiv Detail & Related papers (2024-03-04T15:27:49Z)
LLMaAA: Making Large Language Models as Active Annotators [32.57011151031332]
We propose LLMaAA, which takes large language models as annotators and puts them into an active learning loop to determine what to annotate efficiently. We conduct experiments and analysis on two classic NLP tasks, named entity recognition and relation extraction. With LLMaAA, task-specific models trained from LLM-generated labels can outperform the teacher within only hundreds of annotated examples.
arXiv Detail & Related papers (2023-10-30T14:54:15Z)
CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation [94.59630161324013]
We propose CoAnnotating, a novel paradigm for Human-LLM co-annotation of unstructured texts at scale. Our empirical study shows CoAnnotating to be an effective means to allocate work from results on different datasets, with up to 21% performance improvement over random baseline.
arXiv Detail & Related papers (2023-10-24T08:56:49Z)
IDEAL: Influence-Driven Selective Annotations Empower In-Context Learners in Large Language Models [66.32043210237768]
This paper introduces an influence-driven selective annotation method. It aims to minimize annotation costs while improving the quality of in-context examples. Experiments confirm the superiority of the proposed method on various benchmarks.
arXiv Detail & Related papers (2023-10-16T22:53:54Z)
Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information, they are proven useful for few-shot learning of language model. In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.