Human Still Wins over LLM: An Empirical Study of Active Learning on
Domain-Specific Annotation Tasks
- URL: http://arxiv.org/abs/2311.09825v1
- Date: Thu, 16 Nov 2023 11:51:13 GMT
- Title: Human Still Wins over LLM: An Empirical Study of Active Learning on
Domain-Specific Annotation Tasks
- Authors: Yuxuan Lu, Bingsheng Yao, Shao Zhang, Yun Wang, Peng Zhang, Tun Lu,
Toby Jia-Jun Li, Dakuo Wang
- Abstract summary: Small models can outperform GPT-3.5 with a few hundred labeled examples, and they achieve performance higher than or comparable to GPT-4 despite being hundreds of times smaller.
Based on these findings, we posit that LLM predictions can be used as a warmup method in real-world applications.
- Score: 37.56584999012332
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have demonstrated considerable advances, and
several claims have been made about their exceeding human performance. However,
in real-world tasks, domain knowledge is often required. Low-resource learning
methods like Active Learning (AL) have been proposed to tackle the cost of
domain expert annotation, raising this question: Can LLMs surpass compact
models trained with expert annotations in domain-specific tasks? In this work,
we conduct an empirical experiment on four datasets from three different
domains comparing SOTA LLMs with small models trained on expert annotations
with AL. We found that small models can outperform GPT-3.5 with a few hundred
labeled examples, and they achieve performance higher than or comparable to GPT-4
despite being hundreds of times smaller. Based on these findings, we posit
that LLM predictions can be used as a warmup method in real-world applications
and human experts remain indispensable in tasks involving data annotation
driven by domain-specific knowledge.
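For readers unfamiliar with Active Learning, the query step it relies on can be sketched as follows. This is only a minimal illustration of least-confidence uncertainty sampling, one common AL query strategy; the abstract does not state which strategy the authors use, and the function names, example ids, and probabilities below are hypothetical.

```python
from typing import Dict, List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty score: 1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def select_for_annotation(
    pool_probs: Dict[str, Sequence[float]], batch_size: int
) -> List[str]:
    """Return the ids of the `batch_size` most uncertain unlabeled examples."""
    ranked = sorted(
        pool_probs,
        key=lambda ex_id: least_confidence(pool_probs[ex_id]),
        reverse=True,  # most uncertain first
    )
    return ranked[:batch_size]


# Hypothetical class probabilities predicted by the current small model
# for three unlabeled examples (assumed values, not from the paper).
pool = {
    "doc-a": [0.98, 0.02],
    "doc-b": [0.55, 0.45],
    "doc-c": [0.70, 0.30],
}

# These ids would be sent to the domain expert for labeling in this round.
queried = select_for_annotation(pool, batch_size=2)
print(queried)
```

In a full loop, the expert labels for the queried examples would be added to the training set, the small model retrained, and the selection repeated until the annotation budget is spent.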
Related papers
- Evaluating Large Language Models for Health-Related Text Classification Tasks with Public Social Media Data [3.9459077974367833]
Large language models (LLMs) have demonstrated remarkable success in NLP tasks.
We benchmarked one supervised classic machine learning model based on Support Vector Machines (SVMs), three supervised pretrained language models (PLMs) based on RoBERTa, BERTweet, and SocBERT, and two LLM based classifiers (GPT3.5 and GPT4), across 6 text classification tasks.
Our comprehensive experiments demonstrate that employing data augmentation using LLMs (GPT-4) with relatively small human-annotated data to train lightweight supervised classification models achieves superior results compared to training with human-annotated data.
arXiv Detail & Related papers (2024-03-27T22:05:10Z)
- BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models [56.89958793648104]
Large Language Models (LLMs) are versatile and capable of addressing a diverse range of tasks.
Previous approaches either conduct continuous pre-training with domain-specific data or employ retrieval augmentation to support general LLMs.
We present a novel framework named BLADE, which enhances Black-box LArge language models with small Domain-spEcific models.
arXiv Detail & Related papers (2024-03-27T08:57:21Z)
- PANDA: Preference Adaptation for Enhancing Domain-Specific Abilities of LLMs [49.32067576992511]
Large language models often fall short of the performance achieved by domain-specific state-of-the-art models.
One potential approach to enhance domain-specific capabilities of LLMs involves fine-tuning them using corresponding datasets.
We propose Preference Adaptation for Enhancing Domain-specific Abilities of LLMs (PANDA).
Our experimental results reveal that PANDA significantly enhances the domain-specific ability of LLMs on text classification and interactive decision tasks.
arXiv Detail & Related papers (2024-02-20T09:02:55Z)
- EcomGPT-CT: Continual Pre-training of E-commerce Large Language Models with Semi-structured Data [67.8302955948861]
Large Language Models (LLMs) pre-trained on massive corpora have exhibited remarkable performance on various NLP tasks.
Applying these models to specific domains still poses significant challenges, such as lack of domain knowledge.
We focus on domain-specific continual pre-training of LLMs using E-commerce domain as an exemplar.
arXiv Detail & Related papers (2023-12-25T11:31:47Z)
- Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way.
arXiv Detail & Related papers (2023-11-16T07:09:38Z)
- Large Language Models as Data Preprocessors [9.99065004972981]
Large Language Models (LLMs) have marked a significant advancement in artificial intelligence.
This study explores their potential in data preprocessing, a critical stage in data mining and analytics applications.
We propose an LLM-based framework for data preprocessing, which integrates cutting-edge prompt engineering techniques.
arXiv Detail & Related papers (2023-08-30T23:28:43Z)
- Fairness of ChatGPT and the Role Of Explainable-Guided Prompts [6.079011829257036]
Our research investigates the potential of Large-scale Language Models (LLMs), specifically OpenAI's GPT, in credit risk assessment.
Our findings suggest that LLMs, when directed by judiciously designed prompts and supplemented with domain-specific knowledge, can parallel the performance of traditional Machine Learning (ML) models.
arXiv Detail & Related papers (2023-07-14T09:20:16Z)
- Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks [90.11273439036455]
Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks.
We propose Knowledge-Augmented Reasoning Distillation (KARD), a novel method that fine-tunes small LMs to generate rationales from LLMs with augmented knowledge retrieved from an external knowledge base.
We empirically show that KARD significantly improves the performance of small T5 and GPT models on the challenging knowledge-intensive reasoning datasets.
arXiv Detail & Related papers (2023-05-28T13:00:00Z)