Improving Classification Performance With Human Feedback: Label a few,
we label the rest
- URL: http://arxiv.org/abs/2401.09555v1
- Date: Wed, 17 Jan 2024 19:13:05 GMT
- Title: Improving Classification Performance With Human Feedback: Label a few,
we label the rest
- Authors: Natan Vidra, Thomas Clifford, Katherine Jijo, Eden Chung, Liang Zhang
- Abstract summary: This paper focuses on understanding how a continuous feedback loop can refine models, thereby enhancing their accuracy, recall, and precision.
We benchmark this approach on the Financial Phrasebank, Banking, Craigslist, Trec, and Amazon Reviews datasets to show that with just a few labeled examples, we are able to surpass the accuracy of zero-shot large language models.
- Score: 2.7386128680964408
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the realm of artificial intelligence, where a vast majority of data is
unstructured, obtaining substantial amounts of labeled data to train supervised
machine learning models poses a significant challenge. To address this, we
delve into few-shot and active learning, where our goal is to improve AI models
with human feedback on a few labeled examples. This paper focuses on
understanding how a continuous feedback loop can refine models, thereby
enhancing their accuracy, recall, and precision through incremental human
input. By employing Large Language Models (LLMs) such as GPT-3.5, BERT, and
SetFit, we aim to analyze the efficacy of using a limited number of labeled
examples to substantially improve model accuracy. We benchmark this approach on
the Financial Phrasebank, Banking, Craigslist, Trec, and Amazon Reviews datasets to
show that with just a few labeled examples, we are able to surpass the
accuracy of zero-shot large language models and provide enhanced text
classification performance. We demonstrate that rather than needing to manually
label millions of rows of data, we just need to label a few and the model can
effectively predict the rest.
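The paper does not include code, but the loop it describes (label a few rows, let the model flag what it is least sure about, have a human label those, retrain, and let the model predict everything else) is easy to sketch. The toy below is only an illustration of that general active-learning idea, not the authors' pipeline: TF-IDF plus logistic regression stand in for the GPT-3.5 / BERT / SetFit backbones, and the texts, labels, and "oracle" list simulating the human annotator are all made up.

```python
# Toy "label a few, we label the rest" loop (illustrative, not the paper's code).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = [
    "great product, works perfectly", "terrible service, very slow",
    "absolutely love it", "waste of money",
    "fast shipping and good quality", "broke after one day",
    "exceeded my expectations", "would not recommend",
]
oracle = ["pos", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]  # simulated human annotator

labels = {0: "pos", 1: "neg"}              # the "few" rows labeled up front
X = TfidfVectorizer().fit_transform(texts)

clf = LogisticRegression(max_iter=1000)
for _ in range(3):                         # a few human-feedback rounds
    idx = sorted(labels)
    clf.fit(X[idx], [labels[i] for i in idx])

    # Uncertainty sampling: ask the human about the row the model is least sure of.
    unlabeled = [i for i in range(len(texts)) if i not in labels]
    confidence = clf.predict_proba(X[unlabeled]).max(axis=1)
    query = unlabeled[int(np.argmin(confidence))]
    labels[query] = oracle[query]          # human feedback step

print(clf.predict(X))                      # the model labels the rest
```

In practice the same loop would wrap a SetFit or BERT fine-tuning step and a real annotation interface; only the fit and predict_proba calls change.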
Related papers
- Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z)
- Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z)
- AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks.
We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate.
We build the first conversation-based information retrieval dataset employing AnnoLLM.
arXiv Detail & Related papers (2023-03-29T17:03:21Z)
- Language Models in the Loop: Incorporating Prompting into Weak Supervision [11.10422546502386]
We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited.
Instead of applying the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for labeling functions in a weak supervision framework.
arXiv Detail & Related papers (2022-05-04T20:42:40Z)
- Fortunately, Discourse Markers Can Enhance Language Models for Sentiment Analysis [13.149482582098429]
We propose to leverage sentiment-carrying discourse markers to generate large-scale weakly-labeled data.
We show the value of our approach on various benchmark datasets, including the finance domain.
arXiv Detail & Related papers (2022-01-06T12:33:47Z)
- Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases [55.45617404586874]
We propose a few-shot instruction-based method for prompting pre-trained language models (LMs).
We show that large LMs can detect different types of fine-grained biases with similar and sometimes superior accuracy to fine-tuned models.
arXiv Detail & Related papers (2021-12-15T04:19:52Z)
- Complementary Ensemble Learning [1.90365714903665]
We derive a technique to improve the performance of state-of-the-art deep learning models.
Specifically, we train auxiliary models which are able to complement state-of-the-art model uncertainty.
arXiv Detail & Related papers (2021-11-09T03:23:05Z)
- Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information and have proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z)
- A Systematic Evaluation of Transfer Learning and Pseudo-labeling with BERT-based Ranking Models [2.0498977512661267]
We evaluate transferability of BERT-based neural ranking models across five English datasets.
Each of our collections has a substantial number of queries, which enables a full-shot evaluation mode.
We find that training on pseudo-labels can produce a competitive or better model compared to transfer learning.
arXiv Detail & Related papers (2021-03-04T21:08:06Z)
- Uncertainty-aware Self-training for Text Classification with Few Labels [54.13279574908808]
We study self-training as one of the earliest semi-supervised learning approaches to reduce the annotation bottleneck.
We propose an approach to improve self-training by incorporating uncertainty estimates of the underlying neural network.
We show that our methods, leveraging only 20-30 labeled samples per class per task for training and validation, can perform within 3% of fully supervised pre-trained language models.
arXiv Detail & Related papers (2020-06-27T08:13:58Z)
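The uncertainty-aware self-training entry directly above pairs naturally with the feedback loop in this paper. Below is an equally rough sketch of that idea under the same stand-in setup (TF-IDF and logistic regression instead of a neural network, predictive entropy instead of the paper's own uncertainty estimates); all texts, labels, and thresholds are made up for illustration.

```python
# Rough sketch of self-training with an uncertainty filter (illustrative only):
# pseudo-label unlabeled text, keep only the confident pseudo-labels, retrain.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labeled_texts = ["refund was processed quickly", "my card was charged twice"]
labeled_y = ["positive", "negative"]
unlabeled_texts = [
    "charged twice and no refund yet",
    "quick and painless refund",
    "app keeps crashing on login",
]

vec = TfidfVectorizer().fit(labeled_texts + unlabeled_texts)
clf = LogisticRegression(max_iter=1000).fit(vec.transform(labeled_texts), labeled_y)

# Predictive entropy as the uncertainty estimate; keep only confident pseudo-labels.
proba = clf.predict_proba(vec.transform(unlabeled_texts))
entropy = -(proba * np.log(proba + 1e-12)).sum(axis=1)
keep = entropy < 0.6

pseudo_texts = [t for t, k in zip(unlabeled_texts, keep) if k]
pseudo_y = list(clf.predict(vec.transform(unlabeled_texts))[keep])

# Retrain on the original labels plus the confident pseudo-labels.
clf = LogisticRegression(max_iter=1000).fit(
    vec.transform(labeled_texts + pseudo_texts), labeled_y + pseudo_y
)
```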