Text Classification via Large Language Models
- URL: http://arxiv.org/abs/2305.08377v3
- Date: Mon, 9 Oct 2023 15:52:30 GMT
- Title: Text Classification via Large Language Models
- Authors: Xiaofei Sun, Xiaoya Li, Jiwei Li, Fei Wu, Shangwei Guo, Tianwei Zhang, and Guoyin Wang
- Abstract summary: We introduce Clue And Reasoning Prompting (CARP) to address complex linguistic phenomena involved in text classification.
Remarkably, CARP yields new SOTA performances on 4 out of 5 widely-used text-classification benchmarks.
More importantly, we find that CARP delivers impressive abilities on low-resource and domain-adaptation setups.
- Score: 63.1874290788797
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the remarkable success of large-scale Language Models (LLMs) such as
GPT-3, they still significantly underperform fine-tuned models on
the task of text classification. This is due to (1) a lack of reasoning
ability in addressing complex linguistic phenomena (e.g., intensification,
contrast, irony, etc.); and (2) the limited number of tokens allowed in in-context
learning.
In this paper, we introduce Clue And Reasoning Prompting (CARP). CARP adopts
a progressive reasoning strategy tailored to addressing the complex linguistic
phenomena involved in text classification: CARP first prompts LLMs to find
superficial clues (e.g., keywords, tones, semantic relations, references, etc.),
based on which a diagnostic reasoning process is induced for final decisions.
To further address the limited-token issue, CARP uses a model fine-tuned on the
supervised dataset for $k$NN demonstration search in in-context learning,
allowing the model to take advantage of both the LLM's generalization ability
and the task-specific evidence provided by the full labeled dataset.
Remarkably, CARP yields new SOTA performances on 4 out of 5 widely-used
text-classification benchmarks, 97.39 (+1.24) on SST-2, 96.40 (+0.72) on
AGNews, 98.78 (+0.25) on R8 and 96.95 (+0.6) on R52, and a performance
comparable to SOTA on MR (92.39 vs. 93.3). More importantly, we find that CARP
delivers impressive abilities on low-resource and domain-adaptation setups.
Specifically, using 16 examples per class, CARP achieves comparable
performances to supervised models with 1,024 examples per class.
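The two ideas in the abstract ($k$NN demonstration search, then a clue-first prompt) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `knn_demonstrations` and `build_carp_prompt`, the prompt wording, and the toy 2-dimensional embeddings are all assumptions. In the paper the embeddings would come from a model fine-tuned on the labeled dataset, and demonstrations would also carry clues and reasoning; here each demonstration is reduced to a text and a label.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical record type: (text, label, embedding vector).
Example = Tuple[str, str, List[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_demonstrations(query_vec: Sequence[float],
                       pool: Sequence[Example],
                       k: int) -> List[Example]:
    """Return the k labeled examples most similar to the query embedding."""
    ranked = sorted(pool, key=lambda ex: cosine(query_vec, ex[2]), reverse=True)
    return ranked[:k]


def build_carp_prompt(query_text: str,
                      demos: Sequence[Example],
                      labels: Sequence[str]) -> str:
    """Assemble a clue-then-reasoning prompt (assumed wording, not the paper's)."""
    lines = [
        "Classify the INPUT into one of: " + ", ".join(labels) + ".",
        "First list CLUES (keywords, tones, semantic relations, references),",
        "then give REASONING based on those clues, then the final LABEL.",
        "",
    ]
    for text, label, _ in demos:
        lines += [f"INPUT: {text}", f"LABEL: {label}", ""]
    # The prompt ends at the clue step so the LLM continues from there.
    lines += [f"INPUT: {query_text}", "CLUES:"]
    return "\n".join(lines)


# Toy pool with made-up embeddings, purely for illustration.
pool: List[Example] = [
    ("A moving, beautifully acted film.", "positive", [0.9, 0.1]),
    ("Dull plot and wooden dialogue.", "negative", [0.1, 0.9]),
    ("Great fun from start to finish.", "positive", [0.8, 0.3]),
]
demos = knn_demonstrations([1.0, 0.0], pool, k=2)
prompt = build_carp_prompt("Not bad at all, I loved it.", demos,
                           ["positive", "negative"])
print(prompt)
```

Retrieving only the nearest labeled examples keeps the prompt within the token budget while still drawing on the full labeled set, which is the motivation the abstract gives for the $k$NN step.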
Related papers
- Reasoning with Reinforced Functional Token Tuning [70.96651128307985]
We propose Reinforced Functional Token Tuning (RFTT) to empower Large Language Models (LLMs) with self-play learn-to-reason capabilities.
RFTT embeds a rich set of learnable functional tokens directly into the model vocabulary, enabling chain-of-thought construction with diverse human-like reasoning behaviors.
arXiv Detail & Related papers (2025-02-19T02:59:42Z)
- Do we still need Human Annotators? Prompting Large Language Models for Aspect Sentiment Quad Prediction [2.2999148299770047]
We explore the capabilities of large language models for zero- and few-shot learning on the ASQP task.
We report F1 scores slightly below those obtained with state-of-the-art fine-tuned models but exceeding previously reported zero- and few-shot performance.
arXiv Detail & Related papers (2025-02-18T16:56:15Z)
- Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability [53.51560766150442]
Critical tokens are elements within reasoning trajectories that significantly influence incorrect outcomes.
We present a novel framework for identifying these tokens through rollout sampling.
We show that identifying and replacing critical tokens significantly improves model accuracy.
arXiv Detail & Related papers (2024-11-29T18:58:22Z)
- ILLUMINER: Instruction-tuned Large Language Models as Few-shot Intent Classifier and Slot Filler [1.9015367254988451]
This study evaluates instruction-tuned models (Instruct-LLMs) on popular benchmark datasets for intent classification (IC) and slot filling (SF).
We introduce ILLUMINER, an approach framing IC and SF as language generation tasks for Instruct-LLMs, with a more efficient SF-prompting method compared to prior work.
A comprehensive comparison with multiple baselines shows that our approach, using the FLAN-T5 11B model, outperforms the state-of-the-art joint IC+SF method and in-context learning with GPT3.5 (175B).
arXiv Detail & Related papers (2024-03-26T09:41:21Z)
- Exploring Small Language Models with Prompt-Learning Paradigm for Efficient Domain-Specific Text Classification [2.410463233396231]
Small language models (SLMs) offer significant customizability, adaptability, and cost-effectiveness for domain-specific tasks.
In few-shot settings where prompt-based model fine-tuning is possible, T5-base, a typical SLM with 220M parameters, achieves approximately 75% accuracy with limited labeled data.
In zero-shot settings with a fixed model, we underscore a pivotal observation that, although the GPT-3.5-turbo equipped with around 154B parameters garners an accuracy of 55.16%, the power of well designed prompts becomes evident.
arXiv Detail & Related papers (2023-09-26T09:24:46Z)
- Better Zero-Shot Reasoning with Role-Play Prompting [10.90357246745529]
Role-play prompting consistently surpasses the standard zero-shot approach across most datasets.
This highlights its potential to augment the reasoning capabilities of large language models.
arXiv Detail & Related papers (2023-08-15T11:08:30Z)
- Pushing the Limits of ChatGPT on NLP Tasks [79.17291002710517]
Despite the success of ChatGPT, its performances on most NLP tasks are still well below the supervised baselines.
In this work, we looked into the causes, and discovered that its subpar performance was caused by the following factors.
We propose a collection of general modules to address these issues, in an attempt to push the limits of ChatGPT on NLP tasks.
arXiv Detail & Related papers (2023-06-16T09:40:05Z)
- Attention is Not Always What You Need: Towards Efficient Classification of Domain-Specific Text [1.1508304497344637]
For large-scale IT corpora with hundreds of classes organized in a hierarchy, the task of accurate classification of classes at the higher level in the hierarchies is crucial.
In the business world, an efficient and explainable ML model is preferred over an expensive black-box model, especially if the performance increase is marginal.
Despite the widespread use of PLMs, there is a lack of clear and well-justified reasoning as to why these models are being employed for domain-specific text classification.
arXiv Detail & Related papers (2023-03-31T03:17:23Z)
- Large Language Models in the Workplace: A Case Study on Prompt Engineering for Job Type Classification [58.720142291102135]
This case study investigates the task of job classification in a real-world setting.
The goal is to determine whether an English-language job posting is appropriate for a graduate or entry-level position.
arXiv Detail & Related papers (2023-03-13T14:09:53Z)
- Few-shot Learning with Multilingual Language Models [66.49496434282564]
We train multilingual autoregressive language models on a balanced corpus covering a diverse set of languages.
Our largest model sets new state of the art in few-shot learning in more than 20 representative languages.
We present a detailed analysis of where the model succeeds and fails, showing in particular that it enables cross-lingual in-context learning.
arXiv Detail & Related papers (2021-12-20T16:52:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.