Attention is Not Always What You Need: Towards Efficient Classification of Domain-Specific Text
- URL: http://arxiv.org/abs/2303.17786v1
- Date: Fri, 31 Mar 2023 03:17:23 GMT
- Title: Attention is Not Always What You Need: Towards Efficient Classification of Domain-Specific Text
- Authors: Yasmen Wahba, Nazim Madhavji, and John Steinbacher
- Abstract summary: For large-scale IT corpora with hundreds of classes organized in a hierarchy, the task of accurate classification of classes at the higher level in the hierarchies is crucial.
In the business world, an efficient and explainable ML model is preferred over an expensive black-box model, especially if the performance increase is marginal.
Despite the widespread use of PLMs, there is a lack of a clear and well-justified rationale as to why these models are being employed for domain-specific text classification.
- Score: 1.1508304497344637
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For large-scale IT corpora with hundreds of classes organized in a hierarchy,
the task of accurate classification of classes at the higher level in the
hierarchies is crucial to avoid errors propagating to the lower levels. In the
business world, an efficient and explainable ML model is preferred over an
expensive black-box model, especially if the performance increase is marginal.
A current trend in the Natural Language Processing (NLP) community is towards
employing huge pre-trained language models (PLMs) or what is known as
self-attention models (e.g., BERT) for almost any kind of NLP task (e.g.,
question-answering, sentiment analysis, text classification). Despite the
widespread use of PLMs and the impressive performance in a broad range of NLP
tasks, there is a lack of a clear and well-justified rationale as to why these
models are being employed for domain-specific text classification (TC) tasks,
given the monosemic nature of specialized words (i.e., jargon) found in
domain-specific text which renders the purpose of contextualized embeddings
(e.g., PLMs) futile. In this paper, we compare the accuracies of some
state-of-the-art (SOTA) models reported in the literature against a Linear SVM
classifier and TFIDF vectorization model on three TC datasets. Results show a
comparable performance for the LinearSVM. The findings of this study show that
for domain-specific TC tasks, a linear model can provide a comparable, cheap,
reproducible, and interpretable alternative to attention-based models.
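As a concrete illustration of the paper's argument (not the authors' exact experimental setup; the corpus, labels, and parameters below are invented for the sketch), a TF-IDF + Linear SVM classifier for jargon-heavy text takes only a few lines in scikit-learn:

```python
# Minimal sketch: TF-IDF features + Linear SVM for domain-specific text
# classification. The tiny "IT corpus" below is illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in for a domain-specific corpus with monosemic jargon.
train_texts = [
    "disk quota exceeded on storage node",
    "raid array degraded replace disk",
    "kerberos ticket expired for user",
    "ldap authentication failure on login",
]
train_labels = ["storage", "storage", "auth", "auth"]

# TF-IDF turns jargon-heavy text into sparse term-weight vectors;
# LinearSVC fits a linear decision boundary over those features.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(train_texts, train_labels)

print(clf.predict(["disk failure on storage array"])[0])
print(clf.predict(["user login authentication error"])[0])
```

Because the model is a linear function of interpretable term weights, the learned coefficients can be read off per class, which is the explainability advantage the abstract refers to.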
Related papers
- Language Models are Graph Learners [70.14063765424012]
Language Models (LMs) are challenging the dominance of domain-specific models, including Graph Neural Networks (GNNs) and Graph Transformers (GTs)
We propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art GNNs on node classification tasks.
arXiv Detail & Related papers (2024-10-03T08:27:54Z)
- A Small Claims Court for the NLP: Judging Legal Text Classification Strategies With Small Datasets [0.0]
This paper investigates the best strategies for optimizing the use of a small labeled dataset and large amounts of unlabeled data.
We use the records of demands to a Brazilian Public Prosecutor's Office, aiming to assign each description to one of the subjects.
The best result was obtained with Unsupervised Data Augmentation (UDA), which jointly uses BERT, data augmentation, and strategies of semi-supervised learning.
arXiv Detail & Related papers (2024-09-09T18:10:05Z)
- Adaptable and Reliable Text Classification using Large Language Models [7.962669028039958]
This paper introduces an adaptable and reliable text classification paradigm, which leverages Large Language Models (LLMs)
We evaluated the performance of several LLMs, machine learning algorithms, and neural network-based architectures on four diverse datasets.
It is shown that the system's performance can be further enhanced through few-shot or fine-tuning strategies.
arXiv Detail & Related papers (2024-05-17T04:05:05Z)
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is the ability to understand instructions written in natural language (prompts)
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
- Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners.
We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting.
Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
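The boosting mechanism summarized above can be sketched with an ordinary weak learner standing in for the prompted LLM (a depth-1 decision stump here, since a toy script cannot call an LLM); this illustrates the general technique, not that paper's specific method:

```python
# Sketch of boosting weak learners: AdaBoost combines many classifiers
# that are each only slightly better than chance. The paper plugs a
# prompted LLM in as the weak learner; here a decision stump (the
# AdaBoostClassifier default base estimator) stands in for it.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Synthetic binary classification data, purely for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# AdaBoost reweights the training data after each round so successive
# weak learners focus on previously misclassified examples.
booster = AdaBoostClassifier(n_estimators=50, random_state=0)
booster.fit(X, y)
print(round(booster.score(X, y), 2))
```

The design point carried over from the paper is that the ensemble, not any single weak learner, provides the predictive power.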
arXiv Detail & Related papers (2023-06-25T02:39:19Z)
- Large Language Models in the Workplace: A Case Study on Prompt Engineering for Job Type Classification [58.720142291102135]
This case study investigates the task of job classification in a real-world setting.
The goal is to determine whether an English-language job posting is appropriate for a graduate or entry-level position.
arXiv Detail & Related papers (2023-03-13T14:09:53Z)
- Rationale-Guided Few-Shot Classification to Detect Abusive Language [5.977278650516324]
We propose RGFS (Rationale-Guided Few-Shot Classification) for abusive language detection.
We introduce two rationale-integrated BERT-based architectures (the RGFS models) and evaluate our systems over five different abusive language datasets.
arXiv Detail & Related papers (2022-11-30T14:47:14Z)
- A Comparison of SVM against Pre-trained Language Models (PLMs) for Text Classification Tasks [1.2934180951771599]
For domain-specific corpora, fine-tuning a pre-trained model for a specific task has shown to provide a performance improvement.
We compare the performance of four different PLMs on three public domain-free datasets and a real-world dataset containing domain-specific words.
arXiv Detail & Related papers (2022-11-04T16:28:40Z)
- Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information, so they have proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z)
- Interpretable Entity Representations through Large-Scale Typing [61.4277527871572]
We present an approach to creating entity representations that are human readable and achieve high performance out of the box.
Our representations are vectors whose values correspond to posterior probabilities over fine-grained entity types.
We show that it is possible to reduce the size of our type set in a learning-based way for particular domains.
arXiv Detail & Related papers (2020-04-30T23:58:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.