Attention is Not Always What You Need: Towards Efficient Classification of Domain-Specific Text
- URL: http://arxiv.org/abs/2303.17786v1
- Date: Fri, 31 Mar 2023 03:17:23 GMT
- Title: Attention is Not Always What You Need: Towards Efficient Classification of Domain-Specific Text
- Authors: Yasmen Wahba, Nazim Madhavji, and John Steinbacher
- Abstract summary: For large-scale IT corpora with hundreds of classes organized in a hierarchy, the task of accurate classification of classes at the higher level in the hierarchies is crucial.
In the business world, an efficient and explainable ML model is preferred over an expensive black-box model, especially if the performance increase is marginal.
Despite the widespread use of PLMs, there is a lack of a clear and well-justified rationale as to why these models are being employed for domain-specific text classification.
- Score: 1.1508304497344637
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For large-scale IT corpora with hundreds of classes organized in a hierarchy,
the task of accurate classification of classes at the higher level in the
hierarchies is crucial to avoid errors propagating to the lower levels. In the
business world, an efficient and explainable ML model is preferred over an
expensive black-box model, especially if the performance increase is marginal.
A current trend in the Natural Language Processing (NLP) community is towards
employing huge pre-trained language models (PLMs) or what is known as
self-attention models (e.g., BERT) for almost any kind of NLP task (e.g.,
question-answering, sentiment analysis, text classification). Despite the
widespread use of PLMs and the impressive performance in a broad range of NLP
tasks, there is a lack of a clear and well-justified rationale as to why these
models are being employed for domain-specific text classification (TC) tasks,
given the monosemic nature of specialized words (i.e., jargon) found in
domain-specific text which renders the purpose of contextualized embeddings
(e.g., PLMs) futile. In this paper, we compare the accuracies of some
state-of-the-art (SOTA) models reported in the literature against a Linear SVM
classifier and TFIDF vectorization model on three TC datasets. Results show a
comparable performance for the LinearSVM. The findings of this study show that
for domain-specific TC tasks, a linear model can provide a comparable, cheap,
reproducible, and interpretable alternative to attention-based models.
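As a concrete illustration of the paper's argument (not the authors' exact experimental setup; the corpus, labels, and parameters below are invented for the sketch), a TF-IDF + Linear SVM classifier for jargon-heavy text takes only a few lines in scikit-learn:

```python
# Minimal sketch: TF-IDF features + Linear SVM for domain-specific text
# classification. The tiny "IT corpus" below is illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in for a domain-specific corpus with monosemic jargon.
train_texts = [
    "disk quota exceeded on storage node",
    "raid array degraded replace disk",
    "kerberos ticket expired for user",
    "ldap authentication failure on login",
]
train_labels = ["storage", "storage", "auth", "auth"]

# TF-IDF turns jargon-heavy text into sparse term-weight vectors;
# LinearSVC fits a linear decision boundary over those features.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(train_texts, train_labels)

print(clf.predict(["disk failure on storage array"])[0])
print(clf.predict(["user login authentication error"])[0])
```

Because the model is a linear function of interpretable term weights, the learned coefficients can be read off per class, which is the explainability advantage the abstract refers to.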
Related papers
- Language Models are Graph Learners [70.14063765424012]
Language Models (LMs) are challenging the dominance of domain-specific models, including Graph Neural Networks (GNNs) and Graph Transformers (GTs)
We propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art GNNs on node classification tasks.
arXiv Detail & Related papers (2024-10-03T08:27:54Z)
- A Small Claims Court for the NLP: Judging Legal Text Classification Strategies With Small Datasets [0.0]
This paper investigates the best strategies for optimizing the use of a small labeled dataset and large amounts of unlabeled data.
We use the records of demands to a Brazilian Public Prosecutor's Office, aiming to assign each description to one of the subjects.
The best result was obtained with Unsupervised Data Augmentation (UDA), which jointly uses BERT, data augmentation, and strategies of semi-supervised learning.
arXiv Detail & Related papers (2024-09-09T18:10:05Z)
- Adaptable and Reliable Text Classification using Large Language Models [7.962669028039958]
This paper introduces an adaptable and reliable text classification paradigm, which leverages Large Language Models (LLMs)
We evaluated the performance of several LLMs, machine learning algorithms, and neural network-based architectures on four diverse datasets.
It is shown that the system's performance can be further enhanced through few-shot or fine-tuning strategies.
arXiv Detail & Related papers (2024-05-17T04:05:05Z)
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is the ability to understand instructions written in natural language (prompts)
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
- Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners.
We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting.
Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
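The boosting mechanism summarized above can be sketched with an ordinary weak learner standing in for the prompted LLM (a depth-1 decision stump here, since a toy script cannot call an LLM); this illustrates the general technique, not that paper's specific method:

```python
# Sketch of boosting weak learners: AdaBoost combines many classifiers
# that are each only slightly better than chance. The paper plugs a
# prompted LLM in as the weak learner; here a decision stump (the
# AdaBoostClassifier default base estimator) stands in for it.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Synthetic binary classification data, purely for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# AdaBoost reweights the training data after each round so successive
# weak learners focus on previously misclassified examples.
booster = AdaBoostClassifier(n_estimators=50, random_state=0)
booster.fit(X, y)
print(round(booster.score(X, y), 2))
```

The design point carried over from the paper is that the ensemble, not any single weak learner, provides the predictive power.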
arXiv Detail & Related papers (2023-06-25T02:39:19Z)
- Large Language Models in the Workplace: A Case Study on Prompt Engineering for Job Type Classification [58.720142291102135]
This case study investigates the task of job classification in a real-world setting.
The goal is to determine whether an English-language job posting is appropriate for a graduate or entry-level position.
arXiv Detail & Related papers (2023-03-13T14:09:53Z)
- Rationale-Guided Few-Shot Classification to Detect Abusive Language [5.977278650516324]
We propose RGFS (Rationale-Guided Few-Shot Classification) for abusive language detection.
We introduce two rationale-integrated BERT-based architectures (the RGFS models) and evaluate our systems over five different abusive language datasets.
arXiv Detail & Related papers (2022-11-30T14:47:14Z)
- A Comparison of SVM against Pre-trained Language Models (PLMs) for Text Classification Tasks [1.2934180951771599]
For domain-specific corpora, fine-tuning a pre-trained model for a specific task has shown to provide a performance improvement.
We compare the performance of four different PLMs on three public domain-free datasets and a real-world dataset containing domain-specific words.
arXiv Detail & Related papers (2022-11-04T16:28:40Z)
- Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information, so they have proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z)
- Interpretable Entity Representations through Large-Scale Typing [61.4277527871572]
We present an approach to creating entity representations that are human readable and achieve high performance out of the box.
Our representations are vectors whose values correspond to posterior probabilities over fine-grained entity types.
We show that it is possible to reduce the size of our type set in a learning-based way for particular domains.
arXiv Detail & Related papers (2020-04-30T23:58:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.