Language Models to Support Multi-Label Classification of Industrial Data
- URL: http://arxiv.org/abs/2504.15922v2
- Date: Wed, 23 Apr 2025 07:24:40 GMT
- Title: Language Models to Support Multi-Label Classification of Industrial Data
- Authors: Waleed Abdeen, Michael Unterkalmsteiner, Krzysztof Wnuk, Alessio Ferrari, Panagiota Chatzipetrou,
- Abstract summary: We focuse on classifying requirements according to a taxonomy designed to support requirements tracing.<n>Our ground truth includes 377 requirements and 1968 labels from 6 output spaces.<n>We conclude that using ZSL for multi-label requirements classification offers promising results.
- Score: 4.759965976769317
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multi-label requirements classification is a challenging task, especially when dealing with numerous classes at varying levels of abstraction. The difficulties increases when a limited number of requirements is available to train a supervised classifier. Zero-shot learning (ZSL) does not require training data and can potentially address this problem. This paper investigates the performance of zero-shot classifiers (ZSCs) on a multi-label industrial dataset. We focuse on classifying requirements according to a taxonomy designed to support requirements tracing. We compare multiple variants of ZSCs using different embeddings, including 9 language models (LMs) with a reduced number of parameters (up to 3B), e.g., BERT, and 5 large LMs (LLMs) with a large number of parameters (up to 70B), e.g., Llama. Our ground truth includes 377 requirements and 1968 labels from 6 output spaces. For the evaluation, we adopt traditional metrics, i.e., precision, recall, F1, and $F_\beta$, as well as a novel label distance metric Dn. This aims to better capture the classification's hierarchical nature and provides a more nuanced evaluation of how far the results are from the ground truth. 1) The top-performing model on 5 out of 6 output spaces is T5-xl, with maximum $F_\beta$ = 0.78 and Dn = 0.04, while BERT base outperformed the other models in one case, with maximum $F_\beta$ = 0.83 and Dn = 0.04. 2) LMs with smaller parameter size produce the best classification results compared to LLMs. Thus, addressing the problem in practice is feasible as limited computing power is needed. 3) The model architecture (autoencoding, autoregression, and sentence-to-sentence) significantly affects the classifier's performance. We conclude that using ZSL for multi-label requirements classification offers promising results. We also present a novel metric that can be used to select the top-performing model for this problem
Related papers
- ANLS* -- A Universal Document Processing Metric for Generative Large Language Models [40.94659575657584]
This paper introduces a new metric for evaluating generative models called ANLS*.<n>The ANLS* metric extends existing ANLS metrics as a drop-in-replacement and is still compatible with previously reported ANLS scores.<n>We also benchmark a novel approach to generate prompts for documents, called SFT, against other prompting techniques such as LATIN.
arXiv Detail & Related papers (2024-02-06T09:50:08Z) - TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data [73.29220562541204]
We consider harnessing the amazing power of language models (LLMs) to solve our task.
We develop a TAT-LLM language model by fine-tuning LLaMA 2 with the training data generated automatically from existing expert-annotated datasets.
arXiv Detail & Related papers (2024-01-24T04:28:50Z) - Which AI Technique Is Better to Classify Requirements? An Experiment with SVM, LSTM, and ChatGPT [0.4588028371034408]
This paper reports an empirical evaluation of two ChatGPT models for requirements classification.
Our results show that there is no single best technique for all types of requirement classes.
The few-shot setting has been found to be beneficial primarily in scenarios where zero-shot results are significantly low.
arXiv Detail & Related papers (2023-11-20T05:55:05Z) - Test-Time Self-Adaptive Small Language Models for Question Answering [63.91013329169796]
We show and investigate the capabilities of smaller self-adaptive LMs, only with unlabeled test data.
Our proposed self-adaption strategy demonstrates significant performance improvements on benchmark QA datasets.
arXiv Detail & Related papers (2023-10-20T06:49:32Z) - Prompt Tuned Embedding Classification for Multi-Label Industry Sector Allocation [2.024620791810963]
This study benchmarks the performance of Prompt Tuning and baselines for multi-label text classification.
It is applied to classifying companies into an investment firm's proprietary industry taxonomy.
We confirm that the model's performance is consistent across both well-known and less-known companies.
arXiv Detail & Related papers (2023-09-21T13:45:32Z) - Label-Retrieval-Augmented Diffusion Models for Learning from Noisy
Labels [61.97359362447732]
Learning from noisy labels is an important and long-standing problem in machine learning for real applications.
In this paper, we reformulate the label-noise problem from a generative-model perspective.
Our model achieves new state-of-the-art (SOTA) results on all the standard real-world benchmark datasets.
arXiv Detail & Related papers (2023-05-31T03:01:36Z) - Distilling Step-by-Step! Outperforming Larger Language Models with Less
Training Data and Smaller Model Sizes [91.58845026796149]
We introduce Distilling step-by-step, a new mechanism that trains small models that outperform large language models.
We present three findings across 4 NLP benchmarks.
arXiv Detail & Related papers (2023-05-03T17:50:56Z) - Attention is Not Always What You Need: Towards Efficient Classification
of Domain-Specific Text [1.1508304497344637]
For large-scale IT corpora with hundreds of classes organized in a hierarchy, the task of accurate classification of classes at the higher level in the hierarchies is crucial.
In the business world, an efficient and explainable ML model is preferred over an expensive black-box model, especially if the performance increase is marginal.
Despite the widespread use of PLMs, there is a lack of a clear and well-justified need to as why these models are being employed for domain-specific text classification.
arXiv Detail & Related papers (2023-03-31T03:17:23Z) - On Non-Random Missing Labels in Semi-Supervised Learning [114.62655062520425]
Semi-Supervised Learning (SSL) is fundamentally a missing label problem.
We explicitly incorporate "class" into SSL.
Our method not only significantly outperforms existing baselines but also surpasses other label bias removal SSL methods.
arXiv Detail & Related papers (2022-06-29T22:01:29Z) - ZeroBERTo -- Leveraging Zero-Shot Text Classification by Topic Modeling [57.80052276304937]
This paper proposes a new model, ZeroBERTo, which leverages an unsupervised clustering step to obtain a compressed data representation before the classification task.
We show that ZeroBERTo has better performance for long inputs and shorter execution time, outperforming XLM-R by about 12% in the F1 score in the FolhaUOL dataset.
arXiv Detail & Related papers (2022-01-04T20:08:17Z) - Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information, they are proven useful for few-shot learning of language model.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z) - FreaAI: Automated extraction of data slices to test machine learning
models [2.475112368179548]
We show the feasibility of automatically extracting feature models that result in explainable data slices over which the ML solution under-performs.
Our novel technique, IBM FreaAI aka FreaAI, extracts such slices from structured ML test data or any other labeled data.
arXiv Detail & Related papers (2021-08-12T09:21:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.