Related papers: Which AI Technique Is Better to Classify Requirements? An Experiment with SVM, LSTM, and ChatGPT

URL: http://arxiv.org/abs/2311.11547v2
Date: Tue, 16 Apr 2024 09:06:25 GMT
Title: Which AI Technique Is Better to Classify Requirements? An Experiment with SVM, LSTM, and ChatGPT
Authors: Abdelkarim El-Hajjami, Nicolas Fafin, Camille Salinesi
Abstract summary: This paper reports an empirical evaluation of two ChatGPT models for requirements classification. Our results show that there is no single best technique for all types of requirement classes. The few-shot setting has been found to be beneficial primarily in scenarios where zero-shot results are significantly low.
Score: 0.4588028371034408
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recently, Large Language Models like ChatGPT have demonstrated remarkable proficiency in various Natural Language Processing tasks. Their application in Requirements Engineering, especially in requirements classification, has gained increasing interest. This paper reports an extensive empirical evaluation of two ChatGPT models, gpt-3.5-turbo and gpt-4, in both zero-shot and few-shot settings for requirements classification. It also examines how these models compare with traditional classification methods, namely Support Vector Machine (SVM) and Long Short-Term Memory (LSTM). Across five datasets, our results show that no single technique is best for all types of requirement classes. Interestingly, the few-shot setting proved beneficial primarily in scenarios where zero-shot results were significantly low.
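To make the zero-shot versus few-shot distinction concrete, here is a minimal Python sketch of how the two prompt styles differ. It assumes a binary functional/non-functional label set and a prompt template of our own; the paper's actual prompts, class inventories, and API calls are not given in this summary, so `LABELS`, `build_prompt`, and `parse_label` are hypothetical helpers, and no model is called.

```python
from typing import Sequence, Tuple

# Hypothetical label set; the paper's five datasets use their own classes.
LABELS = ("functional", "non-functional")


def build_prompt(requirement: str,
                 examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Build a classification prompt (hypothetical template).

    With no examples this is a zero-shot prompt; passing labelled
    (requirement, label) pairs turns it into a few-shot prompt.
    """
    lines = [
        "Classify the software requirement into one of: "
        + ", ".join(LABELS) + ".",
        "Answer with the label only.",
        "",
    ]
    # Few-shot demonstrations, if any, precede the query requirement.
    for text, label in examples:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        lines.append(f"Requirement: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    lines.append(f"Requirement: {requirement}")
    lines.append("Label:")
    return "\n".join(lines)


def parse_label(reply: str) -> str:
    """Map a free-text model reply onto LABELS, or 'unknown' if none match."""
    reply = reply.strip().lower()
    # Check longer labels first, because "functional" is a substring
    # of "non-functional".
    for label in sorted(LABELS, key=len, reverse=True):
        if label in reply:
            return label
    return "unknown"


zero_shot = build_prompt("The system shall respond within 2 seconds.")
few_shot = build_prompt(
    "The system shall respond within 2 seconds.",
    examples=[("Users shall be able to export reports as PDF.", "functional")],
)
```

The only difference between the two settings in this sketch is whether labelled demonstrations precede the query, which is why few-shot prompts cost more tokens per request; the abstract's finding suggests that cost pays off mainly for classes where zero-shot accuracy is low.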

Related papers

Language Models to Support Multi-Label Classification of Industrial Data [4.759965976769317]
We focus on classifying requirements according to a taxonomy designed to support requirements tracing. Our ground truth includes 377 requirements and 1968 labels from 6 output spaces. We conclude that using zero-shot learning (ZSL) for multi-label requirements classification offers promising results.
arXiv (2025-04-22T14:06:02Z)
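With 1968 labels over 377 requirements (about 5.2 labels per requirement on average), each prediction in the multi-label study above is a set of labels rather than a single class. The summary does not say which metric the authors report; the Python sketch below assumes micro-averaged precision, recall, and F1, a common choice for multi-label evaluation, and `micro_prf` is a hypothetical helper rather than the paper's code.

```python
from typing import Dict, List, Set


def micro_prf(gold: List[Set[str]], pred: List[Set[str]]) -> Dict[str, float]:
    """Micro-averaged precision, recall and F1 over per-item label sets."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels predicted and present in the gold set
        fp += len(p - g)  # labels predicted but absent from the gold set
        fn += len(g - p)  # gold labels the classifier missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy, made-up label sets for two requirements.
scores = micro_prf(gold=[{"a", "b"}, {"c"}], pred=[{"a"}, {"c", "d"}])
```

Micro-averaging pools true and false positives across all labels before dividing, so frequent labels dominate the score; a macro average (per-label F1, then the mean) would instead weight rare labels equally.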
The Art of Saying No: Contextual Noncompliance in Language Models [123.383993700586]
We introduce a comprehensive taxonomy of contextual noncompliance describing when and how models should not comply with user requests. Our taxonomy spans a wide range of categories including incomplete, unsupported, indeterminate, and humanizing requests. To test noncompliance capabilities of language models, we use this taxonomy to develop a new evaluation suite of 1000 noncompliance prompts.
arXiv (2024-07-02T07:12:51Z)
Generative Multi-modal Models are Good Class-Incremental Learners [51.5648732517187]
We propose a novel generative multi-modal model (GMM) framework for class-incremental learning (CIL). Our approach directly generates labels for images using an adapted generative model. In the few-shot CIL setting, it improves accuracy by at least 14% over all current state-of-the-art methods, with significantly less forgetting.
arXiv (2024-03-27T09:21:07Z)
Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings. An advantage of these models over more standard approaches is their ability to understand instructions written in natural language (prompts). This makes them suitable for text classification problems in domains with limited amounts of annotated instances.
arXiv (2024-03-26T12:47:39Z)
Efficient Classification of Student Help Requests in Programming Courses Using Large Language Models [2.5949084781328744]
This study evaluates the performance of the GPT-3.5 and GPT-4 models for classifying help requests from students in an introductory programming class. Fine-tuning the GPT-3.5 model improved its performance to such an extent that it approximated the accuracy and consistency across categories observed between two human raters.
arXiv (2023-10-31T00:56:33Z)
Large language models for aspect-based sentiment analysis [0.0]
We assess the performance of GPT-4 and GPT-3.5 in zero-shot, few-shot, and fine-tuned settings. Fine-tuned GPT-3.5 achieves a state-of-the-art F1 score of 83.8 on the joint aspect term extraction and polarity classification task.
arXiv (2023-10-27T10:03:21Z)
Investigating the Limitation of CLIP Models: The Worst-Performing Categories [53.360239882501325]
Contrastive Language-Image Pre-training (CLIP) provides a foundation model by integrating natural language into visual concepts. It is usually expected that satisfactory overall accuracy can be achieved across numerous domains through well-designed textual prompts. However, we found that their performance in the worst categories is significantly inferior to the overall performance.
arXiv (2023-10-05T05:37:33Z)
Empirical Evaluation of ChatGPT on Requirements Information Retrieval Under Zero-Shot Setting [12.733403458944972]
We empirically evaluate ChatGPT's performance on requirements information retrieval tasks. Under zero-shot setting, evaluation results reveal ChatGPT's promising ability to retrieve requirements relevant information.
arXiv (2023-04-25T04:09:45Z)
Attention is Not Always What You Need: Towards Efficient Classification of Domain-Specific Text [1.1508304497344637]
For large-scale IT corpora with hundreds of classes organized in a hierarchy, accurately classifying the higher-level classes of the hierarchy is crucial. In the business world, an efficient and explainable ML model is preferred over an expensive black-box model, especially if the performance gain is marginal. Despite the widespread use of pre-trained language models (PLMs), there is no clear, well-justified explanation of why these models are being employed for domain-specific text classification.
arXiv (2023-03-31T03:17:23Z)
Large Language Models in the Workplace: A Case Study on Prompt Engineering for Job Type Classification [58.720142291102135]
This case study investigates the task of job classification in a real-world setting. The goal is to determine whether an English-language job posting is appropriate for a graduate or entry-level position.
arXiv (2023-03-13T14:09:53Z)
Fine-grained Angular Contrastive Learning with Coarse Labels [72.80126601230447]
We introduce a novel 'Angular normalization' module that effectively combines supervised and self-supervised contrastive pre-training. This work should help pave the way for future research on the new, challenging, and highly practical problem of coarse-to-fine few-shot (C2FS) classification.
arXiv (2020-12-07T08:09:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.