Large Language Models in the Workplace: A Case Study on Prompt
Engineering for Job Type Classification
- URL: http://arxiv.org/abs/2303.07142v3
- Date: Tue, 18 Apr 2023 09:58:53 GMT
- Title: Large Language Models in the Workplace: A Case Study on Prompt
Engineering for Job Type Classification
- Authors: Benjamin Clavié and Alexandru Ciceu and Frederick Naylor and
Guillaume Soulié and Thomas Brightwell
- Abstract summary: This case study investigates the task of job classification in a real-world setting.
The goal is to determine whether an English-language job posting is appropriate for a graduate or entry-level position.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This case study investigates the task of job classification in a real-world
setting, where the goal is to determine whether an English-language job posting
is appropriate for a graduate or entry-level position. We explore multiple
approaches to text classification, including supervised approaches such as
traditional models like Support Vector Machines (SVMs) and state-of-the-art
deep learning methods such as DeBERTa. We compare them with Large Language
Models (LLMs) used in both few-shot and zero-shot classification settings. To
accomplish this task, we employ prompt engineering, a technique that involves
designing prompts to guide the LLMs towards the desired output. Specifically,
we evaluate the performance of two commercially available state-of-the-art
GPT-3.5-based language models, text-davinci-003 and gpt-3.5-turbo. We also
conduct a detailed analysis of the impact of different aspects of prompt
engineering on the model's performance. Our results show that, with a
well-designed prompt, a zero-shot gpt-3.5-turbo classifier outperforms all
other models, achieving a 6% increase in Precision@95% Recall compared to the
best supervised approach. Furthermore, we observe that the wording of the
prompt is a critical factor in eliciting the appropriate "reasoning" in the
model, and that seemingly minor aspects of the prompt significantly affect the
model's performance.
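The abstract does not reproduce the authors' prompt, so the following is only a minimal sketch of the zero-shot setup it describes; the prompt wording, the temperature setting, and the Yes/No answer parsing are illustrative assumptions. It uses the 2023-era OpenAI Python SDK (openai<1.0), which exposed the gpt-3.5-turbo model named in the paper:

```python
import openai  # 2023-era SDK (openai<1.0); assumes OPENAI_API_KEY is set

# Hypothetical prompt: the paper's actual wording is not given in the abstract.
PROMPT = (
    "Is the following job posting suitable for a recent graduate or an "
    "entry-level candidate? Answer with exactly 'Yes' or 'No'.\n\n"
    "Job posting:\n{posting}"
)

def is_entry_level(posting: str) -> bool:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature=0,  # keep the classification output as stable as possible
        messages=[{"role": "user", "content": PROMPT.format(posting=posting)}],
    )
    answer = response["choices"][0]["message"]["content"].strip().lower()
    return answer.startswith("yes")
```

The headline metric, Precision@95% Recall, is the best precision attainable at any decision threshold whose recall is still at least 95%. Assuming the classifier emits a real-valued score per posting, it can be computed with scikit-learn:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def precision_at_recall(y_true, scores, min_recall=0.95):
    # Sweep all thresholds, then keep the best precision among operating
    # points that still reach the required recall.
    precision, recall, _ = precision_recall_curve(y_true, scores)
    return float(np.max(precision[recall >= min_recall]))
```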
Related papers
- Unraveling the Capabilities of Language Models in News Summarization
This work provides a comprehensive benchmarking of 20 recent language models, focusing on smaller ones for the news summarization task.
In this study, we focus on zero-shot and few-shot learning settings and apply a robust evaluation methodology.
We highlight the exceptional performance of GPT-3.5-Turbo and GPT-4, which generally dominate due to their advanced capabilities.
arXiv Detail & Related papers (2025-01-30T04:20:16Z)
- Large Language Models For Text Classification: Case Study And Comprehensive Review
We evaluate the performance of different Large Language Models (LLMs) in comparison with state-of-the-art deep-learning and machine-learning models.
Our work reveals significant variations in model responses based on the prompting strategies.
arXiv Detail & Related papers (2025-01-14T22:02:38Z)
- A Survey of Small Language Models
Small Language Models (SLMs) have become increasingly important due to their efficiency and their ability to perform various language tasks with minimal computational resources.
We present a comprehensive survey on SLMs, focusing on their architectures, training techniques, and model compression techniques.
arXiv Detail & Related papers (2024-10-25T23:52:28Z)
- Which AI Technique Is Better to Classify Requirements? An Experiment with SVM, LSTM, and ChatGPT
This paper reports an empirical evaluation of two ChatGPT models for requirements classification.
Our results show that there is no single best technique for all types of requirement classes.
The few-shot setting has been found to be beneficial primarily in scenarios where zero-shot results are poor.
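As a rough illustration of what moving from zero-shot to few-shot means here, the sketch below builds a few-shot classification prompt; the requirement classes and example requirements are invented placeholders, not taken from the paper:

```python
# Invented demonstration examples; a real few-shot setup would draw these
# from labeled requirements in the target taxonomy.
EXAMPLES = [
    ("The system shall encrypt all stored passwords.", "security"),
    ("The UI shall respond to user input within 200 ms.", "performance"),
]

def build_few_shot_prompt(requirement: str) -> str:
    shots = "\n\n".join(
        f"Requirement: {text}\nClass: {label}" for text, label in EXAMPLES
    )
    return (
        "Classify each software requirement into exactly one class.\n\n"
        f"{shots}\n\nRequirement: {requirement}\nClass:"
    )
```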
arXiv Detail & Related papers (2023-11-20T05:55:05Z)
- Large language models for aspect-based sentiment analysis
We assess the performance of GPT-4 and GPT-3.5 in zero-shot, few-shot, and fine-tuned settings.
Fine-tuned GPT-3.5 achieves a state-of-the-art F1 score of 83.8 on the joint aspect term extraction and polarity classification task.
arXiv Detail & Related papers (2023-10-27T10:03:21Z)
- Exploring Small Language Models with Prompt-Learning Paradigm for Efficient Domain-Specific Text Classification
Small language models (SLMs) offer significant customizability, adaptability, and cost-effectiveness for domain-specific tasks.
In few-shot settings where prompt-based model fine-tuning is possible, T5-base, a typical SLM with 220M parameters, achieves approximately 75% accuracy with limited labeled data.
In zero-shot settings with a fixed model, we underscore a pivotal observation: although GPT-3.5-turbo, with around 154B parameters, garners an accuracy of only 55.16%, the power of well-designed prompts becomes evident.
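As a minimal sketch of prompting a small model of this kind through Hugging Face Transformers (the prompt template and label set are assumptions, and the paper's stronger few-shot numbers come from prompt-based fine-tuning rather than the plain inference shown here):

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-base")  # ~220M parameters
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def classify(text: str) -> str:
    # Cast classification as text-to-text generation, T5's native format;
    # the label vocabulary here is a made-up example.
    prompt = f"Classify the topic as 'tech' or 'finance': {text}"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=4)
    return tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
```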
arXiv Detail & Related papers (2023-09-26T09:24:46Z)
- MetricPrompt: Prompting Model as a Relevance Metric for Few-shot Text Classification
MetricPrompt eases the difficulty of verbalizer design by reformulating the few-shot text classification task as a text pair relevance estimation task.
We conduct experiments on three widely used text classification datasets across four few-shot settings.
Results show that MetricPrompt outperforms manual verbalizer and other automatic verbalizer design methods across all few-shot settings.
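The reformulation can be sketched as follows: to classify a text, score its relevance against one labeled example per class and pick the class of the best-scoring pair. MetricPrompt itself obtains these scores from a prompted masked language model; the generic cross-encoder below is only a stand-in to make the idea concrete, and the classes and examples are invented:

```python
from sentence_transformers import CrossEncoder

# Stand-in relevance scorer, NOT the paper's prompted masked LM.
scorer = CrossEncoder("cross-encoder/stsb-roberta-base")

SUPPORT = {  # one labeled example per class (invented)
    "sports": "The striker scored twice in stoppage time to win the final.",
    "business": "Shares rallied after the company raised its profit forecast.",
}

def classify(text: str) -> str:
    labels = list(SUPPORT)
    scores = scorer.predict([(text, SUPPORT[label]) for label in labels])
    return labels[int(scores.argmax())]  # class of the most relevant pair
```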
arXiv Detail & Related papers (2023-06-15T06:51:35Z)
- Scaling Instruction-Finetuned Language Models
Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance.
We find that instruction finetuning dramatically improves performance on a variety of model classes.
arXiv Detail & Related papers (2022-10-20T16:58:32Z)
- PaLM: Scaling Language Modeling with Pathways
We trained a 540-billion-parameter, densely activated Transformer language model, which we call the Pathways Language Model (PaLM).
We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods.
We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks.
arXiv Detail & Related papers (2022-04-05T16:11:45Z)
- Reframing Instructional Prompts to GPTk's Language
We propose reframing techniques for model designers to create effective prompts for language models.
Our results show that reframing improves few-shot learning performance by 14% while reducing sample complexity.
The performance gains are particularly important for large language models such as GPT-3, where tuning models or prompts on large datasets is not feasible.
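The before/after pair below is an invented illustration of one reframing pattern in this family (decomposing a complex instruction into short, itemized steps); it is not an example taken from the paper:

```python
# Invented prompts illustrating instruction decomposition.
ORIGINAL_PROMPT = (
    "Write a question about the passage whose answer is a date and that "
    "cannot be answered without reading the entire passage."
)

REFRAMED_PROMPT = (
    "Read the passage below.\n"
    "Step 1: List the dates mentioned in the passage.\n"
    "Step 2: Pick a date that only becomes clear from the passage as a whole.\n"
    "Step 3: Write a question whose answer is that date."
)
```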
arXiv Detail & Related papers (2021-09-16T09:44:43Z)