Large language models for aspect-based sentiment analysis
- URL: http://arxiv.org/abs/2310.18025v1
- Date: Fri, 27 Oct 2023 10:03:21 GMT
- Title: Large language models for aspect-based sentiment analysis
- Authors: Paul F. Simmering, Paavo Huoviala
- Abstract summary: We assess the performance of GPT-4 and GPT-3.5 in zero-shot, few-shot, and fine-tuned settings.
Fine-tuned GPT-3.5 achieves a state-of-the-art F1 score of 83.8 on the joint aspect term extraction and polarity classification task.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) offer unprecedented text completion
capabilities. As general models, they can fulfill a wide range of roles,
including those of more specialized models. We assess the performance of GPT-4
and GPT-3.5 in zero-shot, few-shot, and fine-tuned settings on the aspect-based
sentiment analysis (ABSA) task. Fine-tuned GPT-3.5 achieves a state-of-the-art
F1 score of 83.8 on the joint aspect term extraction and polarity
classification task of the SemEval-2014 Task 4, improving upon InstructABSA
(Scaria et al., 2023) by 5.7%. However, this comes at the price of 1000
times more model parameters and thus increased inference cost. We discuss the
cost-performance trade-offs of different models and analyze the typical
errors that they make. Our results also indicate that detailed prompts improve
performance in zero-shot and few-shot settings but are not necessary for
fine-tuned models. This evidence is relevant for practitioners who are faced
with the choice of prompt engineering versus fine-tuning when using LLMs for
ABSA.
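To make the zero-shot setting and the joint task concrete, the sketch below shows how one might prompt a chat model for (aspect term, polarity) pairs and score the result with an exact-match F1, computed here per sentence for brevity rather than over the whole test set as in the paper. The prompt wording, the JSON output schema, and the helper names extract_aspects and joint_f1 are illustrative assumptions, not the authors' actual prompts, code, or evaluation script.

# Sketch: zero-shot ABSA with an OpenAI chat model, plus an exact-match
# F1 over (aspect term, polarity) pairs in the spirit of the joint task.
# Prompt wording, JSON schema, and helper names are assumptions for illustration.
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

SYSTEM_PROMPT = (
    "You are an aspect-based sentiment analysis system. Extract every aspect "
    "term from the sentence and label it positive, negative, or neutral. "
    'Reply with a JSON list of {"aspect": ..., "polarity": ...} objects only.'
)

def extract_aspects(sentence: str, model: str = "gpt-3.5-turbo") -> list[dict]:
    """Query the model once and parse its JSON reply into aspect/polarity dicts."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": sentence},
        ],
    )
    return json.loads(response.choices[0].message.content)

def joint_f1(predicted: list[dict], gold: list[dict]) -> float:
    """F1 where a prediction counts only if aspect term and polarity both match."""
    pred = {(p["aspect"].lower(), p["polarity"]) for p in predicted}
    true = {(g["aspect"].lower(), g["polarity"]) for g in gold}
    tp = len(pred & true)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(true)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    sentence = "The pizza was great but the service was slow."
    gold = [{"aspect": "pizza", "polarity": "positive"},
            {"aspect": "service", "polarity": "negative"}]
    predictions = extract_aspects(sentence)
    print(predictions, joint_f1(predictions, gold))

In the few-shot setting, the same call would simply include a handful of worked examples as additional user and assistant messages before the target sentence.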
Related papers
- Evaluating the Performance of Large Language Models for SDG Mapping (Technical Report) [6.789534723913505]
Locally run large language models (LLMs) let users protect data privacy by eliminating the need to send data to third parties.
We compare the performance of various language models on the Sustainable Development Goal mapping task.
According to the results of this study, LLaMA 2 and Gemma still have significant room for improvement.
arXiv Detail & Related papers (2024-08-05T03:05:02Z) - Exploring Small Language Models with Prompt-Learning Paradigm for
Efficient Domain-Specific Text Classification [2.410463233396231]
Small language models (SLMs) offer significant customizability, adaptability, and cost-effectiveness for domain-specific tasks.
In few-shot settings where prompt-based model fine-tuning is possible, T5-base, a typical SLM with 220M parameters, achieves approximately 75% accuracy with limited labeled data.
In zero-shot settings with a fixed model, GPT-3.5-turbo, with around 154B parameters, reaches an accuracy of 55.16%, making the power of well-designed prompts evident.
arXiv Detail & Related papers (2023-09-26T09:24:46Z) - InheritSumm: A General, Versatile and Compact Summarizer by Distilling
from GPT [75.29359361404073]
InheritSumm is a versatile and compact summarization model derived from GPT-3.5 through distillation.
It achieves similar or superior performance to GPT-3.5 in zero-shot and few-shot settings.
arXiv Detail & Related papers (2023-05-22T14:52:32Z) - AMR Parsing with Instruction Fine-tuned Pre-trained Language Models [21.767812442354387]
In this paper, we take one such instruction fine-tuned language model, FLAN-T5, and fine-tune it for AMR parsing.
Our experiments on various AMR parsing tasks, including AMR2.0, AMR3.0, and BioAMR, indicate that fine-tuned FLAN-T5 models outperform previous state-of-the-art models.
arXiv Detail & Related papers (2023-04-24T17:12:17Z) - A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models [71.42197262495056]
GPT series models have gained considerable attention due to their exceptional natural language processing capabilities.
We select six representative models, comprising two GPT-3 series models and four GPT-3.5 series models.
We evaluate their performance on nine natural language understanding (NLU) tasks using 21 datasets.
Our experiments reveal that the overall ability of GPT series models on NLU tasks does not increase gradually as the models evolve.
arXiv Detail & Related papers (2023-03-18T14:02:04Z) - Large Language Models in the Workplace: A Case Study on Prompt
Engineering for Job Type Classification [58.720142291102135]
This case study investigates the task of job classification in a real-world setting.
The goal is to determine whether an English-language job posting is appropriate for a graduate or entry-level position.
arXiv Detail & Related papers (2023-03-13T14:09:53Z) - Maximizing Use-Case Specificity through Precision Model Tuning [0.0]
We present an in-depth analysis of the performance of four transformer-based language models on the task of biomedical information retrieval.
Our findings suggest that smaller models, with 10B parameters and fine-tuned on domain-specific datasets, tend to outperform larger language models on highly specific questions.
arXiv Detail & Related papers (2022-12-29T07:50:14Z) - Scaling Instruction-Finetuned Language Models [126.4789306516927]
Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance.
We find that instruction finetuning dramatically improves performance on a variety of model classes.
arXiv Detail & Related papers (2022-10-20T16:58:32Z) - Few-shot Learning with Multilingual Language Models [66.49496434282564]
We train multilingual autoregressive language models on a balanced corpus covering a diverse set of languages.
Our largest model sets a new state of the art in few-shot learning in more than 20 representative languages.
We present a detailed analysis of where the model succeeds and fails, showing in particular that it enables cross-lingual in-context learning.
arXiv Detail & Related papers (2021-12-20T16:52:35Z) - Reframing Instructional Prompts to GPTk's Language [72.69833640335519]
We propose reframing techniques for model designers to create effective prompts for language models.
Our results show that reframing improves few-shot learning performance by 14% while reducing sample complexity.
The performance gains are particularly important for large language models such as GPT-3, where tuning models or prompts on large datasets is not feasible.
arXiv Detail & Related papers (2021-09-16T09:44:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.