On Sarcasm Detection with OpenAI GPT-based Models
- URL: http://arxiv.org/abs/2312.04642v1
- Date: Thu, 7 Dec 2023 19:00:56 GMT
- Title: On Sarcasm Detection with OpenAI GPT-based Models
- Authors: Montgomery Gole, Williams-Paul Nwadiugwu, and Andriy Miranskyy
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sarcasm is a form of irony that requires readers or listeners to interpret
its intended meaning by considering context and social cues. Machine learning
classification models have long had difficulty detecting sarcasm due to its
social complexity and contradictory nature.
This paper explores the applications of the Generative Pretrained Transformer
(GPT) models, including GPT-3, InstructGPT, GPT-3.5, and GPT-4, in detecting
sarcasm in natural language. It tests fine-tuned and zero-shot models of
different sizes and releases.
The GPT models were tested on the political and balanced (pol-bal) portion of
the popular Self-Annotated Reddit Corpus (SARC 2.0) sarcasm dataset. In the
fine-tuning case, the largest fine-tuned GPT-3 model achieves an accuracy and
$F_1$-score of 0.81, outperforming prior models. In the zero-shot case, one of
the GPT-4 models yields an accuracy of 0.70 and an $F_1$-score of 0.75. Other models
score lower. Additionally, a model's performance may improve or deteriorate
with each release, highlighting the need to reassess performance after each
release.
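
The abstract reports both accuracy and $F_1$-score, and the two can diverge even on a balanced dataset. The sketch below is a minimal illustration of how both metrics are computed for binary sarcasm labels. It is not the authors' evaluation code: the function name `binary_metrics` and the toy labels are assumptions made up for this example, and the labels are not drawn from SARC 2.0.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Label 1 means "sarcastic" and label 0 means "not sarcastic".
    (Hypothetical helper for illustration only.)
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one example")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up toy data: a balanced set of 10 comments (5 sarcastic, 5 not),
# scored by a hypothetical classifier that over-predicts "sarcastic".
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy case the classifier catches every sarcastic comment but also flags three non-sarcastic ones, so recall is high and precision is lower, and the $F_1$-score ends up above the accuracy. This is only one way such a gap can arise; the abstract does not say why the reported zero-shot scores differ.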
Related papers
- Optimizing Performance: How Compact Models Match or Exceed GPT's Classification Capabilities through Fine-Tuning [0.0]
Non-generative, small-sized models can outperform GPT-3.5 and GPT-4 models in zero-shot learning settings.
Fine-tuned models show comparable results to GPT-3.5 when it is fine-tuned on the task of determining market sentiment.
arXiv Detail & Related papers (2024-08-22T09:10:43Z)
- Model Editing with Canonical Examples [75.33218320106585]
We introduce model editing with canonical examples.
A canonical example is a simple instance of good behavior, e.g., The capital of Mauritius is Port Louis.
We propose sense finetuning, which selects and finetunes a few sense vectors for each canonical example.
arXiv Detail & Related papers (2024-02-09T03:08:12Z)
- Large language models for aspect-based sentiment analysis [0.0]
We assess the performance of GPT-4 and GPT-3.5 in zero-shot, few-shot, and fine-tuned settings.
Fine-tuned GPT-3.5 achieves a state-of-the-art F1 score of 83.8 on the joint aspect term extraction and polarity classification task.
arXiv Detail & Related papers (2023-10-27T10:03:21Z)
- DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models [92.6951708781736]
This work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5.
We find that GPT models can be easily misled to generate toxic and biased outputs and leak private information.
Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps.
arXiv Detail & Related papers (2023-06-20T17:24:23Z)
- InheritSumm: A General, Versatile and Compact Summarizer by Distilling from GPT [75.29359361404073]
InheritSumm is a versatile and compact summarization model derived from GPT-3.5 through distillation.
It achieves similar or superior performance to GPT-3.5 in zero-shot and few-shot settings.
arXiv Detail & Related papers (2023-05-22T14:52:32Z)
- A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models [71.42197262495056]
GPT series models have gained considerable attention due to their exceptional natural language processing capabilities.
We select six representative models, comprising two GPT-3 series models and four GPT-3.5 series models.
We evaluate their performance on nine natural language understanding (NLU) tasks using 21 datasets.
Our experiments reveal that the overall ability of GPT series models on NLU tasks does not increase gradually as the models evolve.
arXiv Detail & Related papers (2023-03-18T14:02:04Z)
- News Summarization and Evaluation in the Era of GPT-3 [73.48220043216087]
We study how GPT-3 compares against fine-tuned models trained on large summarization datasets.
We show that not only do humans overwhelmingly prefer GPT-3 summaries, prompted using only a task description, but these also do not suffer from common dataset-specific issues such as poor factuality.
arXiv Detail & Related papers (2022-09-26T01:04:52Z)
- Elaboration-Generating Commonsense Question Answering at Scale [77.96137534751445]
In question answering requiring common sense, language models (e.g., GPT-3) have been used to generate text expressing background knowledge.
We finetune smaller language models to generate useful intermediate context, referred to here as elaborations.
Our framework alternates between updating two language models -- an elaboration generator and an answer predictor -- allowing each to influence the other.
arXiv Detail & Related papers (2022-09-02T18:32:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.