Optimizing Performance: How Compact Models Match or Exceed GPT's Classification Capabilities through Fine-Tuning
- URL: http://arxiv.org/abs/2409.11408v1
- Date: Thu, 22 Aug 2024 09:10:43 GMT
- Title: Optimizing Performance: How Compact Models Match or Exceed GPT's Classification Capabilities through Fine-Tuning
- Authors: Baptiste Lefort, Eric Benhamou, Jean-Jacques Ohana, David Saltiel, Beatrice Guez
- Abstract summary: When fine-tuned, non-generative, small-sized models can outperform GPT-3.5 and GPT-4 in zero-shot settings.
These fine-tuned small models show results comparable to GPT-3.5 when GPT-3.5 is itself fine-tuned on the task of determining market sentiment.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we demonstrate that non-generative, small-sized models such as FinBERT and FinDRoBERTa, when fine-tuned, can outperform GPT-3.5 and GPT-4 models in zero-shot learning settings in sentiment analysis for financial news. These fine-tuned models show comparable results to GPT-3.5 when it is fine-tuned on the task of determining market sentiment from daily financial news summaries sourced from Bloomberg. To fine-tune and compare these models, we created a novel database, which assigns a market score to each piece of news without human interpretation bias, systematically identifying the mentioned companies and analyzing whether their stocks have gone up, down, or remained neutral. Furthermore, the paper shows that the assumptions of Condorcet's Jury Theorem do not hold, suggesting that the fine-tuned small models are not independent of the fine-tuned GPT models and indicating behavioural similarities. Lastly, the resulting fine-tuned models are made publicly available on HuggingFace, providing a resource for further research in financial sentiment analysis and text classification.
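The labelling procedure described in the abstract (score each news item by whether the mentioned companies' stocks subsequently went up, down, or stayed flat, with no human annotation) can be sketched as follows. The neutral band, the majority-aggregation rule, and all function names are illustrative assumptions, not the authors' exact implementation:

```python
# Hypothetical sketch of the market-score labelling rule: a news item is
# scored from the subsequent price moves of the companies it mentions.
# The 0.5% neutral band and the sign-of-sum aggregation are assumptions
# for illustration, not the paper's exact parameters.

def move_label(ret: float, neutral_band: float = 0.005) -> int:
    """Map a single stock return to +1 (up), -1 (down) or 0 (neutral)."""
    if ret > neutral_band:
        return 1
    if ret < -neutral_band:
        return -1
    return 0

def market_score(mentioned_company_returns: list[float]) -> int:
    """Aggregate the per-company moves into one label for the news item."""
    total = sum(move_label(r) for r in mentioned_company_returns)
    if total > 0:
        return 1
    if total < 0:
        return -1
    return 0

# Example: a summary mentioning three companies, two of which rallied
# while the third stayed inside the neutral band.
print(market_score([0.021, 0.013, -0.002]))  # 1 (positive market sentiment)
```

A rule of this shape yields a three-class target (up/down/neutral) derived purely from price data, which is what lets the dataset avoid human interpretation bias.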
Related papers
- Editable Fairness: Fine-Grained Bias Mitigation in Language Models [52.66450426729818]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases.
FAST surpasses state-of-the-art baselines with superior debiasing performance.
This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z)
- On Sarcasm Detection with OpenAI GPT-based Models [0.0]
Sarcasm is a form of irony that requires readers or listeners to interpret its intended meaning by considering context and social cues.
Machine learning classification models have long had difficulty detecting sarcasm due to its social complexity and contradictory nature.
This paper explores the applications of the Generative Pretrained Transformer (GPT) models, including GPT-3, InstructGPT, GPT-3.5, and GPT-4, in detecting sarcasm in natural language.
arXiv Detail & Related papers (2023-12-07T19:00:56Z)
- Large language models for aspect-based sentiment analysis [0.0]
We assess the performance of GPT-4 and GPT-3.5 in zero-shot, few-shot and fine-tuned settings.
Fine-tuned GPT-3.5 achieves a state-of-the-art F1 score of 83.8 on the joint aspect term extraction and polarity classification task.
arXiv Detail & Related papers (2023-10-27T10:03:21Z)
- Breaking the Bank with ChatGPT: Few-Shot Text Classification for Finance [4.305568120980929]
In-context learning with GPT-3.5 and GPT-4 minimizes the technical expertise required and eliminates the need for expensive GPU computing.
We fine-tune other pre-trained, masked language models with SetFit to achieve state-of-the-art results both in full-data and few-shot settings.
Our findings show that querying GPT-3.5 and GPT-4 can outperform fine-tuned, non-generative models even with fewer examples.
arXiv Detail & Related papers (2023-08-28T15:04:16Z)
- DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models [92.6951708781736]
This work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5.
We find that GPT models can be easily misled to generate toxic and biased outputs and leak private information.
Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps.
arXiv Detail & Related papers (2023-06-20T17:24:23Z)
- InheritSumm: A General, Versatile and Compact Summarizer by Distilling from GPT [75.29359361404073]
InheritSumm is a versatile and compact summarization model derived from GPT-3.5 through distillation.
It achieves similar or superior performance to GPT-3.5 in zero-shot and few-shot settings.
arXiv Detail & Related papers (2023-05-22T14:52:32Z)
- A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models [71.42197262495056]
GPT series models have gained considerable attention due to their exceptional natural language processing capabilities.
We select six representative models, comprising two GPT-3 series models and four GPT-3.5 series models.
We evaluate their performance on nine natural language understanding (NLU) tasks using 21 datasets.
Our experiments reveal that the overall ability of GPT series models on NLU tasks does not increase gradually as the models evolve.
arXiv Detail & Related papers (2023-03-18T14:02:04Z)
- GPT-Neo for commonsense reasoning -- a theoretical and practical lens [0.46040036610482665]
We evaluate the performance of the GPT-Neo model on six commonsense reasoning benchmark tasks.
We aim to compare the performance of these smaller GPT-Neo models against several larger model baselines.
arXiv Detail & Related papers (2022-11-28T17:49:38Z)
- Prompting GPT-3 To Be Reliable [117.23966502293796]
This work decomposes reliability into four facets: generalizability, fairness, calibration, and factuality.
We find that GPT-3 outperforms smaller-scale supervised models by large margins on all these facets.
arXiv Detail & Related papers (2022-10-17T14:52:39Z)
- News Summarization and Evaluation in the Era of GPT-3 [73.48220043216087]
We study how GPT-3 compares against fine-tuned models trained on large summarization datasets.
We show that humans overwhelmingly prefer GPT-3 summaries prompted using only a task description, and that these summaries do not suffer from common dataset-specific issues such as poor factuality.
arXiv Detail & Related papers (2022-09-26T01:04:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.