Enhancing TinyBERT for Financial Sentiment Analysis Using GPT-Augmented FinBERT Distillation
- URL: http://arxiv.org/abs/2409.18999v1
- Date: Thu, 19 Sep 2024 10:22:23 GMT
- Title: Enhancing TinyBERT for Financial Sentiment Analysis Using GPT-Augmented FinBERT Distillation
- Authors: Graison Jos Thomas
- Abstract summary: This study proposes leveraging the generative capabilities of large language models (LLMs) to create synthetic, domain-specific training data.
The research specifically aims to enhance FinBERT, a BERT model fine-tuned for financial sentiment analysis, and develop TinyFinBERT, a compact transformer model.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the rapidly evolving field of financial sentiment analysis, the efficiency and accuracy of predictive models are critical due to their significant impact on financial markets. Transformer-based models like BERT and large language models (LLMs) like GPT-4 have advanced NLP tasks considerably. Despite their advantages, BERT-based models face challenges with computational intensity in edge computing environments, and the substantial size and compute requirements of LLMs limit their practical deployment. This study proposes leveraging the generative capabilities of LLMs, such as GPT-4 Omni, to create synthetic, domain-specific training data. This approach addresses the challenge of data scarcity and enhances the performance of smaller models by making them competitive with their larger counterparts. The research specifically aims to enhance FinBERT, a BERT model fine-tuned for financial sentiment analysis, and to develop TinyFinBERT, a compact transformer model, through a structured, two-tiered knowledge distillation strategy. Using data augmented by GPT-4 Omni, which involves generating new training examples and transforming existing data, we significantly improved the accuracy of FinBERT, preparing it to serve as a teacher model. This enhanced FinBERT then distilled knowledge to TinyFinBERT, employing both GPT-4 Omni and GPT-3.5 Turbo augmented data. The distillation strategy incorporated both logit and intermediate layer distillation. The training and evaluation of TinyFinBERT utilized the PhraseBank dataset and the FiQA 2018 Task 1 dataset, achieving performance comparable to FinBERT while being substantially smaller and more efficient. This research demonstrates how LLMs can effectively contribute to the advancement of financial sentiment analysis by enhancing the capabilities of smaller, more efficient models through innovative data augmentation and distillation techniques.
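The two-tiered strategy pairs logit distillation with intermediate-layer matching. Below is a minimal PyTorch sketch of such a combined loss; the temperature, loss weighting, and student-to-teacher projection are illustrative assumptions, not the paper's reported configuration.

```python
# Minimal sketch of combined logit + intermediate-layer distillation
# (hyperparameters are illustrative, not the paper's reported settings).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden,
                      proj, temperature=2.0, alpha=0.5):
    """student_hidden / teacher_hidden: selected intermediate-layer outputs.
    proj: learnable linear map from student to teacher hidden size."""
    t = temperature
    # Logit distillation: KL divergence between temperature-softened outputs.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)
    # Intermediate-layer distillation: MSE after projecting the student's
    # hidden states into the teacher's hidden dimension.
    hidden_loss = F.mse_loss(proj(student_hidden), teacher_hidden)
    return alpha * soft_loss + (1.0 - alpha) * hidden_loss
```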
Related papers
- AI in Investment Analysis: LLMs for Equity Stock Ratings [0.2916558661202724]
This paper explores the application of Large Language Models (LLMs) to generate multi-horizon stock ratings.
Our study addresses these issues by leveraging LLMs to improve the accuracy and consistency of stock ratings.
Our results show that our benchmark method outperforms traditional stock rating methods when assessed by forward returns.
arXiv Detail & Related papers (2024-10-30T15:06:57Z)
- Harnessing Earnings Reports for Stock Predictions: A QLoRA-Enhanced LLM Approach [6.112119533910774]
This paper introduces an advanced approach by employing Large Language Models (LLMs) instruction fine-tuned with a novel combination of instruction-based techniques and quantized low-rank adaptation (QLoRA) compression.
Our methodology integrates 'base factors', such as financial metric growth and earnings transcripts, with 'external factors', including recent market indices performances and analyst grades, to create a rich, supervised dataset.
This study not only demonstrates the power of integrating cutting-edge AI with fine-tuned financial data but also paves the way for future research in enhancing AI-driven financial analysis tools.
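As a concrete illustration of the QLoRA recipe this entry describes, the sketch below loads a causal LM in 4-bit NF4 and attaches low-rank adapters using Hugging Face transformers and peft; the base model name and LoRA hyperparameters are assumptions, not the paper's settings.

```python
# QLoRA sketch: 4-bit quantized base model + low-rank adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                       # quantize base weights to 4 bits
    bnb_4bit_quant_type="nf4",               # NF4 quantization, as in QLoRA
    bnb_4bit_compute_dtype=torch.bfloat16,   # compute in bf16
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",              # assumed base model
    quantization_config=bnb,
)
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,  # assumed adapter hyperparameters
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()           # only adapter weights train
```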
arXiv Detail & Related papers (2024-08-13T04:53:31Z)
- Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Models (LLMs) embeddings.
Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
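The summary does not specify the downstream detector, so the sketch below pairs LLM embeddings with a standard IsolationForest as one plausible reading of the idea; the embedding model name and the toy records are assumptions.

```python
# One plausible pipeline: embed financial records with an LLM embedding
# model, then flag outliers with an off-the-shelf detector.
import numpy as np
from openai import OpenAI
from sklearn.ensemble import IsolationForest

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small",
                                    input=texts)
    return np.array([d.embedding for d in resp.data])

records = [  # toy examples, not the paper's data
    "monthly salary deposit 5,000 USD",
    "grocery purchase 82.40 USD",
    "wire transfer 9,800 USD to newly added offshore beneficiary",
]
X = embed(records)
detector = IsolationForest(contamination=0.1, random_state=0).fit(X)
scores = detector.decision_function(X)  # lower score = more anomalous
```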
arXiv Detail & Related papers (2024-06-05T20:19:09Z)
- AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework [48.3060010653088]
We release AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data.
We then use AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, for effectively tackling the financial analysis task.
arXiv Detail & Related papers (2024-03-19T09:45:33Z)
- Sentiment-driven prediction of financial returns: a Bayesian-enhanced FinBERT approach [1.131316248570352]
We showcase the efficacy of leveraging sentiment information extracted from tweets using the FinBERT large language model.
This success translates into demonstrably higher cumulative profits during backtested trading.
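For readers who want to reproduce the sentiment-extraction step, a minimal sketch using the public ProsusAI/finbert checkpoint on Hugging Face follows; whether this matches the paper's exact FinBERT weights is an assumption.

```python
# Scoring tweet sentiment with FinBERT via the Hugging Face pipeline API.
from transformers import pipeline

finbert = pipeline("text-classification", model="ProsusAI/finbert")
tweets = [
    "Company beats earnings expectations and raises full-year guidance.",
    "Regulators open an investigation into the firm's accounting practices.",
]
for tweet, result in zip(tweets, finbert(tweets)):
    # Labels are positive / negative / neutral with a confidence score.
    print(f"{result['label']:>8}  {result['score']:.3f}  {tweet}")
```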
arXiv Detail & Related papers (2024-03-07T11:56:36Z)
- FinBen: A Holistic Financial Benchmark for Large Language Models [75.09474986283394]
FinBen is the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks.
FinBen offers several key innovations: a broader range of tasks and datasets, the first evaluation of stock trading, novel agent and Retrieval-Augmented Generation (RAG) evaluation, and three novel open-source evaluation datasets for text summarization, question answering, and stock trading.
arXiv Detail & Related papers (2024-02-20T02:16:16Z)
- Churn Prediction via Multimodal Fusion Learning: Integrating Customer Financial Literacy, Voice, and Behavioral Data [14.948017876322597]
This paper proposes a multimodal fusion learning model for identifying customer churn risk levels in financial service providers.
Our approach integrates customer sentiments, financial literacy (FL) level, and financial behavioral data.
Our novel approach demonstrates a marked improvement in churn prediction, achieving a test accuracy of 91.2%, a Mean Average Precision (MAP) score of 66, and a Macro-Averaged F1 score of 54.
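The entry names the modalities but not the architecture, so the sketch below shows a generic late-fusion classifier over pre-extracted per-modality features; all dimensions and the simple concatenation strategy are assumptions, not the paper's model.

```python
# Generic late-fusion sketch: concatenate modality features, classify risk.
import torch
import torch.nn as nn

class FusionChurnModel(nn.Module):
    def __init__(self, text_dim=768, voice_dim=128, behavior_dim=32):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim + voice_dim + behavior_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 3),  # e.g. low / medium / high churn risk
        )

    def forward(self, text_feat, voice_feat, behavior_feat):
        fused = torch.cat([text_feat, voice_feat, behavior_feat], dim=-1)
        return self.head(fused)

model = FusionChurnModel()
logits = model(torch.randn(4, 768), torch.randn(4, 128), torch.randn(4, 32))
```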
arXiv Detail & Related papers (2023-12-03T06:28:55Z)
- PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance [63.51545277822702]
PIXIU is a comprehensive framework that includes the first financial large language model (LLM) based on fine-tuning LLaMA with instruction data.
We propose FinMA by fine-tuning LLaMA with the constructed dataset to be able to follow instructions for various financial tasks.
We conduct a detailed analysis of FinMA and several existing LLMs, uncovering their strengths and weaknesses in handling critical financial tasks.
arXiv Detail & Related papers (2023-06-08T14:20:29Z)
- Enabling and Analyzing How to Efficiently Extract Information from Hybrid Long Documents with LLMs [48.87627426640621]
This research focuses on harnessing the potential of Large Language Models to comprehend critical information from financial reports.
We propose an Automated Financial Information Extraction framework that enhances LLMs' ability to comprehend and extract information from financial reports.
Our framework is effectively validated on GPT-3.5 and GPT-4, yielding average accuracy increases of 53.94% and 33.77%, respectively.
arXiv Detail & Related papers (2023-05-24T10:35:58Z)
- Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models [51.3422222472898]
We document the capability of large language models (LLMs) like ChatGPT to predict stock price movements using news headlines.
We develop a theoretical model incorporating information capacity constraints, underreaction, limits-to-arbitrage, and LLMs.
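A hedged sketch of the headline-scoring setup this paper studies: ask a chat model whether a headline is good or bad news for a given stock. The prompt wording and model name below are illustrative, not the paper's exact protocol.

```python
# Headline-based sentiment scoring via a chat model (illustrative prompt).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def score_headline(headline: str, company: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model, not the paper's
        messages=[
            {"role": "system",
             "content": "You are a financial expert. Answer YES if the "
                        "headline is good news for the company's stock "
                        "price, NO if bad news, or UNKNOWN if unclear."},
            {"role": "user",
             "content": f"Company: {company}\nHeadline: {headline}"},
        ],
    )
    return resp.choices[0].message.content.strip()

print(score_headline("Oil prices surge after surprise supply cut",
                     "ExxonMobil"))
```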
arXiv Detail & Related papers (2023-04-15T19:22:37Z)
- Portfolio Optimization with 2D Relative-Attentional Gated Transformer [9.541129630971689]
We propose a novel Deterministic Policy Gradient with 2D Relative-attentional Gated Transformer (DPGRGT) model.
Applying learnable relative positional embeddings for the time and assets axes, the model better understands the peculiar structure of the financial data.
In our experiment using U.S. stock market data of 20 years, our model outperformed baseline models and demonstrated its effectiveness.
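To make the relative-attention idea concrete, here is a minimal sketch of a learnable relative positional bias over one axis (time); the paper applies the idea over both the time and asset axes, and the clipping window and tensor shapes below are assumptions.

```python
# Learnable relative positional bias added to attention scores (one axis).
import torch
import torch.nn as nn

class RelativeTimeBias(nn.Module):
    def __init__(self, max_distance: int = 32):
        super().__init__()
        self.max_distance = max_distance
        # One learnable scalar bias per clipped relative distance.
        self.bias = nn.Embedding(2 * max_distance + 1, 1)

    def forward(self, seq_len: int) -> torch.Tensor:
        pos = torch.arange(seq_len)
        rel = pos[None, :] - pos[:, None]              # (T, T) distances
        rel = rel.clamp(-self.max_distance, self.max_distance)
        return self.bias(rel + self.max_distance).squeeze(-1)  # (T, T)

# Added to raw attention scores before the softmax:
scores = torch.randn(2, 4, 16, 16)           # (batch, heads, T, T)
scores = scores + RelativeTimeBias()(16)     # broadcasts over batch, heads
attn = scores.softmax(dim=-1)
```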
arXiv Detail & Related papers (2020-12-27T14:08:26Z)