Related papers: A Comparative Analysis of Instruction Fine-Tuning LLMs for Financial Text Classification

A Comparative Analysis of Instruction Fine-Tuning LLMs for Financial Text Classification

URL: http://arxiv.org/abs/2411.02476v1
Date: Mon, 04 Nov 2024 18:06:36 GMT
Title: A Comparative Analysis of Instruction Fine-Tuning LLMs for Financial Text Classification
Authors: Sorouralsadat Fatemi, Yuheng Hu, Maryam Mousavi,
Abstract summary: Large Language Models (LLMs) have demonstrated impressive capabilities across diverse Natural Language Processing (NLP) tasks. This study investigates the efficacy of instruction fine-tuning to enhance their performance in financial text classification tasks.
Score: 0.8192907805418583
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities across diverse Natural Language Processing (NLP) tasks, including language understanding, reasoning, and generation. However, general-domain LLMs often struggle with financial tasks due to the technical and specialized nature of financial texts. This study investigates the efficacy of instruction fine-tuning smaller-scale LLMs, including Mistral-7B, Llama3-8B, and Phi3-mini, to enhance their performance in financial text classification tasks. We fine-tuned both instruction-tuned and base models across four financial classification tasks, achieving significant improvements in task-specific performance. Furthermore, we evaluated the zero-shot capabilities of these fine-tuned models on three unseen complex financial tasks, including argument classification, deal completeness classification, and causal classification. Our results indicate while base model fine-tuning led to greater degradation, instruction-tuned models maintained more robust performance. To address this degradation, we employed model merging techniques, integrating single-task domain-specific fine-tuned models with the base model. Using this merging method resulted in significant enhancements in zero-shot performance, even exceeding the original model's accuracy on certain datasets. Our findings underscore the effectiveness of instruction fine-tuning and model merging for adapting LLMs to specialized financial text classification tasks.

Related papers

LAET: A Layer-wise Adaptive Ensemble Tuning Framework for Pretrained Language Models [7.216206616406649]
Large language models (LLMs) like BloombergGPT and FinMA have set new benchmarks across various financial NLP tasks.<n>We propose Layer-wise Adaptive Ensemble Tuning (LAET), a novel strategy that selectively fine-tunes the most effective layers of pre-trained LLMs.<n>Our approach shows strong results in financial NLP tasks, outperforming existing benchmarks and state-of-the-art LLMs.
arXiv Detail & Related papers (2025-11-14T13:57:46Z)
Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric [99.56567010306807]
Large Language Models (LLMs) have become indispensable across academia, industry, and daily applications.<n>One core challenge of evaluation in the large language model (LLM) era is the generalization issue.<n>We propose Model Utilization Index (MUI), a mechanism interpretability enhanced metric that complements traditional performance scores.
arXiv Detail & Related papers (2025-04-10T04:09:47Z)
Catastrophic Forgetting in LLMs: A Comparative Analysis Across Language Tasks [0.0]
Large Language Models (LLMs) have significantly advanced Natural Language Processing (NLP) This study evaluates the continual fine-tuning of various open-source LLMs on key NLU tasks. Our results indicate that models such as Phi-3.5-mini exhibit minimal forgetting while maintaining strong learning capabilities.
arXiv Detail & Related papers (2025-04-01T23:06:55Z)
ZiGong 1.0: A Large Language Model for Financial Credit [8.49779245416985]
Large Language Models (LLMs) have demonstrated strong performance across various general Natural Language Processing (NLP) tasks. However, their effectiveness in financial credit assessment applications remains suboptimal. We propose ZiGong, a Mistral-based model enhanced through multi-task supervised fine-tuning.
arXiv Detail & Related papers (2025-02-22T09:27:56Z)
Mitigating Forgetting in LLM Fine-Tuning via Low-Perplexity Token Learning [61.99353167168545]
We show that fine-tuning with LLM-generated data improves target task performance and reduces non-target task degradation.<n>This is the first work to provide an empirical explanation based on token perplexity reduction to mitigate catastrophic forgetting in LLMs after fine-tuning.
arXiv Detail & Related papers (2025-01-24T08:18:56Z)
The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Learning Capabilities [51.594836904623534]
We investigate whether instruction-tuned models possess fundamentally different capabilities from base models that are prompted using in-context examples. We show that the performance of instruction-tuned models is significantly correlated with the in-context performance of their base counterparts. Specifically, we extend this understanding to instruction-tuned models, suggesting that their pretraining data similarly sets a limiting boundary on the tasks they can solve.
arXiv Detail & Related papers (2025-01-15T10:57:55Z)
Mixing It Up: The Cocktail Effect of Multi-Task Fine-Tuning on LLM Performance -- A Case Study in Finance [0.32985979395737774]
We study the application of large language models (LLMs) in domain-specific contexts, including finance. We find that fine-tuning exclusively on the target task is not always the most effective strategy. Instead, multi-task fine-tuning can significantly enhance performance.
arXiv Detail & Related papers (2024-10-01T22:35:56Z)
SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts. We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM. We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
CatMemo at the FinLLM Challenge Task: Fine-Tuning Large Language Models using Data Fusion in Financial Applications [10.225210627594894]
This paper presents our solution to IJCAI-2024 FinLLM challenge, investigating the capabilities of LLMs within three critical areas of financial tasks. Financial classification, financial text summarization, and single stock trading are investigated. Our approach aims to tackle these diverse tasks in a comprehensive and integrated manner, showcasing LLMs' capacity to address diverse and complex financial tasks with improved accuracy and decision-making capabilities.
arXiv Detail & Related papers (2024-07-02T05:04:13Z)
Unveiling the Generalization Power of Fine-Tuned Large Language Models [81.70754292058258]
We investigate whether fine-tuning affects the intrinsic generalization ability intrinsic to Large Language Models (LLMs) Our main findings reveal that models fine-tuned on generation and classification tasks exhibit dissimilar behaviors in generalizing to different domains and tasks. We observe that integrating the in-context learning strategy during fine-tuning on generation tasks can enhance the model's generalization ability.
arXiv Detail & Related papers (2024-03-14T08:18:59Z)
Large Language Model Adaptation for Financial Sentiment Analysis [2.0499240875882]
Generalist language models tend to fall short in tasks specifically tailored for finance. Two foundation models with less than 1.5B parameters have been adapted using a wide range of strategies. We show that small LLMs have comparable performance to larger scale models, while being more efficient in terms of parameters and data.
arXiv Detail & Related papers (2024-01-26T11:04:01Z)
LLM Augmented LLMs: Expanding Capabilities through Composition [56.40953749310957]
CALM -- Composition to Augment Language Models -- introduces cross-attention between models to compose their representations and enable new capabilities. We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English. When PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks.
arXiv Detail & Related papers (2024-01-04T18:53:01Z)
BLESS: Benchmarking Large Language Models on Sentence Simplification [55.461555829492866]
We present BLESS, a performance benchmark of the most recent state-of-the-art large language models (LLMs) on the task of text simplification (TS) We assess a total of 44 models, differing in size, architecture, pre-training methods, and accessibility, on three test sets from different domains (Wikipedia, news, and medical) under a few-shot setting. Our evaluation indicates that the best LLMs, despite not being trained on TS, perform comparably with state-of-the-art TS baselines.
arXiv Detail & Related papers (2023-10-24T12:18:17Z)
Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty [52.72790059506241]
Open Information Extraction (OIE) task aims at extracting structured facts from unstructured text. Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z)
Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? A Study on Several Typical Tasks [36.84636748560657]
Large language models such as ChatGPT and GPT-4 have shown exceptional capabilities of generalist models. How effective are such models in the financial domain?
arXiv Detail & Related papers (2023-05-10T03:13:54Z)
Model-Agnostic Meta-Learning for Natural Language Understanding Tasks in Finance [1.863067234952186]
We investigate model-agnostic meta-learning algorithm(MAML) in low-resource financial NLU tasks. Our models achieve the state-of-the-art performance according to the experimental results.
arXiv Detail & Related papers (2023-03-06T02:24:48Z)
Model-Agnostic Multitask Fine-tuning for Few-shot Vision-Language Transfer Learning [59.38343286807997]
We propose Model-Agnostic Multitask Fine-tuning (MAMF) for vision-language models on unseen tasks. Compared with model-agnostic meta-learning (MAML), MAMF discards the bi-level optimization and uses only first-order gradients. We show that MAMF consistently outperforms the classical fine-tuning method for few-shot transfer learning on five benchmark datasets.
arXiv Detail & Related papers (2022-03-09T17:26:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.