Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text
Analytics? A Study on Several Typical Tasks
- URL: http://arxiv.org/abs/2305.05862v2
- Date: Tue, 10 Oct 2023 18:54:43 GMT
- Title: Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text
Analytics? A Study on Several Typical Tasks
- Authors: Xianzhi Li, Samuel Chan, Xiaodan Zhu, Yulong Pei, Zhiqiang Ma, Xiaomo
Liu and Sameena Shah
- Abstract summary: Large language models such as ChatGPT and GPT-4 have shown exceptional capabilities of generalist models.
How effective are such models in the financial domain?
- Score: 36.84636748560657
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The most recent large language models (LLMs) such as ChatGPT and GPT-4 have
shown exceptional capabilities of generalist models, achieving state-of-the-art
performance on a wide range of NLP tasks with little or no adaptation. How
effective are such models in the financial domain? Understanding this basic
question would have a significant impact on many downstream financial
analytical tasks. In this paper, we conduct an empirical study and provide
experimental evidence of their performance on a wide variety of financial text
analytical problems, using eight benchmark datasets from five categories of
tasks. We report both the strengths and limitations of the current models by
comparing them to the state-of-the-art fine-tuned approaches and the recently
released domain-specific pretrained models. We hope our study can help
understand the capability of the existing models in the financial domain and
facilitate further improvements.
Related papers
- A Comparative Analysis of Instruction Fine-Tuning LLMs for Financial Text Classification [0.8192907805418583]
Large Language Models (LLMs) have demonstrated impressive capabilities across diverse Natural Language Processing (NLP) tasks.
This study investigates the efficacy of instruction fine-tuning to enhance their performance in financial text classification tasks.
arXiv Detail & Related papers (2024-11-04T18:06:36Z)
- Fine-tuning Smaller Language Models for Question Answering over Financial Documents [0.1747623282473278]
We focus on the challenge of answering questions that require multi-hop numerical reasoning over financial texts.
We assess the performance of several smaller models that have been fine-tuned to generate programs.
Our empirical analysis indicates that fine-tuning refines the student models' ability to express and apply the required financial concepts.
arXiv Detail & Related papers (2024-08-22T12:23:29Z)
- Large Language Model Adaptation for Financial Sentiment Analysis [2.0499240875882]
Generalist language models tend to fall short in tasks specifically tailored for finance.
Two foundation models with fewer than 1.5B parameters have been adapted using a wide range of strategies.
We show that small LLMs have comparable performance to larger scale models, while being more efficient in terms of parameters and data.
arXiv Detail & Related papers (2024-01-26T11:04:01Z)
- PanGu-$\pi$: Enhancing Language Model Architectures via Nonlinearity Compensation [97.78045712375047]
We present a new efficient model architecture for large language models (LLMs).
We show that PanGu-$\pi$-7B can achieve a comparable performance to that of benchmarks with about 10% inference speed-up.
In addition, we have deployed PanGu-$\pi$-7B in the high-value domains of finance and law, developing an LLM named YunShan for practical application.
arXiv Detail & Related papers (2023-12-27T11:49:24Z)
- Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams [26.318005637849915]
This study aims at assessing the financial reasoning capabilities of Large Language Models (LLMs).
We leverage mock exam questions of the Chartered Financial Analyst (CFA) Program to conduct a comprehensive evaluation of ChatGPT and GPT-4.
We present an in-depth analysis of the models' performance and limitations, and estimate whether they would have a chance at passing the CFA exams.
arXiv Detail & Related papers (2023-10-12T19:28:57Z)
- MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts [170.01089233942594]
MathVista is a benchmark designed to combine challenges from diverse mathematical and visual tasks.
The best-performing GPT-4V model achieves an overall accuracy of 49.9%, substantially outperforming Bard, the second-best performer, by 15.1%.
GPT-4V still falls short of human performance by 10.4%, as it often struggles to understand complex figures and perform rigorous reasoning.
arXiv Detail & Related papers (2023-10-03T17:57:24Z)
- PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance [63.51545277822702]
PIXIU is a comprehensive framework including the first financial large language model (LLM) based on fine-tuning LLaMA with instruction data.
We propose FinMA by fine-tuning LLaMA with the constructed dataset to be able to follow instructions for various financial tasks.
We conduct a detailed analysis of FinMA and several existing LLMs, uncovering their strengths and weaknesses in handling critical financial tasks.
arXiv Detail & Related papers (2023-06-08T14:20:29Z)
- Exploring the Trade-Offs: Unified Large Language Models vs Local Fine-Tuned Models for Highly-Specific Radiology NLI Task [49.50140712943701]
We evaluate the performance of ChatGPT/GPT-4 on a radiology NLI task and compare it to other models fine-tuned specifically on task-related data samples.
We also conduct a comprehensive investigation on ChatGPT/GPT-4's reasoning ability by introducing varying levels of inference difficulty.
arXiv Detail & Related papers (2023-04-18T17:21:48Z)
- WHEN FLUE MEETS FLANG: Benchmarks and Large Pre-trained Language Model for Financial Domain [42.093876880881886]
We propose a novel domain-specific Financial LANGuage model (FLANG).
It uses financial keywords and phrases for better masking, together with a span boundary objective and an in-filling objective.
Our models, code and benchmark data are publicly available on Github and Huggingface.
arXiv Detail & Related papers (2022-10-31T18:35:18Z)
- Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study [81.11161697133095]
We take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives.
Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models.
As a by-product of this paper, we have open-sourced a project that involves a comprehensive summary of recent NER papers.
arXiv Detail & Related papers (2020-01-12T04:33:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.