Is ChatGPT a Financial Expert? Evaluating Language Models on Financial
Natural Language Processing
- URL: http://arxiv.org/abs/2310.12664v1
- Date: Thu, 19 Oct 2023 11:43:15 GMT
- Title: Is ChatGPT a Financial Expert? Evaluating Language Models on Financial
Natural Language Processing
- Authors: Yue Guo, Zian Xu, Yi Yang
- Abstract summary: FinLMEval is a framework for Financial Language Model Evaluation.
This study compares the performance of encoder-only language models and the decoder-only language models.
- Score: 22.754757518792395
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The emergence of Large Language Models (LLMs), such as ChatGPT, has
revolutionized general natural language preprocessing (NLP) tasks. However,
their expertise in the financial domain lacks a comprehensive evaluation. To
assess the ability of LLMs to solve financial NLP tasks, we present FinLMEval,
a framework for Financial Language Model Evaluation, comprising nine datasets
designed to evaluate the performance of language models. This study compares
the performance of encoder-only language models and the decoder-only language
models. Our findings reveal that while some decoder-only LLMs demonstrate
notable performance across most financial tasks via zero-shot prompting, they
generally lag behind the fine-tuned expert models, especially when dealing with
proprietary datasets. We hope this study provides foundation evaluations for
continuing efforts to build more advanced LLMs in the financial domain.
Related papers
- SNFinLLM: Systematic and Nuanced Financial Domain Adaptation of Chinese Large Language Models [6.639972934967109]
Large language models (LLMs) have become powerful tools for advancing natural language processing applications in the financial industry.
We propose a novel large language model specifically designed for the Chinese financial domain, named SNFinLLM.
SNFinLLM excels in domain-specific tasks such as answering questions, summarizing financial research reports, analyzing sentiment, and executing financial calculations.
arXiv Detail & Related papers (2024-08-05T08:24:24Z) - FinBen: A Holistic Financial Benchmark for Large Language Models [75.09474986283394]
FinBen is the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks.
FinBen offers several key innovations: a broader range of tasks and datasets, the first evaluation of stock trading, novel agent and Retrieval-Augmented Generation (RAG) evaluation, and three novel open-source evaluation datasets for text summarization, question answering, and stock trading.
arXiv Detail & Related papers (2024-02-20T02:16:16Z) - D\'olares or Dollars? Unraveling the Bilingual Prowess of Financial LLMs
Between Spanish and English [67.48541936784501]
Tois'on de Oro is the first framework that establishes instruction datasets, finetuned LLMs, and evaluation benchmark for financial LLMs in Spanish joint with English.
We construct a rigorously curated bilingual instruction dataset including over 144K Spanish and English samples from 15 datasets covering 7 tasks.
We evaluate our model and existing LLMs using FLARE-ES, the first comprehensive bilingual evaluation benchmark with 21 datasets covering 9 tasks.
arXiv Detail & Related papers (2024-02-12T04:50:31Z) - Large Language Model Adaptation for Financial Sentiment Analysis [2.0499240875882]
Generalist language models tend to fall short in tasks specifically tailored for finance.
Two foundation models with less than 1.5B parameters have been adapted using a wide range of strategies.
We show that small LLMs have comparable performance to larger scale models, while being more efficient in terms of parameters and data.
arXiv Detail & Related papers (2024-01-26T11:04:01Z) - Revolutionizing Finance with LLMs: An Overview of Applications and
Insights [47.11391223936608]
Large Language Models (LLMs) like ChatGPT have seen considerable advancements and have been applied in diverse fields.
These models are being utilized for automating financial report generation, forecasting market trends, analyzing investor sentiment, and offering personalized financial advice.
arXiv Detail & Related papers (2024-01-22T01:06:17Z) - DISC-FinLLM: A Chinese Financial Large Language Model based on Multiple
Experts Fine-tuning [74.99318727786337]
We propose Multiple Experts Fine-tuning Framework to build a financial large language model (LLM)
We build a financial instruction-tuning dataset named DISC-FIN-SFT, including instruction samples of four categories (consulting, NLP tasks, computing and retrieval-augmented generation)
Evaluations conducted on multiple benchmarks demonstrate that our model performs better than baseline models in various financial scenarios.
arXiv Detail & Related papers (2023-10-23T11:33:41Z) - L2CEval: Evaluating Language-to-Code Generation Capabilities of Large
Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs)
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z) - Large Language Models in Finance: A Survey [12.243277149505364]
Large language models (LLMs) have opened new possibilities for artificial intelligence applications in finance.
Recent advances in large language models (LLMs) have opened new possibilities for artificial intelligence applications in finance.
arXiv Detail & Related papers (2023-09-28T06:04:04Z) - PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark
for Finance [63.51545277822702]
PIXIU is a comprehensive framework including the first financial large language model (LLMs) based on fine-tuning LLaMA with instruction data.
We propose FinMA by fine-tuning LLaMA with the constructed dataset to be able to follow instructions for various financial tasks.
We conduct a detailed analysis of FinMA and several existing LLMs, uncovering their strengths and weaknesses in handling critical financial tasks.
arXiv Detail & Related papers (2023-06-08T14:20:29Z) - WHEN FLUE MEETS FLANG: Benchmarks and Large Pre-trained Language Model
for Financial Domain [42.093876880881886]
We propose a novel domain specific Financial LANGuage model (FLANG)
It uses financial keywords and phrases for better masking, together with span boundary objective and in-filing objective.
Our models, code and benchmark data are publicly available on Github and Huggingface.
arXiv Detail & Related papers (2022-10-31T18:35:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.