KodeXv0.1: A Family of State-of-the-Art Financial Large Language Models
- URL: http://arxiv.org/abs/2409.13749v1
- Date: Fri, 13 Sep 2024 16:43:08 GMT
- Title: KodeXv0.1: A Family of State-of-the-Art Financial Large Language Models
- Authors: Neel Rajani, Lilli Kiessling, Aleksandr Ogaltsov, Claus Lang,
- Abstract summary: KodeXv0.1 is a family of large language models that outclass GPT-4 in financial question answering.
We process a large number of publicly available financial documents such as earnings calls and business reports.
- Score: 41.94295877935867
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Although powerful, current cutting-edge LLMs may not fulfil the needs of highly specialised sectors. We introduce KodeXv0.1, a family of large language models that outclass GPT-4 in financial question answering. We utilise the base variants of Llama 3.1 8B and 70B and adapt them to the financial domain through a custom training regime. To this end, we collect and process a large number of publicly available financial documents such as earnings calls and business reports. These are used to generate a high-quality, synthetic dataset consisting of Context-Question-Answer triplets which closely mirror real-world financial tasks. Using the train split of this dataset, we perform RAG-aware 4bit LoRA instruction tuning runs of Llama 3.1 base variants to produce KodeX-8Bv0.1 and KodeX-70Bv0.1. We then complete extensive model evaluations using FinanceBench, FinQABench and the withheld test split of our dataset. Our results show that KodeX-8Bv0.1 is more reliable in financial contexts than cutting-edge instruct models in the same parameter regime, surpassing them by up to 9.24%. In addition, it is even capable of outperforming state-of-the-art proprietary models such as GPT-4 by up to 7.07%. KodeX-70Bv0.1 represents a further improvement upon this, exceeding GPT-4's performance on every tested benchmark.
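The abstract describes instruction tuning on synthetic Context-Question-Answer triplets that mirror retrieval-augmented use. A minimal sketch of how such a triplet might be rendered into a RAG-style prompt/completion pair is shown below; the field names and prompt template are illustrative assumptions, not the paper's actual format.

```python
# Hypothetical sketch: formatting a Context-Question-Answer triplet into a
# RAG-style prompt/completion pair so the model learns to answer strictly
# from the provided context. Template and field names are assumptions.

def format_rag_example(triplet: dict) -> dict:
    """Turn one CQA triplet into a prompt/completion training pair."""
    prompt = (
        "Use only the context below to answer the question.\n\n"
        f"Context:\n{triplet['context']}\n\n"
        f"Question: {triplet['question']}\n"
        "Answer:"
    )
    return {"prompt": prompt, "completion": " " + triplet["answer"]}

example = {
    "context": "Q3 revenue was $4.2B, up 8% year over year.",
    "question": "What was Q3 revenue?",
    "answer": "$4.2B, an 8% year-over-year increase.",
}
pair = format_rag_example(example)
```

Pairs in this shape could then feed a standard instruction-tuning loop; the 4-bit LoRA training itself would sit on top of whichever fine-tuning stack is in use.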
Related papers
- LLM Output Drift: Cross-Provider Validation & Mitigation for Financial Workflows [0.5798758080057375]
Nondeterministic outputs (output drift) undermine auditability and trust.
We quantify drift across five model architectures on regulated financial tasks.
This finding challenges conventional assumptions that larger models are universally superior for production deployment.
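One simple way to quantify output drift, sketched below, is the fraction of repeated runs that disagree with the most common output; this is an illustrative metric of my own, not necessarily the one used in the paper.

```python
# Illustrative drift metric (an assumption, not the paper's definition):
# the share of repeated model calls whose output differs from the modal one.
from collections import Counter

def drift_rate(outputs: list[str]) -> float:
    """Fraction of runs that disagree with the most common output."""
    if not outputs:
        return 0.0
    _, modal_count = Counter(outputs).most_common(1)[0]
    return 1.0 - modal_count / len(outputs)

runs = ["Buy", "Buy", "Hold", "Buy", "Sell"]  # 5 repeated calls, same prompt
rate = drift_rate(runs)  # 2 of 5 runs deviate from "Buy" -> 0.4
```

A drift rate of 0 means fully deterministic outputs, which is what audit-sensitive financial workflows generally require.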
arXiv Detail & Related papers (2025-11-10T19:54:00Z)
- BizFinBench: A Business-Driven Real-World Financial Benchmark for Evaluating LLMs [7.9458352414205295]
Large language models excel in general tasks, yet assessing their reliability in logic-heavy, precision-critical domains like finance, law, and healthcare remains challenging.
We introduce BizFinBench, the first benchmark specifically designed to evaluate LLMs in real-world financial applications.
BizFinBench consists of 6,781 well-annotated queries in Chinese, spanning five dimensions: numerical calculation, reasoning, information extraction, prediction recognition, and knowledge-based question answering.
arXiv Detail & Related papers (2025-05-26T03:23:02Z)
- SOPBench: Evaluating Language Agents at Following Standard Operating Procedures and Constraints [59.645885492637845]
SOPBench is an evaluation pipeline that transforms each service-specific SOP code program into a directed graph of executable functions and requires agents to call these functions based on natural language SOP descriptions.
We evaluate 18 leading models, and results show the task is challenging even for top-tier models.
arXiv Detail & Related papers (2025-03-11T17:53:02Z)
- PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts [47.18738316044761]
The PFDial dataset contains 12,705 high-quality Chinese dialogue instructions derived from 440 flowcharts containing 5,055 process nodes.
Based on the PlantUML specification, each flowchart is converted into atomic dialogue units, i.e., structured five-tuples.
Experimental results demonstrate that a 7B model trained on merely 800 samples and a 0.5B model trained on the full dataset can both surpass 90% accuracy.
arXiv Detail & Related papers (2025-03-09T17:43:30Z) - FinMTEB: Finance Massive Text Embedding Benchmark [18.990655668481075]
We introduce the Finance Massive Text Embedding Benchmark (FinMTEB), a specialized counterpart to MTEB designed for the financial domain.
FinMTEB comprises 64 financial domain-specific embedding datasets across 7 tasks.
We show three key findings: (1) performance on general-purpose benchmarks shows limited correlation with financial domain tasks; (2) domain-adapted models consistently outperform their general-purpose counterparts; and (3) surprisingly, a simple Bag-of-Words approach outperforms sophisticated dense embeddings in financial Semantic Textual Similarity tasks.
arXiv Detail & Related papers (2025-02-16T04:23:52Z) - Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance [32.516564836540745]
Large language models (LLMs) have shown strong general reasoning capabilities, but their effectiveness in financial reasoning remains underexplored.
We evaluate 24 state-of-the-art general and reasoning-focused LLMs across four complex financial reasoning tasks.
We propose two domain-adapted models, Fino1-8B and FinoB, trained with chain-of-thought (CoT) fine-tuning and reinforcement learning.
arXiv Detail & Related papers (2025-02-12T05:13:04Z) - Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications [90.67346776473241]
Large language models (LLMs) have advanced financial applications, yet they often lack sufficient financial knowledge and struggle with tasks involving multi-modal inputs like tables and time series data.
We introduce Open-FinLLMs, a series of Financial LLMs that embed comprehensive financial knowledge into text, tables, and time-series data.
We also present FinLLaVA, a multimodal LLM trained with 1.43M image-text instructions to handle complex financial data types.
arXiv Detail & Related papers (2024-08-20T16:15:28Z) - SNFinLLM: Systematic and Nuanced Financial Domain Adaptation of Chinese Large Language Models [6.639972934967109]
Large language models (LLMs) have become powerful tools for advancing natural language processing applications in the financial industry.
We propose a novel large language model specifically designed for the Chinese financial domain, named SNFinLLM.
SNFinLLM excels in domain-specific tasks such as answering questions, summarizing financial research reports, analyzing sentiment, and executing financial calculations.
arXiv Detail & Related papers (2024-08-05T08:24:24Z) - CryptoGPT: a 7B model rivaling GPT-4 in the task of analyzing and classifying real-time financial news [3.8447306272420816]
We present CryptoGPT, a method for refining a dedicated LLM of reasonable quality with limited resources in an industrial setting.
This model allows not only for the classification of financial information but also for providing comprehensive analysis.
arXiv Detail & Related papers (2024-06-20T06:59:46Z) - StructLM: Towards Building Generalist Models for Structured Knowledge Grounding [49.10029030628653]
Large language models' (LLMs) ability to process structured data lags behind state-of-the-art (SoTA) models by an average of 35%.
We train a series of models, referred to as StructLM, based on the Mistral and the CodeLlama model family, ranging from 7B to 34B parameters.
Our StructLM series surpasses task-specific models on 16 out of 18 evaluated datasets and establishes new SoTA performance on 8 SKG tasks.
arXiv Detail & Related papers (2024-02-26T15:47:01Z) - FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models [18.280762424107408]
FinTral is a suite of state-of-the-art multimodal large language models (LLMs) built upon the Mistral-7b model.
We enhance FinTral with domain-specific pretraining, instruction fine-tuning, and RLAIF training.
Our FinTral model trained with direct preference optimization employing advanced Tools and Retrieval methods, dubbed FinTral-DPO-T&R, demonstrates an exceptional zero-shot performance.
arXiv Detail & Related papers (2024-02-16T05:05:12Z) - PanGu-$\pi$: Enhancing Language Model Architectures via Nonlinearity
Compensation [97.78045712375047]
We present a new, efficient model architecture for large language models (LLMs).
We show that PanGu-$\pi$-7B can achieve performance comparable to benchmark models with about a 10% inference speed-up.
In addition, we have deployed PanGu-$pi$-7B in the high-value domains of finance and law, developing an LLM named YunShan for practical application.
arXiv Detail & Related papers (2023-12-27T11:49:24Z) - CFGPT: Chinese Financial Assistant with Large Language Model [21.54229667774752]
We present a Chinese Financial Generative Pre-trained Transformer framework, named CFGPT.
CFData comprises both a pre-training dataset and a supervised fine-tuning dataset.
CFLLM is trained on CFData in two stages: continued pre-training and supervised fine-tuning.
arXiv Detail & Related papers (2023-09-19T14:34:01Z) - How Far Can Camels Go? Exploring the State of Instruction Tuning on Open
Resources [117.6496550359768]
This work explores recent advances in instruction-tuning language models on a range of open instruction-following datasets.
We provide a large set of instruction-tuned models from 6.7B to 65B parameters in size, trained on 12 instruction datasets.
We evaluate them on their factual knowledge, reasoning, multilinguality, coding, and open-ended instruction following abilities.
arXiv Detail & Related papers (2023-06-07T19:59:23Z) - BloombergGPT: A Large Language Model for Finance [42.73350054822628]
We present BloombergGPT, a 50 billion parameter language model that is trained on a wide range of financial data.
We construct a 363 billion token dataset based on Bloomberg's extensive data sources, augmented with 345 billion tokens from general purpose datasets.
Our mixed dataset training leads to a model that outperforms existing models on financial tasks by significant margins.
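The mixture arithmetic above works out to a roughly even split between financial and general-purpose data:

```python
# Worked arithmetic for BloombergGPT's training mixture as reported above:
# 363B financial tokens plus 345B general-purpose tokens.
financial_b = 363
general_b = 345
total_b = financial_b + general_b        # 708B tokens in total
financial_share = financial_b / total_b  # ~0.513, just over half financial
```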
arXiv Detail & Related papers (2023-03-30T17:30:36Z) - Predicting Issue Types with seBERT [85.74803351913695]
seBERT is a model that was developed based on the BERT architecture, but trained from scratch with software engineering data.
We fine-tuned this model for the NLBSE challenge for the task of issue type prediction.
Our model outperforms the fastText baseline for all three issue types in both recall and precision, achieving an overall F1-score of 85.7%.
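For reference, the F1-score is the harmonic mean of precision and recall; the sketch below checks the formula with hypothetical precision/recall values (the entry reports only the overall F1 of 85.7%).

```python
# F1 as the harmonic mean of precision and recall. The precision/recall
# values here are assumed for illustration, not taken from the paper.
def f1_score(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

score = f1_score(0.86, 0.854)  # ~0.857 with these assumed values
```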
arXiv Detail & Related papers (2022-05-03T06:47:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.