Related papers: Baichuan4-Finance Technical Report

URL: http://arxiv.org/abs/2412.15270v2
Date: Thu, 02 Jan 2025 11:21:38 GMT
Title: Baichuan4-Finance Technical Report
Authors: Hanyu Zhang, Boyu Qiu, Yuhao Feng, Shuqi Li, Qian Ma, Xiyuan Zhang, Qiang Ju, Dong Yan, Jian Xie,
Abstract summary: We develop the Baichuan4-Finance series, including Baichuan4-Finance-Base and an aligned language model Baichuan4-Finance. In the continual pre-training phase, we propose a novel domain self-constraint training strategy, which enables Baichuan4-Finance-Base to acquire financial knowledge without losing general capabilities. We evaluate Baichuan4-Finance on many widely used general datasets and two holistic financial benchmarks.
Score: 12.097387122694432
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Large language models (LLMs) have demonstrated strong capabilities in language understanding, generation, and reasoning, yet their potential in finance remains underexplored due to the complexity and specialization of financial knowledge. In this work, we report the development of the Baichuan4-Finance series, including a comprehensive suite of foundational Baichuan4-Finance-Base and an aligned language model Baichuan4-Finance, which are built upon the Baichuan4-Turbo base model and tailored for the finance domain. First, we have dedicated significant effort to building a detailed pipeline for improving data quality. Moreover, in the continual pre-training phase, we propose a novel domain self-constraint training strategy, which enables Baichuan4-Finance-Base to acquire financial knowledge without losing general capabilities. After Supervised Fine-tuning and Reinforcement Learning from Human Feedback and AI Feedback, the chat model Baichuan4-Finance is able to tackle various financial certification questions and real-world scenario applications. We evaluate Baichuan4-Finance on many widely used general datasets and two holistic financial benchmarks. The evaluation results show that Baichuan4-Finance-Base surpasses almost all competitive baselines on financial tasks by significant margins without sacrificing performance on general LLM benchmarks. At the same time, Baichuan4-Finance demonstrates even more impressive performance on financial application scenarios, showcasing its potential to foster community innovation in the financial LLM field.

Related papers

The LLM Pro Finance Suite: Multilingual Large Language Models for Financial Applications [4.211847212372977]
The LLM Pro Finance Suite is a collection of five instruction-tuned large language models (LLMs) specifically designed for financial applications. Our approach focuses on enhancing generalist instruction-tuned models, leveraging their existing strengths in instruction following, reasoning, and toxicity control. We evaluate the Suite on a comprehensive financial benchmark suite, demonstrating consistent improvement over state-of-the-art baselines in finance-oriented tasks and financial translation.
arXiv Detail & Related papers (2025-11-07T11:08:31Z)
FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain [54.06289302468199]
FinTrust is a benchmark specifically designed for evaluating the trustworthiness of LLMs in finance applications. Proprietary models like o4-mini outperform in most tasks, such as safety. Open-source models like DeepSeek-V3 have an advantage in specific areas like industry-level fairness.
arXiv Detail & Related papers (2025-10-17T01:45:49Z)
Finance Language Model Evaluation (FLaME) [5.904572835181286]
Language Models (LMs) have demonstrated impressive capabilities with core Natural Language Processing (NLP) tasks. We present the first holistic benchmarking suite for Financial Language Model Evaluation (FLaME). We are the first research paper to comprehensively study LMs against 'reasoning-reinforced' LMs.
arXiv Detail & Related papers (2025-06-18T19:54:33Z)
FinMaster: A Holistic Benchmark for Mastering Full-Pipeline Financial Workflows with LLMs [15.230256296815565]
FinMaster is a benchmark designed to assess the capabilities of large language models (LLMs) in financial literacy, accounting, auditing, and consulting. FinMaster comprises three main modules: FinSim, FinSuite, and FinEval. Experiments reveal critical capability gaps in financial reasoning, with accuracy dropping from over 90% on basic tasks to merely 37% on complex scenarios.
arXiv Detail & Related papers (2025-05-18T11:47:55Z)
Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance [32.516564836540745]
Large language models (LLMs) have shown strong general reasoning capabilities, but their effectiveness in financial reasoning remains underexplored. We evaluate 24 state-of-the-art general and reasoning-focused LLMs across four complex financial reasoning tasks. We propose two domain-adapted models, Fino1-8B and FinoB, trained with chain-of-thought (CoT) fine-tuning and reinforcement learning.
arXiv Detail & Related papers (2025-02-12T05:13:04Z)
Demystifying Domain-adaptive Post-training for Financial LLMs [79.581577578952]
FINDAP is a systematic and fine-grained investigation into domain-adaptive post-training of large language models (LLMs). Our approach consists of four key components: FinCap, FinRec, FinTrain and FinEval. The resulting model, Llama-Fin, achieves state-of-the-art performance across a wide range of financial tasks.
arXiv Detail & Related papers (2025-01-09T04:26:15Z)
FLAME: Financial Large-Language Model Assessment and Metrics Evaluation [2.6420673380196824]
We introduce FLAME, a comprehensive financial LLMs evaluation system in Chinese. FLAME-Cer covers 14 types of authoritative financial certifications, with a total of approximately 16,000 carefully selected questions. FLAME-Sce consists of 10 primary core financial business scenarios, 21 secondary financial business scenarios, and a comprehensive evaluation set of nearly 100 tertiary financial application tasks.
arXiv Detail & Related papers (2025-01-03T09:17:23Z)
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications [90.67346776473241]
Large language models (LLMs) have advanced financial applications, yet they often lack sufficient financial knowledge and struggle with tasks involving multi-modal inputs like tables and time series data. We introduce Open-FinLLMs, a series of Financial LLMs that embed comprehensive financial knowledge into text, tables, and time-series data. We also present FinLLaVA, a multimodal LLM trained with 1.43M image-text instructions to handle complex financial data types.
arXiv Detail & Related papers (2024-08-20T16:15:28Z)
CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models [61.324062412648075]
CFinBench is an evaluation benchmark for assessing the financial knowledge of large language models (LLMs) under Chinese context. It comprises 99,100 questions spanning 43 second-level categories with 3 question types: single-choice, multiple-choice and judgment. The results show that GPT4 and some Chinese-oriented models lead the benchmark, with the highest average accuracy being 60.16%.
arXiv Detail & Related papers (2024-07-02T14:34:36Z)
Financial Knowledge Large Language Model [4.599537455808687]
We introduce IDEA-FinBench, an evaluation benchmark for assessing financial knowledge in large language models (LLMs). We propose IDEA-FinKER, a framework designed to facilitate the rapid adaptation of general LLMs to the financial domain. Finally, we present IDEA-FinQA, a financial question-answering system powered by LLMs.
arXiv Detail & Related papers (2024-06-29T08:26:49Z)
AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework [48.3060010653088]
We release AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data. We then use AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, for effectively tackling the financial analysis task.
arXiv Detail & Related papers (2024-03-19T09:45:33Z)
FinBen: A Holistic Financial Benchmark for Large Language Models [75.09474986283394]
FinBen is the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks. FinBen offers several key innovations: a broader range of tasks and datasets, the first evaluation of stock trading, novel agent and Retrieval-Augmented Generation (RAG) evaluation, and three novel open-source evaluation datasets for text summarization, question answering, and stock trading.
arXiv Detail & Related papers (2024-02-20T02:16:16Z)
A Survey of Large Language Models in Finance (FinLLMs) [10.195778659105626]
Large Language Models (LLMs) have shown remarkable capabilities across a wide variety of Natural Language Processing (NLP) tasks. This survey provides a comprehensive overview of FinLLMs, including their history, techniques, performance, and opportunities and challenges. To support AI research in finance, we compile a collection of accessible datasets and evaluation benchmarks on GitHub.
arXiv Detail & Related papers (2024-02-04T02:06:57Z)
FinEval: A Chinese Financial Domain Knowledge Evaluation Benchmark for Large Language Models [31.961563103990432]
This paper presents FinEval, a benchmark designed to evaluate LLMs' financial domain knowledge and practical abilities. The dataset contains 8,351 questions categorized into four different key areas: Financial Academic Knowledge, Financial Industry Knowledge, Financial Security Knowledge, and Financial Agent. Our results show that Claude 3.5-Sonnet achieves the highest weighted average score of 72.9 across all financial domain categories under zero-shot setting.
arXiv Detail & Related papers (2023-08-19T10:38:00Z)
PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance [63.51545277822702]
PIXIU is a comprehensive framework including the first financial large language model (LLM) based on fine-tuning LLaMA with instruction data. We propose FinMA by fine-tuning LLaMA with the constructed dataset to be able to follow instructions for various financial tasks. We conduct a detailed analysis of FinMA and several existing LLMs, uncovering their strengths and weaknesses in handling critical financial tasks.
arXiv Detail & Related papers (2023-06-08T14:20:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.