EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements
- URL: http://arxiv.org/abs/2506.08762v1
- Date: Tue, 10 Jun 2025 13:03:36 GMT
- Title: EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements
- Authors: Issa Sugiura, Takashi Ishida, Taro Makino, Chieko Tazuke, Takanori Nakagawa, Kosuke Nakago, David Ha,
- Abstract summary: We introduce EDINET-Bench, an open-source Japanese financial benchmark to evaluate the performance of large language models (LLMs)<n>Our experiments reveal that even state-of-the-art LLMs struggle, performing only slightly better than logistic regression in binary classification for fraud detection and earnings forecasting.<n>Our dataset, benchmark construction code, and evaluation code is publicly available to facilitate future research in finance with LLMs.
- Score: 7.259647868714988
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Financial analysis presents complex challenges that could leverage large language model (LLM) capabilities. However, the scarcity of challenging financial datasets, particularly for Japanese financial data, impedes academic innovation in financial analytics. As LLMs advance, this lack of accessible research resources increasingly hinders their development and evaluation in this specialized domain. To address this gap, we introduce EDINET-Bench, an open-source Japanese financial benchmark designed to evaluate the performance of LLMs on challenging financial tasks including accounting fraud detection, earnings forecasting, and industry prediction. EDINET-Bench is constructed by downloading annual reports from the past 10 years from Japan's Electronic Disclosure for Investors' NETwork (EDINET) and automatically assigning labels corresponding to each evaluation task. Our experiments reveal that even state-of-the-art LLMs struggle, performing only slightly better than logistic regression in binary classification for fraud detection and earnings forecasting. These results highlight significant challenges in applying LLMs to real-world financial applications and underscore the need for domain-specific adaptation. Our dataset, benchmark construction code, and evaluation code is publicly available to facilitate future research in finance with LLMs.
Related papers
- Bridging Language Models and Financial Analysis [49.361943182322385]
The rapid advancements in Large Language Models (LLMs) have unlocked transformative possibilities in natural language processing.<n>Financial data is often embedded in intricate relationships across textual content, numerical tables, and visual charts.<n>Despite the fast pace of innovation in LLM research, there remains a significant gap in their practical adoption within the finance industry.
arXiv Detail & Related papers (2025-03-14T01:35:20Z) - FinanceQA: A Benchmark for Evaluating Financial Analysis Capabilities of Large Language Models [0.0]
FinanceQA is a testing suite that evaluates LLMs' performance on complex numerical financial analysis tasks that mirror real-world investment work.<n>Current LLMs fail to meet the strict accuracy requirements of financial institutions, with models failing approximately 60% of realistic tasks.<n>Results show that higher-quality training data is needed to support such tasks, which we experiment with using OpenAI's fine-tuning API.
arXiv Detail & Related papers (2025-01-30T00:06:55Z) - Auto-Generating Earnings Report Analysis via a Financial-Augmented LLM [1.3597551064547502]
This paper presents a novel challenge: developing an LLM specifically for automating the generation of earnings reports analysis.<n>Our methodology involves an in-depth analysis of existing earnings reports followed by a unique approach to fine-tune an LLM for this purpose.<n>With extensive financial documents, we construct financial instruction data, enabling the refined adaptation of our LLM to financial contexts.
arXiv Detail & Related papers (2024-12-11T08:09:42Z) - Financial Knowledge Large Language Model [4.599537455808687]
We introduce IDEA-FinBench, an evaluation benchmark for assessing financial knowledge in large language models (LLMs)
We propose IDEA-FinKER, a framework designed to facilitate the rapid adaptation of general LLMs to the financial domain.
Finally, we present IDEA-FinQA, a financial question-answering system powered by LLMs.
arXiv Detail & Related papers (2024-06-29T08:26:49Z) - Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Models (LLMs) embeddings.
Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z) - AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework [48.3060010653088]
We release AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data.
We then use AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, for effectively tackling the financial analysis task.
arXiv Detail & Related papers (2024-03-19T09:45:33Z) - FinBen: A Holistic Financial Benchmark for Large Language Models [75.09474986283394]
FinBen is the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks.
FinBen offers several key innovations: a broader range of tasks and datasets, the first evaluation of stock trading, novel agent and Retrieval-Augmented Generation (RAG) evaluation, and three novel open-source evaluation datasets for text summarization, question answering, and stock trading.
arXiv Detail & Related papers (2024-02-20T02:16:16Z) - Revolutionizing Finance with LLMs: An Overview of Applications and Insights [45.660896719456886]
Large Language Models (LLMs) like ChatGPT have seen considerable advancements and have been applied in diverse fields.<n>These models are being utilized for automating financial report generation, forecasting market trends, analyzing investor sentiment, and offering personalized financial advice.
arXiv Detail & Related papers (2024-01-22T01:06:17Z) - PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark
for Finance [63.51545277822702]
PIXIU is a comprehensive framework including the first financial large language model (LLMs) based on fine-tuning LLaMA with instruction data.
We propose FinMA by fine-tuning LLaMA with the constructed dataset to be able to follow instructions for various financial tasks.
We conduct a detailed analysis of FinMA and several existing LLMs, uncovering their strengths and weaknesses in handling critical financial tasks.
arXiv Detail & Related papers (2023-06-08T14:20:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.