Related papers: FinDER: Financial Dataset for Question Answering and Evaluating Retrieval-Augmented Generation

URL: http://arxiv.org/abs/2504.15800v2
Date: Wed, 23 Apr 2025 07:49:10 GMT
Title: FinDER: Financial Dataset for Question Answering and Evaluating Retrieval-Augmented Generation
Authors: Chanyeol Choi, Jihoon Kwon, Jaeseon Ha, Hojun Choi, Chaewoon Kim, Yongjae Lee, Jy-yong Sohn, Alejandro Lopez-Lira,
Abstract summary: We present FinDER, an expert-generated dataset tailored for Retrieval-Augmented Generation (RAG) in finance. FinDER focuses on annotating search-relevant evidence by domain experts, offering 5,703 query-evidence-answer triplets. By challenging models to retrieve relevant information from large corpora, FinDER offers a more realistic benchmark for evaluating RAG systems.
Score: 63.55583665003167
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In the fast-paced financial domain, accurate and up-to-date information is critical to addressing ever-evolving market conditions. Retrieving this information correctly is essential in financial Question-Answering (QA), since many language models struggle with factual accuracy in this domain. We present FinDER, an expert-generated dataset tailored for Retrieval-Augmented Generation (RAG) in finance. Unlike existing QA datasets that provide predefined contexts and rely on relatively clear and straightforward queries, FinDER focuses on annotating search-relevant evidence by domain experts, offering 5,703 query-evidence-answer triplets derived from real-world financial inquiries. These queries frequently include abbreviations, acronyms, and concise expressions, capturing the brevity and ambiguity common in the realistic search behavior of professionals. By challenging models to retrieve relevant information from large corpora rather than relying on readily determined contexts, FinDER offers a more realistic benchmark for evaluating RAG systems. We further present a comprehensive evaluation of multiple state-of-the-art retrieval models and Large Language Models, showcasing challenges derived from a realistic benchmark to drive future research on truthful and precise RAG in the financial domain.

Related papers

FinSage: A Multi-aspect RAG System for Financial Filings Question Answering [7.7513659534623605]
FinSage is a multi-modal pre-processing pipeline that unifies diverse data formats and generates metadata summaries. Experiments demonstrate that FinSage achieves an impressive recall of 92.51% on 75 expert-curated questions. FinSage has been successfully deployed as a financial question-answering agent in online meetings, where it has already served more than 1,200 people.
arXiv Detail & Related papers (2025-04-20T04:58:14Z)
Towards Temporal-Aware Multi-Modal Retrieval Augmented Generation in Finance [79.78247299859656]
FinTMMBench is the first comprehensive benchmark for evaluating temporal-aware multi-modal Retrieval-Augmented Generation systems in finance. Built from heterologous data of NASDAQ 100 companies, FinTMMBench offers three significant advantages.
arXiv Detail & Related papers (2025-03-07T07:13:59Z)
FinTSB: A Comprehensive and Practical Benchmark for Financial Time Series Forecasting [58.70072722290475]
Financial time series (FinTS) record the behavior of human-brain-augmented decision-making. FinTSB is a comprehensive and practical benchmark for financial time series forecasting.
arXiv Detail & Related papers (2025-02-26T05:19:16Z)
Retrieval-augmented Large Language Models for Financial Time Series Forecasting [29.769616823587594]
We introduce FinSrag, the first retrieval-augmented generation (RAG) framework with a novel domain-specific retriever FinSeer for financial time-series forecasting. FinSeer leverages a candidate selection mechanism refined by LLM feedback and a similarity-driven training objective to align queries with historically influential sequences while filtering out financial noise. We enrich the retrieval corpus by curating new datasets that integrate a broader set of financial indicators, capturing previously overlooked market dynamics.
arXiv Detail & Related papers (2025-02-09T12:26:05Z)
An Agent Framework for Real-Time Financial Information Searching with Large Language Models [8.260170301368758]
FinSearch is a novel agent-based search framework specifically designed for financial applications. FinSearch comprises four components, including: (1) an LLM-based multi-step search pre-planner that decomposes user queries into structured sub-queries mapped to specific data sources through a graph representation; (2) a search executor with an LLM-based adaptive query rewriter that executes the search for each sub-query while dynamically refining the sub-queries in its subsequent nodes based on intermediate search results; and (3) a temporal weighting mechanism that prioritizes information relevance based on the time context from the user's query.
arXiv Detail & Related papers (2024-12-14T07:26:39Z)
SEC-QA: A Systematic Evaluation Corpus for Financial QA [12.279234447220155]
Existing datasets are often constrained by size, context, or relevance to practical applications. We propose SEC-QA, a continuous dataset generation framework with two key features. We introduce a QA system based on program-of-thought that improves the ability to perform complex information retrieval and quantitative reasoning pipelines.
arXiv Detail & Related papers (2024-06-20T15:12:41Z)
AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework [48.3060010653088]
We release AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data. We then use AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, for effectively tackling the financial analysis task.
arXiv Detail & Related papers (2024-03-19T09:45:33Z)
FinBen: A Holistic Financial Benchmark for Large Language Models [75.09474986283394]
FinBen is the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks. FinBen offers several key innovations: a broader range of tasks and datasets, the first evaluation of stock trading, novel agent and Retrieval-Augmented Generation (RAG) evaluation, and three novel open-source evaluation datasets for text summarization, question answering, and stock trading.
arXiv Detail & Related papers (2024-02-20T02:16:16Z)
PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance [63.51545277822702]
PIXIU is a comprehensive framework including the first financial large language model (LLM) based on fine-tuning LLaMA with instruction data. We propose FinMA by fine-tuning LLaMA with the constructed dataset to be able to follow instructions for various financial tasks. We conduct a detailed analysis of FinMA and several existing LLMs, uncovering their strengths and weaknesses in handling critical financial tasks.
arXiv Detail & Related papers (2023-06-08T14:20:29Z)
REFinD: Relation Extraction Financial Dataset [7.207699035400335]
We propose REFinD, the first large-scale annotated dataset of relations, with ~29K instances and 22 relations amongst 8 types of entity pairs, generated entirely over financial documents. We observed that various state-of-the-art deep learning models struggle with numeric inference, relational and directional ambiguity.
arXiv Detail & Related papers (2023-05-22T22:40:11Z)
FinQA: A Dataset of Numerical Reasoning over Financial Data [52.7249610894623]
We focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents. We propose a new large-scale dataset, FinQA, with Question-Answering pairs over Financial reports, written by financial experts. The results demonstrate that popular, large, pre-trained models fall far short of expert humans in acquiring finance knowledge.
arXiv Detail & Related papers (2021-09-01T00:08:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.