SEC-QA: A Systematic Evaluation Corpus for Financial QA
- URL: http://arxiv.org/abs/2406.14394v1
- Date: Thu, 20 Jun 2024 15:12:41 GMT
- Title: SEC-QA: A Systematic Evaluation Corpus for Financial QA
- Authors: Viet Dac Lai, Michael Krumdick, Charles Lovering, Varshini Reddy, Craig Schmidt, Chris Tanner
- Abstract summary: Existing datasets are often constrained by size, context, or relevance to practical applications.
We propose SEC-QA, a continuous dataset generation framework with two key features.
We introduce a QA system based on program-of-thought that improves the ability to perform complex information retrieval and quantitative reasoning pipelines.
- Score: 12.279234447220155
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The financial domain frequently deals with large numbers of long documents that are essential for daily operations. Significant effort is put towards automating financial data analysis. However, a persistent challenge, not limited to the finance domain, is the scarcity of datasets that accurately reflect real-world tasks for model evaluation. Existing datasets are often constrained by size, context, or relevance to practical applications. Moreover, LLMs are currently trained on trillions of tokens of text, limiting access to novel data or documents that models have not encountered during training for unbiased evaluation. We propose SEC-QA, a continuous dataset generation framework with two key features: 1) the semi-automatic generation of Question-Answer (QA) pairs spanning multiple long context financial documents, which better represent real-world financial scenarios; 2) the ability to continually refresh the dataset using the most recent public document collections, not yet ingested by LLMs. Our experiments show that current retrieval augmented generation methods systematically fail to answer these challenging multi-document questions. In response, we introduce a QA system based on program-of-thought that improves the ability to perform complex information retrieval and quantitative reasoning pipelines, thereby increasing QA accuracy.
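As a rough sketch of the program-of-thought pattern described in the abstract (not the authors' implementation; the helper and prompt names below are hypothetical), the idea is to have the LLM emit a small program that interleaves retrieval calls with explicit arithmetic, and then execute that program to obtain the answer:

```python
# Minimal program-of-thought sketch for multi-document financial QA.
# `retrieve_chunks` and `llm_complete` are hypothetical helpers, not SEC-QA APIs.

def retrieve_chunks(query: str, k: int = 5) -> list[str]:
    """Placeholder: return the top-k passages for `query` from an index of filings."""
    raise NotImplementedError("wire this to your document index")

def llm_complete(prompt: str) -> str:
    """Placeholder: send `prompt` to any LLM completion endpoint and return its text."""
    raise NotImplementedError("wire this to your LLM provider")

POT_PROMPT = """You answer financial questions by writing Python code.
You may call retrieve_chunks(query, k) to fetch passages from SEC filings.
Compute intermediate quantities explicitly and store the final result in `answer`.

Question: {question}
"""

def program_of_thought_qa(question: str) -> str:
    # 1) Ask the model for an executable program instead of a direct answer.
    program = llm_complete(POT_PROMPT.format(question=question))
    # 2) Run the program in a namespace that exposes only the retrieval helper;
    #    the generated code is expected to set a variable named `answer`.
    namespace = {"retrieve_chunks": retrieve_chunks}
    exec(program, namespace)  # NOTE: sandbox untrusted model output in practice
    return str(namespace.get("answer", ""))
```

The appeal of this pattern for multi-document quantitative questions is that retrieval and arithmetic become explicit, inspectable steps rather than a single opaque generation.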
Related papers
- Multi-Reranker: Maximizing performance of retrieval-augmented generation in the FinanceRAG challenge [5.279257531335345]
This paper details the development of a high-performance, finance-specific Retrieval-Augmented Generation (RAG) system for the ACM-ICAIF '24 FinanceRAG competition.
We optimized performance through ablation studies on query expansion and corpus refinement during the pre-retrieval phase.
Notably, we introduced an efficient method for managing long context sizes during the generation phase, significantly improving response quality without sacrificing performance.
arXiv Detail & Related papers (2024-11-23T09:56:21Z)
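A minimal sketch of the second-stage reranking that the Multi-Reranker entry above centers on; the cross-encoder checkpoint and function signature are illustrative assumptions, not the competition system:

```python
# Illustrative second-stage reranking for a RAG pipeline (assumed components,
# not the Multi-Reranker system). Requires: pip install sentence-transformers
from sentence_transformers import CrossEncoder

def rerank(query: str, candidates: list[str], top_n: int = 5) -> list[str]:
    """Re-score first-stage candidates with a cross-encoder and keep the best."""
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # any reranker checkpoint works
    scores = model.predict([(query, passage) for passage in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [passage for passage, _ in ranked[:top_n]]
```

In pipelines of this shape, the first-stage retriever is tuned for recall over the corpus, while the cross-encoder spends more compute per candidate to improve precision on the shortlist.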
- Quantifying Qualitative Insights: Leveraging LLMs to Market Predict [0.0]
This study addresses challenges by leveraging daily reports from securities firms to create high-quality contextual information.
The reports are segmented into text-based key factors and combined with numerical data, such as price information, to form context sets.
A prompt is crafted to assign scores to the key factors, converting qualitative insights into quantitative results.
arXiv Detail & Related papers (2024-11-13T07:45:40Z)
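A rough sketch of the prompt-based scoring step described in the entry above, which turns a qualitative key factor into a number; the scale, prompt wording, and helper signature are assumptions:

```python
# Illustrative prompt-based factor scoring (assumed prompt and helper names,
# not the paper's exact pipeline).
SCORING_PROMPT = """You are a financial analyst. Rate how positive the following
key factor from a securities-firm daily report is for the stock's near-term
outlook, on an integer scale from -2 (very negative) to +2 (very positive).
Reply with the integer only.

Key factor: {factor}
"""

def score_key_factor(factor: str, llm_complete) -> int:
    """Turn one qualitative key factor into a numeric score via an LLM call.

    `llm_complete` is any callable that maps a prompt string to a reply string.
    """
    reply = llm_complete(SCORING_PROMPT.format(factor=factor)).strip()
    try:
        return max(-2, min(2, int(reply)))  # clamp to the expected range
    except ValueError:
        return 0  # treat unparseable replies as neutral
```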
- Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z)
- Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Model (LLM) embeddings.
Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z)
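A hedged sketch of the general recipe the anomaly-detection entry above suggests, pairing LLM embeddings with a standard outlier detector; the embedding model and detector choice are assumptions, not the paper's setup:

```python
# Illustrative LLM-embedding anomaly detection (assumed models and thresholds).
# Requires: pip install sentence-transformers scikit-learn
from sentence_transformers import SentenceTransformer
from sklearn.ensemble import IsolationForest

def flag_anomalies(records: list[str], contamination: float = 0.01) -> list[bool]:
    """Embed textual financial records and flag the statistical outliers."""
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works
    embeddings = encoder.encode(records)
    detector = IsolationForest(contamination=contamination, random_state=0)
    labels = detector.fit_predict(embeddings)  # -1 = anomaly, 1 = normal
    return [label == -1 for label in labels]
```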
- FinTextQA: A Dataset for Long-form Financial Question Answering [10.1084081290893]
FinTextQA is a novel dataset for long-form question answering (LFQA) in finance.
The most effective system configuration on our dataset involved setting the embedder, retriever, reranker, and generator as Ada2, Automated Merged Retrieval, Bge-Reranker-Base, and Baichuan2-7B.
arXiv Detail & Related papers (2024-05-16T10:53:31Z)
- AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework [48.3060010653088]
We release AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data.
We then use AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, for effectively tackling the financial analysis task.
arXiv Detail & Related papers (2024-03-19T09:45:33Z)
- Automatic Question-Answer Generation for Long-Tail Knowledge [65.11554185687258]
We propose an automatic approach to generate specialized QA datasets for tail entities.
We conduct extensive experiments by employing pretrained LLMs on our newly generated long-tail QA datasets.
arXiv Detail & Related papers (2024-03-03T03:06:31Z)
- Towards Complex Document Understanding By Discrete Reasoning [77.91722463958743]
Document Visual Question Answering (VQA) aims to understand visually-rich documents to answer questions in natural language.
We introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages and 16,558 question-answer pairs.
We develop a novel model named MHST that takes into account multi-modal information, including text, layout, and visual images, to intelligently address different types of questions.
arXiv Detail & Related papers (2022-07-25T01:43:19Z)
- FinQA: A Dataset of Numerical Reasoning over Financial Data [52.7249610894623]
We focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents.
We propose a new large-scale dataset, FinQA, with Question-Answering pairs over Financial reports, written by financial experts.
The results demonstrate that popular, large, pre-trained models fall far short of expert humans in acquiring finance knowledge.
arXiv Detail & Related papers (2021-09-01T00:08:14Z)