Hierarchical Retrieval with Evidence Curation for Open-Domain Financial Question Answering on Standardized Documents
- URL: http://arxiv.org/abs/2505.20368v2
- Date: Mon, 02 Jun 2025 01:12:15 GMT
- Title: Hierarchical Retrieval with Evidence Curation for Open-Domain Financial Question Answering on Standardized Documents
- Authors: Jaeyoung Choe, Jihoon Kim, Woohwan Jung
- Abstract summary: Standardized documents share similar formats, such as repetitive boilerplate text and similar table structures. This similarity forces traditional RAG methods to misidentify near-duplicate text, leading to duplicate retrieval that undermines accuracy and completeness. We propose the Hierarchical Retrieval with Evidence Curation framework to address these issues.
- Score: 17.506934704019226
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrieval-augmented generation (RAG) based large language models (LLMs) are widely used in finance for their excellent performance on knowledge-intensive tasks. However, standardized documents (e.g., SEC filings) share similar formats, such as repetitive boilerplate text and similar table structures. This similarity forces traditional RAG methods to misidentify near-duplicate text, leading to duplicate retrieval that undermines accuracy and completeness. To address these issues, we propose the Hierarchical Retrieval with Evidence Curation (HiREC) framework. Our approach performs hierarchical retrieval to reduce confusion among similar texts: it first retrieves related documents and then selects the most relevant passages from those documents. The evidence curation process removes irrelevant passages and, when necessary, automatically generates complementary queries to collect missing information. To evaluate our approach, we construct and release a Large-scale Open-domain Financial (LOFin) question answering benchmark that includes 145,897 SEC documents and 1,595 question-answer pairs. Our code and data are available at https://github.com/deep-over/LOFin-bench-HiREC.
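The retrieve-then-curate loop described in the abstract can be pictured with a short sketch. This is a minimal illustration under assumed interfaces, not the released implementation: `doc_index`, `passage_index`, and the `llm` helper methods (`is_relevant`, `is_answerable`, `complementary_query`, `answer`) are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    doc_id: str
    text: str

def hirec_answer(question, doc_index, passage_index, llm, max_rounds=3):
    """Schematic HiREC loop: document retrieval -> passage retrieval ->
    evidence curation, with complementary queries for missing information."""
    evidence: list[Passage] = []
    query = question
    for _ in range(max_rounds):
        # Stage 1: retrieve candidate documents first, to reduce confusion
        # among near-duplicate passages across similarly formatted filings.
        doc_ids = doc_index.search(query, k=10)
        # Stage 2: retrieve passages only from those documents.
        candidates = passage_index.search(query, k=20, doc_ids=doc_ids)
        # Evidence curation: keep only passages judged relevant to the question.
        evidence += [p for p in candidates if llm.is_relevant(question, p.text)]
        if llm.is_answerable(question, [p.text for p in evidence]):
            break
        # Otherwise generate a complementary query for the missing information.
        query = llm.complementary_query(question, [p.text for p in evidence])
    return llm.answer(question, [p.text for p in evidence])
```

The key design point is that passage search is restricted to documents retrieved in the first stage, which is what suppresses duplicate retrieval across near-identical filings.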
Related papers
- LLM-Assisted Question-Answering on Technical Documents Using Structured Data-Aware Retrieval Augmented Generation [0.432776344138537]
Large Language Models (LLMs) are capable of natural language understanding and generation. Fine-tuning is one possible solution, but it is resource-intensive and must be repeated with every data update. Retrieval-Augmented Generation (RAG) offers an efficient solution by allowing LLMs to access external knowledge sources.
arXiv Detail & Related papers (2025-06-29T08:22:03Z)
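As a reference point for the RAG setup this entry describes, here is a minimal retrieve-then-generate loop; `retriever` and `llm` are placeholder interfaces, not the paper's system.

```python
def rag_answer(question: str, retriever, llm, k: int = 5) -> str:
    """Minimal RAG: fetch external context, then condition generation on it."""
    passages = retriever.search(question, k=k)        # external knowledge source
    context = "\n\n".join(p.text for p in passages)   # no fine-tuning required;
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return llm.generate(prompt)                       # data updates flow in via the index
```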
- GeAR: Generation Augmented Retrieval [82.20696567697016]
This paper introduces a novel method, Generation Augmented Retrieval (GeAR). It not only improves the global document-query similarity through contrastive learning, but also integrates well-designed fusion and decoding modules. When used as a retriever, GeAR does not incur any additional computational cost over bi-encoders.
arXiv Detail & Related papers (2025-01-06T05:29:00Z)
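One standard way to realize the global query-document contrastive objective mentioned above is an in-batch InfoNCE loss over bi-encoder embeddings; this sketch covers only that objective and omits GeAR's fusion and decoding modules.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(q_emb: torch.Tensor, d_emb: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE over a batch of query/document embedding pairs ([batch, dim]).
    Row i of d_emb is the positive for query i; other rows act as negatives."""
    q = F.normalize(q_emb, dim=-1)
    d = F.normalize(d_emb, dim=-1)
    logits = q @ d.t() / temperature                   # [batch, batch] similarities
    labels = torch.arange(q.size(0), device=q.device)  # diagonal entries are positives
    return F.cross_entropy(logits, labels)
```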
- BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval [54.54576644403115]
We introduce BRIGHT, the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents. Our dataset consists of 1,384 real-world queries spanning diverse domains, such as economics, psychology, mathematics, and coding. We show that incorporating explicit reasoning about the query improves retrieval performance by up to 12.2 points.
arXiv Detail & Related papers (2024-07-16T17:58:27Z)
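The reported gain from reasoning about the query suggests a simple pattern: have an LLM spell out what evidence the query needs before retrieving. The sketch below is one such query-reasoning step under assumed `llm`/`retriever` interfaces, not BRIGHT's evaluation protocol.

```python
def reason_then_retrieve(query: str, llm, retriever, k: int = 10):
    """Augment the query with an explicit reasoning trace before retrieval."""
    reasoning = llm.generate(
        "Explain step by step what evidence is needed to answer:\n" + query
    )
    # Retrieve with the reasoning-expanded query instead of the raw one.
    return retriever.search(query + "\n" + reasoning, k=k)
```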
- MILL: Mutual Verification with Large Language Models for Zero-Shot Query Expansion [39.24969189479343]
We propose a novel zero-shot query expansion framework utilizing large language models (LLMs) for mutual verification.
Our proposed method is fully zero-shot, and extensive experiments on three public benchmark datasets are conducted to demonstrate its effectiveness.
arXiv Detail & Related papers (2023-10-29T16:04:10Z)
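A schematic reading of the mutual-verification idea above: LLM-generated pseudo-documents and retrieved passages validate each other, and only the mutually supported text is appended to the query. The token-overlap check here is a crude stand-in for the paper's actual verification, and `llm`/`retriever` are placeholder interfaces.

```python
def expand_with_mutual_verification(query: str, llm, retriever,
                                    n_gen: int = 3, k: int = 5,
                                    min_overlap: int = 10) -> str:
    """Keep LLM-generated expansions only if they agree with retrieved passages."""
    generated = [llm.generate(f"Write a passage answering: {query}")
                 for _ in range(n_gen)]
    retrieved = [p.text for p in retriever.search(query, k=k)]

    def overlap(a: str, b: str) -> int:
        # Crude lexical proxy for agreement between two passages.
        return len(set(a.lower().split()) & set(b.lower().split()))

    verified = [g for g in generated
                if any(overlap(g, r) >= min_overlap for r in retrieved)]
    return " ".join([query] + verified)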
- PDFTriage: Question Answering over Long, Structured Documents [60.96667912964659]
Representing structured documents as plain text is incongruous with the user's mental model of these documents, which have rich structure.
We propose PDFTriage, which enables models to retrieve context based on either structure or content.
Our benchmark dataset consists of 900+ human-generated questions over 80 structured documents.
arXiv Detail & Related papers (2023-09-16T04:29:05Z)
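Retrieving by structure or content, as described above, amounts to a triage step that dispatches to structural fetches instead of plain chunk search. The `doc` and `llm` interfaces below are hypothetical stand-ins mirroring that idea, not PDFTriage's actual function set.

```python
def triage_retrieve(question: str, doc, llm) -> str:
    """Schematic triage: pick a structural fetch (page / section / table)
    or fall back to plain content search."""
    action = llm.choose_action(
        question,
        options=["fetch_page", "fetch_section", "fetch_table", "search_text"],
    )
    if action == "fetch_page":
        return doc.get_page(llm.extract_argument(question, "page"))
    if action == "fetch_section":
        return doc.get_section(llm.extract_argument(question, "section"))
    if action == "fetch_table":
        return doc.get_table(llm.extract_argument(question, "table"))
    return doc.search_text(question)
```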
- DAPR: A Benchmark on Document-Aware Passage Retrieval [57.45793782107218]
We propose and name this task Document-Aware Passage Retrieval (DAPR).
While analyzing the errors of state-of-the-art (SoTA) passage retrievers, we find that the major errors (53.5%) are due to missing document context.
Our created benchmark enables future research on developing and comparing retrieval systems for the new task.
arXiv Detail & Related papers (2023-05-23T10:39:57Z)
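One simple way to score a passage "in the context of its document", as the error analysis above motivates, is to blend passage-level and document-level similarity. This is a basic hybrid for illustration, not one of the benchmark's SoTA systems.

```python
def document_aware_score(query_vec, passage_vec, doc_vec, alpha: float = 0.5) -> float:
    """Blend passage-level and document-level relevance so a passage is judged
    together with the context of its source document."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return dot / norm if norm else 0.0

    return alpha * cosine(query_vec, passage_vec) + (1 - alpha) * cosine(query_vec, doc_vec)
```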
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
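Generative retrieval, in the spirit of the entry above, decodes evidence identifiers directly from the claim rather than scoring every candidate. In this sketch, `seq2seq.generate` is a placeholder interface, and constrained decoding is approximated by post-hoc filtering against the known corpus.

```python
def generative_retrieve(claim: str, seq2seq, corpus_titles: set[str], k: int = 5):
    """Decode candidate evidence identifiers, keeping only real corpus titles."""
    candidates = seq2seq.generate(
        f"Evidence document titles for the claim: {claim}", num_return=k
    )
    return [title for title in candidates if title in corpus_titles]
```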
- Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback [29.719150565643965]
This paper proposes ANCE-PRF, a new query encoder that uses pseudo relevance feedback (PRF) to improve query representations for dense retrieval.
ANCE-PRF uses a BERT encoder that consumes the query and the top retrieved documents from a dense retrieval model, ANCE, and it learns to produce better query embeddings directly from relevance labels.
Analysis shows that the PRF encoder effectively captures the relevant and complementary information from PRF documents, while ignoring the noise with its learned attention mechanism.
arXiv Detail & Related papers (2021-08-30T18:10:26Z)
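The PRF step described above reduces to a concrete recipe: run a first retrieval round, concatenate the query with its top results, and re-encode the whole sequence as the new query vector. The `retriever` and `prf_encoder` objects here are assumed interfaces, not the ANCE-PRF code.

```python
def prf_query_embedding(query: str, retriever, prf_encoder, k: int = 3):
    """ANCE-PRF-style re-encoding: fold top-k feedback documents into the query."""
    feedback_docs = retriever.search(query, k=k)       # first-round retrieval
    prf_input = " [SEP] ".join([query] + [d.text for d in feedback_docs])
    return prf_encoder.encode(prf_input)               # replaces the raw query embedding
```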
- Open Question Answering over Tables and Text [55.8412170633547]
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question.
Most open QA systems have considered only retrieving information from unstructured text.
We present a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.
arXiv Detail & Related papers (2020-10-20T16:48:14Z)
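A common baseline setup for table-and-text open QA like the task above is to linearize tables into text so tables and passages can share one retrieval index. This helper is an illustrative assumption about such a setup, not the OTT-QA reference implementation.

```python
def linearize_table(table: list[list[str]]) -> str:
    """Flatten a header+rows table into retrievable text."""
    header, *rows = table
    return " ; ".join(
        ", ".join(f"{h}: {cell}" for h, cell in zip(header, row)) for row in rows
    )

# Example: linearize_table([["Company", "Revenue"], ["Acme", "$1.2B"]])
# -> "Company: Acme, Revenue: $1.2B"
```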
This list is automatically generated from the titles and abstracts of the papers on this site.