Related papers: FinTextQA: A Dataset for Long-form Financial Question Answering

FinTextQA: A Dataset for Long-form Financial Question Answering

URL: http://arxiv.org/abs/2405.09980v1
Date: Thu, 16 May 2024 10:53:31 GMT
Title: FinTextQA: A Dataset for Long-form Financial Question Answering
Authors: Jian Chen, Peilin Zhou, Yining Hua, Yingxin Loh, Kehui Chen, Ziyuan Li, Bing Zhu, Junwei Liang,
Abstract summary: FinTextQA is a novel dataset for long-form question answering (LFQA) in finance. The most effective system configuration on our dataset involved setting the embedder, retriever, reranker, and generator as Ada2, Automated Merged Retrieval, Bge-Reranker-Base, and Baichuan2-7B.
Score: 10.1084081290893
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Accurate evaluation of financial question answering (QA) systems necessitates a comprehensive dataset encompassing diverse question types and contexts. However, current financial QA datasets lack scope diversity and question complexity. This work introduces FinTextQA, a novel dataset for long-form question answering (LFQA) in finance. FinTextQA comprises 1,262 high-quality, source-attributed QA pairs extracted and selected from finance textbooks and government agency websites.Moreover, we developed a Retrieval-Augmented Generation (RAG)-based LFQA system, comprising an embedder, retriever, reranker, and generator. A multi-faceted evaluation approach, including human ranking, automatic metrics, and GPT-4 scoring, was employed to benchmark the performance of different LFQA system configurations under heightened noisy conditions. The results indicate that: (1) Among all compared generators, Baichuan2-7B competes closely with GPT-3.5-turbo in accuracy score; (2) The most effective system configuration on our dataset involved setting the embedder, retriever, reranker, and generator as Ada2, Automated Merged Retrieval, Bge-Reranker-Base, and Baichuan2-7B, respectively; (3) models are less susceptible to noise after the length of contexts reaching a specific threshold.

Related papers

The benefits of query-based KGQA systems for complex and temporal questions in LLM era [55.20230501807337]
Large language models excel in question-answering (QA) yet still struggle with multi-hop reasoning and temporal questions.<n> Query-based knowledge graph QA (KGQA) offers a modular alternative by generating executable queries instead of direct answers.<n>We explore multi-stage query-based framework for WikiData QA, proposing multi-stage approach that enhances performance on challenging multi-hop and temporal benchmarks.
arXiv Detail & Related papers (2025-07-16T06:41:03Z)
Automatic Dataset Generation for Knowledge Intensive Question Answering Tasks [10.562940259841623]
This paper presents a novel approach for enhancing Large Language Models (LLMs) in knowledge-intensive QA tasks.<n>The proposed system includes an automated QA generator and a model fine-tuner, evaluated using perplexity, ROUGE, BLEU, and BERTScore.<n>Experiments demonstrate improvements in logical coherence and factual accuracy, with implications for developing adaptable Artificial Intelligence (AI) systems.
arXiv Detail & Related papers (2025-05-20T11:16:29Z)
FinBERT-QA: Financial Question Answering with pre-trained BERT Language Models [0.0]
We propose a novel financial QA system using the transformer-based pre-trained BERT language model.<n>Our system focuses on financial non-factoid answer selection, which retrieves a set of passage-level texts and selects the most relevant as the answer.
arXiv Detail & Related papers (2025-04-24T15:25:52Z)
FinSage: A Multi-aspect RAG System for Financial Filings Question Answering [7.581619443736712]
FinSage is a multi-modal pre-processing pipeline that unifies diverse data formats and generates metadata summaries. Experiments demonstrate that FinSage achieves an impressive recall of 92.51% on 75 expert-curated questions. FinSage has been successfully deployed as financial question-answering agent in online meetings, where it has already served more than 1,200 people.
arXiv Detail & Related papers (2025-04-20T04:58:14Z)
SEC-QA: A Systematic Evaluation Corpus for Financial QA [12.279234447220155]
Existing datasets are often constrained by size, context, or relevance to practical applications. We propose SEC-QA, a continuous dataset generation framework with two key features. We introduce a QA system based on program-of-thought that improves the ability to perform complex information retrieval and quantitative reasoning pipelines.
arXiv Detail & Related papers (2024-06-20T15:12:41Z)
SQUARE: Automatic Question Answering Evaluation using Multiple Positive and Negative References [73.67707138779245]
We propose a new evaluation metric: SQuArE (Sentence-level QUestion AnsweRing Evaluation) We evaluate SQuArE on both sentence-level extractive (Answer Selection) and generative (GenQA) QA systems.
arXiv Detail & Related papers (2023-09-21T16:51:30Z)
QASnowball: An Iterative Bootstrapping Framework for High-Quality Question-Answering Data Generation [67.27999343730224]
We introduce an iterative bootstrapping framework for QA data augmentation (named QASnowball) QASnowball can iteratively generate large-scale high-quality QA data based on a seed set of supervised examples. We conduct experiments in the high-resource English scenario and the medium-resource Chinese scenario, and the experimental results show that the data generated by QASnowball can facilitate QA models.
arXiv Detail & Related papers (2023-09-19T05:20:36Z)
PACIFIC: Towards Proactive Conversational Question Answering over Tabular and Textual Data in Finance [96.06505049126345]
We present a new dataset, named PACIFIC. Compared with existing CQA datasets, PACIFIC exhibits three key features: (i) proactivity, (ii) numerical reasoning, and (iii) hybrid context of tables and text. A new task is defined accordingly to study Proactive Conversational Question Answering (PCQA), which combines clarification question generation and CQA. UniPCQA performs multi-task learning over all sub-tasks in PCQA and incorporates a simple ensemble strategy to alleviate the error propagation issue in the multi-task learning by cross-validating top-$k$ sampled Seq2Seq
arXiv Detail & Related papers (2022-10-17T08:06:56Z)
Improving Question Answering with Generation of NQ-like Questions [12.276281998447079]
Question Answering (QA) systems require a large amount of annotated data which is costly and time-consuming to gather. We propose an algorithm to automatically generate shorter questions resembling day-to-day human communication in the Natural Questions (NQ) dataset from longer trivia questions in Quizbowl (QB) dataset.
arXiv Detail & Related papers (2022-10-12T21:36:20Z)
Uncertainty-Aware Text-to-Program for Question Answering on Structured Electronic Health Records [8.272573489245717]
We design the program-based model (NLQ2Program) for EHR-QA as the first step towards the future direction. We tackle MIMICSPARQL*, the graph-based EHR-QA dataset, via a program-based approach in a semi-supervised manner. For a reliable EHR-QA model, we apply the uncertainty decomposition method to measure the ambiguity in the input question.
arXiv Detail & Related papers (2022-03-14T08:12:16Z)
TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance [71.76018597965378]
We build a new large-scale Question Answering dataset containing both Tabular And Textual data, named TAT-QA. We propose a novel QA model termed TAGOP, which is capable of reasoning over both tables and text.
arXiv Detail & Related papers (2021-05-17T06:12:06Z)
Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs [62.71505254770827]
We propose a conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts. Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.
arXiv Detail & Related papers (2020-05-28T08:26:06Z)
Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering [98.48363619128108]
We propose an unsupervised approach to training QA models with generated pseudo-training data. We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance.
arXiv Detail & Related papers (2020-04-24T17:57:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.