Related papers: Program of Thoughts for Financial Reasoning: Leveraging Dynamic In-Context Examples and Generative Retrieval

URL: http://arxiv.org/abs/2510.13157v1
Date: Wed, 15 Oct 2025 05:16:54 GMT
Title: Program of Thoughts for Financial Reasoning: Leveraging Dynamic In-Context Examples and Generative Retrieval
Authors: Subhendu Khatuya, Shashwat Naidu, Pawan Goyal, Niloy Ganguly
Abstract summary: We introduce FINDER, a novel two-step framework to enhance financial numerical reasoning. The first step utilizes a generative retriever to extract relevant facts from unstructured data, including both text and tables. This is followed by context-aware Program of Thought prompting with dynamic selection of in-context examples. Our model FINDER achieves a new state-of-the-art performance on both the FinQA and ConvFinQA datasets.
Score: 28.84398417293526
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite continuous advancements in the capabilities of large language models (LLMs), numerical reasoning remains a challenging area. Techniques like chain-of-thought prompting, tree-of-thought prompting, and program-of-thought prompting guide LLMs through intermediate reasoning steps. Although in-context learning with few-shot prompting has improved performance, LLMs still lag behind state-of-the-art models on financial numerical reasoning datasets such as FinQA and ConvFinQA. In this work, we introduce FINDER, a novel two-step framework, to enhance LLMs' capabilities in financial numerical reasoning. The first step utilizes a generative retriever to extract relevant facts from unstructured data, including both text and tables. This is followed by context-aware Program of Thought prompting with dynamic selection of in-context examples. Our model FINDER achieves a new state-of-the-art performance on both the FinQA and ConvFinQA datasets, surpassing previous benchmarks with execution accuracy improvements of 5.98% and 4.05%, respectively.
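The second step described in the abstract (Program of Thought prompting with dynamically selected in-context examples) can be illustrated with a minimal Python sketch. Everything below is an assumption for illustration, not the authors' implementation: the example pool, the bag-of-words cosine similarity used to pick examples, the prompt layout, and the helper names `select_examples`, `build_prompt`, and `run_program` are all hypothetical. The generative retriever and the LLM call are omitted; the program passed to `run_program` stands in for model output.

```python
import math
import re
from collections import Counter

# Hypothetical pool of solved examples (question plus a program that stores its result in `ans`).
EXAMPLE_POOL = [
    {
        "question": "What was the percentage change in revenue from 2018 to 2019?",
        "program": "old = 100.0\nnew = 120.0\nans = (new - old) / old * 100",
    },
    {
        "question": "What is the total of operating leases and capital leases?",
        "program": "operating = 35.0\ncapital = 15.0\nans = operating + capital",
    },
]


def bow(text):
    """Lowercased bag-of-words counts for a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_examples(question, pool, k=1):
    """Pick the k pool examples most similar to the question (a stand-in selection rule)."""
    q = bow(question)
    return sorted(pool, key=lambda ex: cosine(q, bow(ex["question"])), reverse=True)[:k]


def build_prompt(question, facts, examples):
    """Assemble a few-shot prompt: selected examples, retrieved facts, then the new question."""
    shots = "\n\n".join(f"Question: {ex['question']}\nProgram:\n{ex['program']}" for ex in examples)
    context = "\n".join(f"- {fact}" for fact in facts)
    return f"{shots}\n\nFacts:\n{context}\nQuestion: {question}\nProgram:\n"


def run_program(program):
    """Execute a generated program with builtins disabled and return its `ans` variable."""
    scope = {}
    exec(program, {"__builtins__": {}}, scope)
    return scope["ans"]


if __name__ == "__main__":
    question = "What was the percentage change in net income from 2018 to 2019?"
    facts = ["net income was 50.0 in 2018", "net income was 60.0 in 2019"]  # assumed retriever output
    shots = select_examples(question, EXAMPLE_POOL, k=1)
    print(build_prompt(question, facts, shots))
    # Hand-written stand-in for the program an LLM might return for this prompt.
    generated = "old = 50.0\nnew = 60.0\nans = (new - old) / old * 100"
    print(run_program(generated))
```

The point of the sketch is the division of labor: the model only writes the program, and the arithmetic is done by the interpreter, which is why the reported metric is execution accuracy. Disabling builtins in `exec` is a convenience here, not a security boundary.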

Related papers

Structure First, Reason Next: Enhancing a Large Language Model using Knowledge Graph for Numerical Reasoning in Financial Documents [0.21485350418225244]
Large Language Models (LLMs) have shown promising results in multiple Question-Answering (Q-A) systems. Structured data augmentations, such as Knowledge Graphs (KGs), have notably improved the predictions of LLMs. This paper proposes a framework to incorporate structured information using KGs along with LLM predictions for numerical reasoning tasks.
arXiv Detail & Related papers (2026-01-12T17:39:08Z)
LAET: A Layer-wise Adaptive Ensemble Tuning Framework for Pretrained Language Models [7.216206616406649]
Large language models (LLMs) like BloombergGPT and FinMA have set new benchmarks across various financial NLP tasks. We propose Layer-wise Adaptive Ensemble Tuning (LAET), a novel strategy that selectively fine-tunes the most effective layers of pre-trained LLMs. Our approach shows strong results in financial NLP tasks, outperforming existing benchmarks and state-of-the-art LLMs.
arXiv Detail & Related papers (2025-11-14T13:57:46Z)
Context-level Language Modeling by Learning Predictive Context Embeddings [79.00607069677393]
We introduce ContextLM, a framework that augments standard pretraining with an inherent next-context prediction objective. This mechanism trains the model to learn predictive representations of multi-token contexts, leveraging error signals derived from future token chunks. Experiments on the GPT2 and Pythia model families, scaled up to 1.5B parameters, show that ContextLM delivers consistent improvements in both perplexity and downstream task performance.
arXiv Detail & Related papers (2025-10-23T07:09:45Z)
FinLFQA: Evaluating Attributed Text Generation of LLMs in Financial Long-Form Question Answering [57.43420753842626]
FinLFQA is a benchmark designed to evaluate the ability of Large Language Models to generate long-form answers to complex financial questions. We provide an automatic evaluation framework covering both answer quality and attribution quality.
arXiv Detail & Related papers (2025-10-07T20:06:15Z)
FINCH: Financial Intelligence using Natural language for Contextualized SQL Handling [1.8679829796354372]
We introduce a curated financial dataset (FINCH) comprising 292 tables and 75,725 natural language-based pairs. We benchmark reasoning models and language models of varying scales, providing a systematic analysis of their strengths and limitations. Finally, we propose a finance-oriented evaluation metric (FINCH Score) that captures nuances overlooked by existing measures.
arXiv Detail & Related papers (2025-10-02T10:55:11Z)
MLLM-CBench: A Comprehensive Benchmark for Continual Instruction Tuning of Multimodal LLMs with Chain-of-Thought Reasoning Analysis [21.091157331212493]
Multimodal large language models (MLLMs) require continual instruction tuning during their post-training phase to adapt to dynamic real-world demands. We introduce MLLM-CTBench, a dataset curating seven challenging tasks from six diverse domains with three contributions.
arXiv Detail & Related papers (2025-07-31T07:49:36Z)
IDA-Bench: Evaluating LLMs on Interactive Guided Data Analysis [60.32962597618861]
IDA-Bench is a novel benchmark evaluating large language models in multi-round interactive scenarios. Agent performance is judged by comparing its final numerical output to the human-derived baseline. Even state-of-the-art coding agents (like Claude-3.7-thinking) succeed on 50% of the tasks, highlighting limitations not evident in single-turn tests.
arXiv Detail & Related papers (2025-05-23T09:37:52Z)
Fino1: On the Transferability of Reasoning-Enhanced LLMs and Reinforcement Learning to Finance [35.617409883103335]
FinReason is the first financial reasoning benchmark covering multi-table analysis, long-context reasoning, and equation-based tasks. We introduce FinCoT, the first open high-fidelity CoT corpus for finance, distilled from seven QA datasets. We develop Fin-o1, the first open financial reasoning models trained via supervised fine-tuning and GRPO-based RL.
arXiv Detail & Related papers (2025-02-12T05:13:04Z)
SNFinLLM: Systematic and Nuanced Financial Domain Adaptation of Chinese Large Language Models [6.639972934967109]
Large language models (LLMs) have become powerful tools for advancing natural language processing applications in the financial industry. We propose a novel large language model specifically designed for the Chinese financial domain, named SNFinLLM. SNFinLLM excels in domain-specific tasks such as answering questions, summarizing financial research reports, analyzing sentiment, and executing financial calculations.
arXiv Detail & Related papers (2024-08-05T08:24:24Z)
A Survey of Table Reasoning with Large Language Models [55.2326738851157]
Using Large Language Models (LLMs) has become the mainstream method for table reasoning. We analyze the mainstream techniques used to improve table reasoning performance in the LLM era. We provide research directions from both the improvement of existing methods and the expansion of practical applications.
arXiv Detail & Related papers (2024-02-13T07:17:52Z)
DISC-FinLLM: A Chinese Financial Large Language Model based on Multiple Experts Fine-tuning [74.99318727786337]
We propose a Multiple Experts Fine-tuning Framework to build a financial large language model (LLM). We build a financial instruction-tuning dataset named DISC-FIN-SFT, including instruction samples of four categories (consulting, NLP tasks, computing, and retrieval-augmented generation). Evaluations conducted on multiple benchmarks demonstrate that our model performs better than baseline models in various financial scenarios.
arXiv Detail & Related papers (2023-10-23T11:33:41Z)
PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance [63.51545277822702]
PIXIU is a comprehensive framework including the first financial large language model (LLM) based on fine-tuning LLaMA with instruction data. We propose FinMA by fine-tuning LLaMA with the constructed dataset so that it can follow instructions for various financial tasks. We conduct a detailed analysis of FinMA and several existing LLMs, uncovering their strengths and weaknesses in handling critical financial tasks.
arXiv Detail & Related papers (2023-06-08T14:20:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.