APOLLO: An Optimized Training Approach for Long-form Numerical Reasoning
- URL: http://arxiv.org/abs/2212.07249v3
- Date: Tue, 12 Mar 2024 13:30:16 GMT
- Title: APOLLO: An Optimized Training Approach for Long-form Numerical Reasoning
- Authors: Jiashuo Sun, Hang Zhang, Chen Lin, Xiangdong Su, Yeyun Gong, Jian Guo
- Abstract summary: We propose APOLLO to improve the long-form numerical reasoning framework.
For the retriever, we adopt a number-aware negative sampling strategy to make the retriever more discriminative toward key numerical facts.
For the generator, we design consistency-based reinforcement learning and a target program augmentation strategy.
- Score: 31.252979262232124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long-form numerical reasoning in financial analysis aims to generate a
reasoning program to calculate the correct answer for a given question.
Previous work followed a retriever-generator framework, where the retriever
selects key facts from a long-form document, and the generator generates a
reasoning program based on retrieved facts. However, they treated all facts
equally without considering the different contributions of facts with and
without numbers. Meanwhile, program consistency was ignored under
supervised training, resulting in lower training accuracy and diversity. To
solve these problems, we propose APOLLO to improve the long-form numerical
reasoning framework. For the retriever, we adopt a number-aware negative
sampling strategy that makes the retriever more discriminative toward key
numerical facts. For the generator, we design consistency-based reinforcement
learning and a target program augmentation strategy based on the consistency of
program execution results. Experimental results on the FinQA and ConvFinQA
leaderboards verify the effectiveness of our proposed method, achieving new
state-of-the-art results.
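
The abstract names two concrete mechanisms: number-aware negative sampling for the retriever, and execution-result consistency for the generator (used both as a reinforcement-learning reward and to filter augmented target programs). The Python sketch below illustrates how such components could look in the simplest case; the function names, the sampling ratio, the binary reward, and the `execute` interpreter are assumptions for illustration, not the paper's exact formulation.

```python
import random
import re

NUMBER_PATTERN = re.compile(r"\d")


def number_aware_negative_sampling(facts, gold_indices, num_negatives=3):
    """Sample hard negatives for retriever training, preferring non-gold facts
    that contain numbers (the names and the sampling ratio are illustrative)."""
    negatives = [f for i, f in enumerate(facts) if i not in gold_indices]
    numeric = [f for f in negatives if NUMBER_PATTERN.search(f)]
    plain = [f for f in negatives if not NUMBER_PATTERN.search(f)]
    # Bias sampling toward number-bearing distractors so the retriever must
    # learn to separate key numerical facts from superficially similar ones.
    k_num = min(len(numeric), max(1, num_negatives - 1))
    sampled = random.sample(numeric, k_num)
    sampled += random.sample(plain, min(len(plain), num_negatives - k_num))
    return sampled


def consistency_reward(predicted_program, gold_program, execute):
    """Binary reward: 1.0 if the predicted program executes to the same result
    as the gold program, else 0.0. `execute` stands in for a DSL interpreter."""
    try:
        return 1.0 if execute(predicted_program) == execute(gold_program) else 0.0
    except Exception:  # malformed programs earn no reward
        return 0.0


def augment_target_programs(candidate_programs, gold_program, execute):
    """Keep candidate programs whose execution result matches the gold program,
    so they can serve as additional supervision targets."""
    gold_result = execute(gold_program)
    kept = []
    for program in candidate_programs:
        try:
            if execute(program) == gold_result:
                kept.append(program)
        except Exception:
            continue
    return kept
```

In a training loop of this shape, the sampled negatives would be contrasted against the gold facts when fine-tuning the retriever, and the execution-consistency reward would be used alongside the supervised objective when updating the generator.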
Related papers
- Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [50.485788083202124]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks.
We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model.
Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z) - Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z) - Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging Tasks [68.49251303172674]
State-of-the-art large language models (LLMs) exhibit impressive problem-solving capabilities but may struggle with complex reasoning and factual correctness.
Existing methods harness the strengths of chain-of-thought and retrieval-augmented generation (RAG) to decompose a complex problem into simpler steps and apply retrieval to improve factual correctness.
We introduce Critic-guided planning with Retrieval-augmentation, CR-Planner, a novel framework that leverages fine-tuned critic models to guide both reasoning and retrieval processes through planning.
arXiv Detail & Related papers (2024-10-02T11:26:02Z) - Zero-Shot Question Answering over Financial Documents using Large
Language Models [0.18749305679160366]
We introduce a large language model (LLM) based approach to answer complex questions requiring multi-hop numerical reasoning over financial reports.
We use novel zero-shot prompts that guide the LLM to encode the required reasoning into a Python program or a domain specific language.
arXiv Detail & Related papers (2023-11-19T16:23:34Z) - Comprehensive Solution Program Centric Pretraining for Table-and-Text
Hybrid Numerical Reasoning [21.708394374594082]
Numerical reasoning over table-and-text hybrid passages, such as financial reports, poses significant challenges.
Coarse-grained supervision of the whole solution program has impeded the model's ability to learn the underlying numerical reasoning process.
We propose three pretraining tasks that operate at both the whole program and sub-program level.
arXiv Detail & Related papers (2023-05-12T13:44:40Z) - Hierarchical Programmatic Reinforcement Learning via Learning to Compose
Programs [58.94569213396991]
We propose a hierarchical programmatic reinforcement learning framework to produce program policies.
By learning to compose programs, our proposed framework can produce program policies that describe out-of-distributionally complex behaviors.
The experimental results in the Karel domain show that our proposed framework outperforms baselines.
arXiv Detail & Related papers (2023-01-30T14:50:46Z) - NAPG: Non-Autoregressive Program Generation for Hybrid Tabular-Textual
Question Answering [52.10214317661547]
Current numerical reasoning methods autoregressively decode program sequences.
The accuracy of program generation drops sharply as the decoding steps unfold due to error propagation.
In this paper, we propose a non-autoregressive program generation framework.
arXiv Detail & Related papers (2022-11-07T11:25:21Z) - A Robustly Optimized Long Text to Math Models for Numerical Reasoning On
FinQA [2.93888900363581]
The FinQA challenge was organized to strengthen the study of numerical reasoning.
Our approach achieves 1st place in the FinQA challenge, with 71.93% execution accuracy and 67.03% program accuracy.
arXiv Detail & Related papers (2022-06-29T12:10:18Z) - Enforcing Consistency in Weakly Supervised Semantic Parsing [68.2211621631765]
We explore the use of consistency between the output programs for related inputs to reduce the impact of spurious programs.
We find that a more consistent formalism leads to improved model performance even without consistency-based training.
arXiv Detail & Related papers (2021-07-13T03:48:04Z)