APOLLO: An Optimized Training Approach for Long-form Numerical Reasoning
- URL: http://arxiv.org/abs/2212.07249v3
- Date: Tue, 12 Mar 2024 13:30:16 GMT
- Title: APOLLO: An Optimized Training Approach for Long-form Numerical Reasoning
- Authors: Jiashuo Sun, Hang Zhang, Chen Lin, Xiangdong Su, Yeyun Gong, Jian Guo
- Abstract summary: We propose APOLLO to improve the long-form numerical reasoning framework.
For the retriever, we adopt a number-aware negative sampling strategy to make the retriever more discriminative toward key numerical facts.
For the generator, we design consistency-based reinforcement learning and a target program augmentation strategy.
- Score: 31.252979262232124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long-form numerical reasoning in financial analysis aims to generate a
reasoning program to calculate the correct answer for a given question.
Previous work followed a retriever-generator framework, where the retriever
selects key facts from a long-form document, and the generator generates a
reasoning program based on retrieved facts. However, they treated all facts
equally without considering the different contributions of facts with and
without numbers. Meanwhile, program consistency was ignored under
supervised training, resulting in lower training accuracy and diversity. To
solve these problems, we propose APOLLO to improve the long-form numerical
reasoning framework. For the retriever, we adopt a number-aware negative
sampling strategy that makes the retriever more discriminative toward key
numerical facts. For the generator, we design consistency-based reinforcement
learning and a target program augmentation strategy based on the consistency of
program execution results. Experimental results on the FinQA and ConvFinQA
leaderboards verify the effectiveness of our proposed method, achieving new
state-of-the-art results.
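
The abstract names two concrete mechanisms: number-aware negative sampling for the retriever, and execution-result consistency for the generator (used both as a reinforcement-learning reward and to filter augmented target programs). The Python sketch below illustrates how such components could look in the simplest case; the function names, the sampling ratio, the binary reward, and the `execute` interpreter are assumptions for illustration, not the paper's exact formulation.

```python
import random
import re

NUMBER_PATTERN = re.compile(r"\d")


def number_aware_negative_sampling(facts, gold_indices, num_negatives=3):
    """Sample hard negatives for retriever training, preferring non-gold facts
    that contain numbers (the names and the sampling ratio are illustrative)."""
    negatives = [f for i, f in enumerate(facts) if i not in gold_indices]
    numeric = [f for f in negatives if NUMBER_PATTERN.search(f)]
    plain = [f for f in negatives if not NUMBER_PATTERN.search(f)]
    # Bias sampling toward number-bearing distractors so the retriever must
    # learn to separate key numerical facts from superficially similar ones.
    k_num = min(len(numeric), max(1, num_negatives - 1))
    sampled = random.sample(numeric, k_num)
    sampled += random.sample(plain, min(len(plain), num_negatives - k_num))
    return sampled


def consistency_reward(predicted_program, gold_program, execute):
    """Binary reward: 1.0 if the predicted program executes to the same result
    as the gold program, else 0.0. `execute` stands in for a DSL interpreter."""
    try:
        return 1.0 if execute(predicted_program) == execute(gold_program) else 0.0
    except Exception:  # malformed programs earn no reward
        return 0.0


def augment_target_programs(candidate_programs, gold_program, execute):
    """Keep candidate programs whose execution result matches the gold program,
    so they can serve as additional supervision targets."""
    gold_result = execute(gold_program)
    kept = []
    for program in candidate_programs:
        try:
            if execute(program) == gold_result:
                kept.append(program)
        except Exception:
            continue
    return kept
```

In a training loop of this shape, the sampled negatives would be contrasted against the gold facts when fine-tuning the retriever, and the execution-consistency reward would be used alongside the supervised objective when updating the generator.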
Related papers
- Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [50.485788083202124]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks.
We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model.
Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z) - Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z) - Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging Tasks [68.49251303172674]
State-of-the-art large language models (LLMs) exhibit impressive problem-solving capabilities but may struggle with complex reasoning and factual correctness.
Existing methods harness the strengths of chain-of-thought and retrieval-augmented generation (RAG) to decompose a complex problem into simpler steps and apply retrieval to improve factual correctness.
We introduce Critic-guided planning with Retrieval-augmentation, CR-Planner, a novel framework that leverages fine-tuned critic models to guide both reasoning and retrieval processes through planning.
arXiv Detail & Related papers (2024-10-02T11:26:02Z) - Zero-Shot Question Answering over Financial Documents using Large
Language Models [0.18749305679160366]
We introduce a large language model (LLM) based approach to answer complex questions requiring multi-hop numerical reasoning over financial reports.
We use novel zero-shot prompts that guide the LLM to encode the required reasoning into a Python program or a domain specific language.
arXiv Detail & Related papers (2023-11-19T16:23:34Z) - Comprehensive Solution Program Centric Pretraining for Table-and-Text
Hybrid Numerical Reasoning [21.708394374594082]
Numerical reasoning over table-and-text hybrid passages, such as financial reports, poses significant challenges.
Coarse-grained supervision of the whole solution program has impeded the model's ability to learn the underlying numerical reasoning process.
We propose three pretraining tasks that operate at both the whole program and sub-program level.
arXiv Detail & Related papers (2023-05-12T13:44:40Z) - Hierarchical Programmatic Reinforcement Learning via Learning to Compose
Programs [58.94569213396991]
We propose a hierarchical programmatic reinforcement learning framework to produce program policies.
By learning to compose programs, our proposed framework can produce program policies that describe out-of-distributionally complex behaviors.
The experimental results in the Karel domain show that our proposed framework outperforms baselines.
arXiv Detail & Related papers (2023-01-30T14:50:46Z) - NAPG: Non-Autoregressive Program Generation for Hybrid Tabular-Textual
Question Answering [52.10214317661547]
Current numerical reasoning methods autoregressively decode program sequences.
The accuracy of program generation drops sharply as the decoding steps unfold due to error propagation.
In this paper, we propose a non-autoregressive program generation framework.
arXiv Detail & Related papers (2022-11-07T11:25:21Z) - A Robustly Optimized Long Text to Math Models for Numerical Reasoning On
FinQA [2.93888900363581]
The FinQA challenge was organized to strengthen the study of numerical reasoning.
Our approach achieves 1st place in the FinQA challenge, with 71.93% execution accuracy and 67.03% program accuracy.
arXiv Detail & Related papers (2022-06-29T12:10:18Z) - Enforcing Consistency in Weakly Supervised Semantic Parsing [68.2211621631765]
We explore the use of consistency between the output programs for related inputs to reduce the impact of spurious programs.
We find that a more consistent formalism leads to improved model performance even without consistency-based training.
arXiv Detail & Related papers (2021-07-13T03:48:04Z)