A Robustly Optimized Long Text to Math Models for Numerical Reasoning On
FinQA
- URL: http://arxiv.org/abs/2207.06490v1
- Date: Wed, 29 Jun 2022 12:10:18 GMT
- Title: A Robustly Optimized Long Text to Math Models for Numerical Reasoning On
FinQA
- Authors: Renhui Zhang, Youwei Zhang, Yao Yu
- Abstract summary: The FinQA challenge was organized to strengthen the study of numerical reasoning.
Our approach achieves 1st place in the FinQA challenge, with 71.93% execution accuracy and 67.03% program accuracy.
- Score: 2.93888900363581
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Numerical reasoning is required to solve most problems in daily life, but
it has been neglected in previous artificial intelligence research. The FinQA
challenge was organized to strengthen the study of numerical reasoning: the
participants are asked to predict the numerical reasoning program that solves a
financial question. FinQA submissions are evaluated by both execution accuracy
and program accuracy. In this paper, we present our approach to the task:
developing models with different specialized capabilities and fusing their
strengths. Overall, our approach achieves 1st place in the FinQA challenge,
with 71.93% execution accuracy and 67.03% program accuracy.
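The distinction between the two metrics can be illustrated with a minimal sketch. The operation names, the tuple-based program encoding, and the `#k` step-reference convention below are simplified assumptions for illustration, not the challenge's exact DSL:

```python
# Sketch of FinQA's two metrics: program accuracy requires an exact match of
# the predicted operation sequence, while execution accuracy only requires
# that running the predicted program yields the gold numeric result.

def run_program(program):
    """Execute a tiny program: a list of (op, arg_a, arg_b) tuples.
    An argument of the form '#k' refers to the result of step k."""
    ops = {"add": lambda a, b: a + b,
           "subtract": lambda a, b: a - b,
           "multiply": lambda a, b: a * b,
           "divide": lambda a, b: a / b}
    results = []

    def resolve(x):
        if isinstance(x, str) and x.startswith("#"):
            return results[int(x[1:])]
        return float(x)

    for op, a, b in program:
        results.append(ops[op](resolve(a), resolve(b)))
    return results[-1]

def program_accuracy(preds, golds):
    """Fraction of predictions that match the gold program symbol-for-symbol."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

def execution_accuracy(preds, golds, tol=1e-5):
    """Fraction of predictions whose executed result matches the gold result."""
    correct = 0
    for p, g in zip(preds, golds):
        try:
            correct += abs(run_program(p) - run_program(g)) < tol
        except Exception:
            pass  # malformed or non-executable prediction counts as wrong
    return correct / len(golds)

# Two programs that differ symbolically but compute the same growth rate:
gold = [("subtract", "100", "80"), ("divide", "#0", "80")]   # (100 - 80) / 80
pred = [("divide", "100", "80"), ("subtract", "#0", "1")]    # 100 / 80 - 1
print(program_accuracy([pred], [gold]))    # 0.0: the programs differ
print(execution_accuracy([pred], [gold]))  # 1.0: same numeric result, 0.25
```

This is why execution accuracy is always at least as high as program accuracy, as in the 71.93% vs. 67.03% result above: a prediction can execute to the right number via a different but equivalent program.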
Related papers
- Evaluation of OpenAI o1: Opportunities and Challenges of AGI [112.0812059747033]
o1-preview demonstrated remarkable capabilities, often achieving human-level or superior performance.
The model excelled in tasks requiring intricate reasoning and knowledge integration across various fields.
Overall results indicate significant progress towards artificial general intelligence.
arXiv Detail & Related papers (2024-09-27T06:57:00Z)
- SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories [55.161075901665946]
SUPER aims to capture the realistic challenges faced by researchers working with Machine Learning (ML) and Natural Language Processing (NLP) research repositories.
Our benchmark comprises three distinct problem sets: 45 end-to-end problems with annotated expert solutions, 152 sub-problems derived from the expert set that focus on specific challenges, and 602 automatically generated problems for larger-scale development.
We show that state-of-the-art approaches struggle to solve these problems, with the best model (GPT-4o) solving only 16.3% of the end-to-end set and 46.1% of the scenarios.
arXiv Detail & Related papers (2024-09-11T17:37:48Z) - Case-Based Reasoning Approach for Solving Financial Question Answering [5.10832476049103]
FinQA introduced a numerical reasoning dataset for financial documents.
We propose a novel approach to tackle numerical reasoning problems using case-based reasoning (CBR).
Our model retrieves relevant cases to address a given question, and then generates an answer based on the retrieved cases and contextual information.
arXiv Detail & Related papers (2024-05-18T10:06:55Z) - Evaluating Mathematical Reasoning Beyond Accuracy [50.09931172314218]
We introduce ReasonEval, a new methodology for evaluating the quality of reasoning steps.
We show that ReasonEval achieves state-of-the-art performance on human-labeled datasets.
We observe that ReasonEval can play a significant role in data selection.
arXiv Detail & Related papers (2024-04-08T17:18:04Z) - SatLM: Satisfiability-Aided Language Models Using Declarative Prompting [68.40726892904286]
We propose a new satisfiability-aided language modeling (SatLM) approach for improving the reasoning capabilities of large language models (LLMs).
We use an LLM to generate a declarative task specification rather than an imperative program and leverage an off-the-shelf automated theorem prover to derive the final answer.
We evaluate SatLM on 8 different datasets and show that it consistently outperforms program-aided LMs in the imperative paradigm.
arXiv Detail & Related papers (2023-05-16T17:55:51Z)
- Comprehensive Solution Program Centric Pretraining for Table-and-Text Hybrid Numerical Reasoning [21.708394374594082]
Numerical reasoning over table-and-text hybrid passages, such as financial reports, poses significant challenges.
Coarse-grained supervision of the whole solution program has impeded models' ability to learn the underlying numerical reasoning process.
We propose three pretraining tasks that operate at both the whole program and sub-program level.
arXiv Detail & Related papers (2023-05-12T13:44:40Z)
- APOLLO: An Optimized Training Approach for Long-form Numerical Reasoning [31.252979262232124]
We propose APOLLO to improve the long-form numerical reasoning framework.
For the retriever, we adopt a number-aware negative sampling strategy to enable the retriever to be more discriminative on key numerical facts.
For the generator, we design consistency-based reinforcement learning and a target program augmentation strategy.
arXiv Detail & Related papers (2022-12-14T14:34:15Z)
- A Novel DeBERTa-based Model for Financial Question Answering Task [9.083539882647928]
This research mainly focuses on the financial numerical reasoning dataset - FinQA.
In the shared task, the objective is to generate the reasoning program and the final answer according to the given financial report.
We obtain an execution accuracy of 68.99% and a program accuracy of 64.53%, ranking 4th in the 2022 FinQA Challenge.
arXiv Detail & Related papers (2022-07-12T22:34:39Z) - FinQA: A Dataset of Numerical Reasoning over Financial Data [52.7249610894623]
We focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents.
We propose a new large-scale dataset, FinQA, with question-answering pairs over financial reports, written by financial experts.
The results demonstrate that popular, large, pre-trained models fall far short of expert humans in acquiring finance knowledge.
arXiv Detail & Related papers (2021-09-01T00:08:14Z) - Logic-Guided Data Augmentation and Regularization for Consistent
Question Answering [55.05667583529711]
This paper addresses the problem of improving the accuracy and consistency of responses to comparison questions.
Our method leverages logical and linguistic knowledge to augment labeled training data and then uses a consistency-based regularizer to train the model.
arXiv Detail & Related papers (2020-04-21T17:03:08Z)
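The augmentation-plus-regularization idea above can be sketched roughly as follows. The question template, the label-flipping rule, and the loss form here are hypothetical simplifications for illustration, not the paper's exact method:

```python
# Sketch of logic-guided augmentation for comparison questions: from one
# labeled example, generate the symmetric question with the flipped label,
# and penalize the model when its answers to the two forms are inconsistent.

def augment_symmetric(question, a, b, label):
    """If 'Is A larger than B?' has label yes (1),
    then 'Is B larger than A?' must have label no (0)."""
    flipped = question.replace(a, "<TMP>").replace(b, a).replace("<TMP>", b)
    return flipped, 1 - label

def consistency_loss(p_orig, p_flipped):
    """The model's 'yes' probabilities for a symmetric pair should sum to ~1
    (exactly one of the two is true); penalize the squared deviation."""
    return ((p_orig + p_flipped) - 1.0) ** 2

q, label = augment_symmetric("Is Apple larger than Banana?", "Apple", "Banana", 1)
print(q, label)                     # Is Banana larger than Apple? 0
print(consistency_loss(0.9, 0.1))   # consistent pair: loss is 0.0
print(consistency_loss(0.9, 0.9))   # inconsistent pair: positive loss
```

The regularizer is added to the usual supervised loss, so the model is trained both on the augmented labels and to keep its own predictions logically consistent.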
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.