FormulaReasoning: A Dataset for Formula-Based Numerical Reasoning
- URL: http://arxiv.org/abs/2402.12692v4
- Date: Tue, 08 Oct 2024 07:18:04 GMT
- Title: FormulaReasoning: A Dataset for Formula-Based Numerical Reasoning
- Authors: Xiao Li, Bolin Zhu, Sichen Liu, Yin Zhu, Yiwei Liu, Gong Cheng
- Abstract summary: We construct a dataset for formula-based numerical reasoning called FormulaReasoning, which consists of 5,420 reasoning-based questions.
We employ it to evaluate LLMs with sizes ranging from 7B to over 100B parameters, using zero-shot and few-shot chain-of-thought methods.
We also explore using retrieval-augmented LLMs provided with an external formula database associated with our dataset.
- Score: 14.0148122484585
- Abstract: The application of formulas is a fundamental ability of humans when addressing numerical reasoning problems. However, existing numerical reasoning datasets seldom explicitly indicate the formulas employed during the reasoning steps. To bridge this gap, we construct a dataset for formula-based numerical reasoning called FormulaReasoning, which consists of 5,420 reasoning-based questions. We employ it to evaluate LLMs with sizes ranging from 7B to over 100B parameters, using zero-shot and few-shot chain-of-thought methods, and we further explore retrieval-augmented LLMs provided with an external formula database associated with our dataset. We also experiment with supervised methods, in which we divide the reasoning process into formula generation, parameter extraction, and numerical calculation, and we perform data augmentation. Our empirical findings underscore the significant potential for improvement in existing models when applied to our challenging, formula-driven FormulaReasoning.
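A minimal sketch of the three-stage supervised decomposition mentioned in the abstract (formula generation, parameter extraction, numerical calculation). The function names, the stubbed outputs, and the toy heat-transfer question are hypothetical illustrations, not the released pipeline:

```python
# Hypothetical sketch of the three-stage pipeline: formula generation,
# parameter extraction, and numerical calculation. In the supervised setting
# each stage would be a (fine-tuned) model call; here they are stubbed.

def generate_formula(question: str) -> str:
    """Stage 1: predict the formula needed for the question (stubbed)."""
    return "Q = c * m * delta_t"

def extract_parameters(question: str, formula: str) -> dict:
    """Stage 2: fill each formula symbol with a value taken from the question (stubbed)."""
    return {"c": 4200.0, "m": 2.0, "delta_t": 30.0}  # J/(kg*K), kg, K

def calculate(formula: str, params: dict) -> float:
    """Stage 3: evaluate the right-hand side of the formula numerically."""
    rhs = formula.split("=", 1)[1]
    return eval(rhs, {"__builtins__": {}}, params)

question = "How much heat is needed to raise 2 kg of water by 30 K?"
formula = generate_formula(question)
params = extract_parameters(question, formula)
print(formula, params, "->", calculate(formula, params))  # 252000.0 J
```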
Related papers
- ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning [54.70811660561151]
Existing math datasets evaluate the reasoning abilities of large language models (LLMs) by either using the final answer or the intermediate reasoning steps derived from static examples.
We seek to use symbolic programs as a means of automated evaluation, testing whether a model can consistently produce correct final answers across various inputs to the program.
We observe significant accuracy drops using our proposed evaluation compared with original static examples, suggesting the fragility of math reasoning in state-of-the-art LLMs.
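The symbolic-program idea can be illustrated with a toy example; the program and perturbation scheme below are illustrative only, not ReasonAgain's actual extraction pipeline:

```python
# Toy illustration: a symbolic program abstracted from a word problem.
# Perturbing its inputs yields new question instances whose gold answers come
# from executing the program, so a model is judged on consistency across
# inputs rather than on a single static example.
import random

def program(apples_per_box: int, boxes: int, eaten: int) -> int:
    """Symbolic program for: 'Each box holds A apples, there are B boxes,
    C apples are eaten. How many remain?'"""
    return apples_per_box * boxes - eaten

def perturbed_instances(n: int = 3):
    for _ in range(n):
        a, b, c = random.randint(2, 9), random.randint(2, 9), random.randint(1, 5)
        question = (f"Each box holds {a} apples, there are {b} boxes, "
                    f"{c} apples are eaten. How many remain?")
        yield question, program(a, b, c)

for q, gold in perturbed_instances():
    print(q, "-> gold answer:", gold)
```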
arXiv Detail & Related papers (2024-10-24T18:02:37Z) - SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
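A hedged sketch of what a code-based critic check can look like: execute a model-generated solution program and compare its output against a reference answer to decide whether the question-code pair is kept. SIaM's critic is a learned model; this sketch uses execution agreement only, and the function names are made up:

```python
# Illustrative code-based critic: run a generated solution program that must
# define `answer`, then accept or reject it by agreement with a reference value.
# This is only the execution-checking part of a critic, not a learned model.

def run_solution(code: str) -> float | None:
    """Execute generated code and return its `answer`, or None on any failure."""
    scope: dict = {}
    try:
        exec(code, {"__builtins__": {}}, scope)
        return float(scope.get("answer"))  # TypeError if `answer` is missing
    except Exception:
        return None

def critic_accepts(code: str, reference: float, tol: float = 1e-6) -> bool:
    result = run_solution(code)
    return result is not None and abs(result - reference) < tol

candidate = "answer = (12 * 5) - 7"
print(critic_accepts(candidate, 53.0))  # True: pair would be kept for training
```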
arXiv Detail & Related papers (2024-08-28T06:33:03Z) - Enhancing Numerical Reasoning with the Guidance of Reliable Reasoning Processes [55.2326738851157]
We introduce Enhancing NumeriCal reasOning with Reliable procEsses (Encore), which derives the reliable reasoning process by decomposing the answer formula.
We present a series of pre-training tasks to help models learn the reasoning process generation with synthesized data.
Experiments show that Encore yields improvements on all five experimental datasets, with an average gain of 1.8%.
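The decomposition idea can be made concrete with a small example; the step naming and the sample formula below are illustrative, not Encore's synthesized supervision format:

```python
# Hedged sketch of deriving a step-by-step reasoning process by decomposing an
# answer formula into sub-expressions, one step per arithmetic operation.
import ast

def decompose(formula: str):
    """Post-order walk over the expression tree, yielding one step per operation."""
    tree = ast.parse(formula, mode="eval")
    steps = []

    def walk(node):
        if isinstance(node, ast.BinOp):
            left, right = walk(node.left), walk(node.right)
            op = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}[type(node.op)]
            name = f"step{len(steps) + 1}"
            steps.append(f"{name} = {left} {op} {right}")
            return name
        return ast.unparse(node)

    walk(tree.body)
    return steps

for s in decompose("(300 - 120) / 6"):
    print(s)
# step1 = 300 - 120
# step2 = step1 / 6
```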
arXiv Detail & Related papers (2024-02-16T13:02:11Z) - Engineering an Exact Pseudo-Boolean Model Counter [38.901687092266094]
We propose the first exact pseudo-Boolean model counter, PBCount, which relies on a knowledge compilation approach via algebraic decision diagrams.
Our empirical evaluation shows that PBCount can compute counts for 1513 instances, while the current state-of-the-art approach could only handle 1013 instances.
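To make precise what is being counted, here is a brute-force reference counter for a conjunction of pseudo-Boolean constraints. PBCount itself compiles the formula into algebraic decision diagrams; this exhaustive enumeration only defines the quantity and scales as 2^n:

```python
# Brute-force model counting for a pseudo-Boolean formula: a conjunction of
# constraints of the form sum_i a_i * x_i >= k over 0/1 variables.
from itertools import product

def pb_count(n_vars: int, constraints: list[tuple[dict[int, int], int]]) -> int:
    count = 0
    for assignment in product((0, 1), repeat=n_vars):
        if all(sum(a * assignment[i] for i, a in coeffs.items()) >= k
               for coeffs, k in constraints):
            count += 1
    return count

# x0 + 2*x1 + 3*x2 >= 3  AND  x0 + x1 >= 1, over three Boolean variables
constraints = [({0: 1, 1: 2, 2: 3}, 3), ({0: 1, 1: 1}, 1)]
print(pb_count(3, constraints))  # 4 satisfying assignments
```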
arXiv Detail & Related papers (2023-12-19T17:14:06Z) - TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models [68.65075559137608]
We propose TRIGO, an ATP benchmark that not only requires a model to reduce a trigonometric expression with step-by-step proofs but also evaluates a generative LM's reasoning ability on formulas.
We gather trigonometric expressions and their reduced forms from the web, annotate the simplification process manually, and translate it into the Lean formal language system.
We develop an automatic generator based on Lean-Gym to create dataset splits of varying difficulties and distributions in order to thoroughly analyze the model's generalization ability.
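A minimal Lean example of the kind of trigonometric identity such a benchmark targets. The lemma names follow mathlib conventions and the import path is an assumption; TRIGO's own Lean-Gym environment may package goals differently:

```lean
-- Toy trigonometric reductions proved by citing library lemmas; a generative
-- prover would instead produce these proof terms (or tactic steps) itself.
import Mathlib.Analysis.SpecialFunctions.Trigonometric.Basic

open Real

example (x : ℝ) : sin x ^ 2 + cos x ^ 2 = 1 :=
  sin_sq_add_cos_sq x

example (x : ℝ) : sin (2 * x) = 2 * sin x * cos x :=
  sin_two_mul x
```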
arXiv Detail & Related papers (2023-10-16T08:42:39Z) - Discovering Interpretable Physical Models using Symbolic Regression and Discrete Exterior Calculus [55.2480439325792]
We propose a framework that combines Symbolic Regression (SR) and Discrete Exterior Calculus (DEC) for the automated discovery of physical models.
DEC provides building blocks for the discrete analogue of field theories, which are beyond the state-of-the-art applications of SR to physical problems.
We demonstrate the effectiveness of our methodology by re-discovering three models of Continuum Physics from synthetic experimental data.
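A toy symbolic-regression search is sketched below: exhaustive search over a tiny expression grammar, fitted to synthetic data from a linear spring law. This covers only the SR idea; the discrete exterior calculus building blocks and the paper's actual search procedure are not reproduced:

```python
# Toy symbolic regression: pick the candidate expression form that best fits
# synthetic "experimental" data F = 3 * x, fitting one coefficient per form.
import numpy as np

x = np.linspace(0.1, 2.0, 20)
F = 3.0 * x  # synthetic data from a linear spring, k = 3

candidates = {
    "k*x":       lambda x, k: k * x,
    "k*x**2":    lambda x, k: k * x ** 2,
    "k*sqrt(x)": lambda x, k: k * np.sqrt(x),
    "k/x":       lambda x, k: k / x,
}

best = None
for name, f in candidates.items():
    basis = f(x, 1.0)                          # least-squares fit of k
    k = float(basis @ F / (basis @ basis))
    err = float(np.mean((f(x, k) - F) ** 2))
    if best is None or err < best[2]:
        best = (name, k, err)

print(best)  # ('k*x', 3.0, ~0.0): the linear law is recovered
```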
arXiv Detail & Related papers (2023-10-10T13:23:05Z) - Bayesian Quadrature on Riemannian Data Manifolds [79.71142807798284]
Riemannian manifolds provide a principled way to model nonlinear geometric structure inherent in data.
However, the geometric operations they require are typically computationally demanding.
In particular, we focus on Bayesian quadrature (BQ) to numerically compute integrals over normal laws.
We show that by leveraging both prior knowledge and an active exploration scheme, BQ significantly reduces the number of required evaluations.
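A minimal one-dimensional Bayesian quadrature example, where the kernel mean against a Gaussian measure has a closed form. This is the plain Euclidean building block only; the paper's contributions (BQ over Riemannian manifolds and the active exploration scheme) are not reproduced, and the lengthscale and node choices below are arbitrary:

```python
# Minimal 1-D Bayesian quadrature: estimate E_{x~N(0,1)}[f(x)] with a GP (RBF
# kernel), using the closed-form kernel mean z_i = int k(x, x_i) dN(x; 0, 1).
import numpy as np

def rbf(a, b, ell):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell ** 2))

def bq_estimate(f, nodes, ell=0.7, jitter=1e-10):
    K = rbf(nodes, nodes, ell) + jitter * np.eye(len(nodes))
    z = np.sqrt(ell ** 2 / (ell ** 2 + 1)) * np.exp(-nodes ** 2 / (2 * (ell ** 2 + 1)))
    weights = np.linalg.solve(K, z)   # BQ weights K^{-1} z
    return weights @ f(nodes)

f = lambda x: np.sin(x) ** 2          # true value: (1 - exp(-2)) / 2 ≈ 0.4323
nodes = np.linspace(-3, 3, 9)
print(bq_estimate(f, nodes))          # close to 0.4323 from only 9 evaluations
```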
arXiv Detail & Related papers (2021-02-12T17:38:04Z) - Towards Question Format Independent Numerical Reasoning: A Set of Prerequisite Tasks [23.72187153601608]
We introduce NUMBERGAME, a multifaceted benchmark to evaluate model performance across numerical reasoning tasks of eight diverse formats.
Two of the new types we add involve questions that require external numerical knowledge, commonsense knowledge, and domain knowledge.
Toward building more practical numerical reasoning systems, NUMBERGAME demands four capabilities beyond numerical reasoning itself.
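A hypothetical illustration of the same numerical fact posed in two different question formats, which is what a format-independent benchmark requires. The field names and examples are made up for this sketch; they are not NUMBERGAME's released schema:

```python
# One numerical fact, two question formats (word problem vs. quantitative NLI).
word_problem = {
    "format": "arithmetic word problem",
    "question": "A train travels 60 km per hour for 2.5 hours. How far does it go?",
    "answer": 150,
}
quantitative_nli = {
    "format": "quantitative NLI",
    "premise": "A train travels 60 km per hour for 2.5 hours.",
    "hypothesis": "The train covers more than 100 km.",
    "label": "entailment",
}
print(word_problem["answer"], quantitative_nli["label"])
```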
arXiv Detail & Related papers (2020-05-18T08:14:04Z) - An approximate KLD based experimental design for models with intractable likelihoods [1.8275108630751844]
In this work we consider a special type of statistical experimental design (ED) problem where the likelihood is not available in closed form.
The popular information-theoretic Kullback-Leibler divergence (KLD) based design criterion cannot be used directly, as it requires evaluating the likelihood function.
To address the issue, we derive a new utility function, which is a lower bound of the original KLD utility.
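For context, the KLD-based criterion in question is the expected information gain, shown below together with a standard variational lower bound (Barber-Agakov style) that is usable when the likelihood is intractable; this is the textbook construction, not necessarily the specific bound derived in the paper:

```latex
% Expected information gain for a design d, and a generic variational lower
% bound in terms of a tractable conditional density q(theta | y, d).
\[
  U(d) \;=\; \mathbb{E}_{p(y \mid d)}\!\left[
      D_{\mathrm{KL}}\!\big(p(\theta \mid y, d)\,\|\,p(\theta)\big)
  \right]
  \;\ge\;
  \mathbb{E}_{p(\theta)\,p(y \mid \theta, d)}\!\left[\log q(\theta \mid y, d)\right]
  \;+\; H\!\big(p(\theta)\big),
\]
% The bound is tight when q equals the true posterior p(theta | y, d).
```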
arXiv Detail & Related papers (2020-04-01T21:18:28Z)