Reflection of Thought: Inversely Eliciting Numerical Reasoning in
Language Models via Solving Linear Systems
- URL: http://arxiv.org/abs/2210.05075v1
- Date: Tue, 11 Oct 2022 00:57:19 GMT
- Title: Reflection of Thought: Inversely Eliciting Numerical Reasoning in
Language Models via Solving Linear Systems
- Authors: Fan Zhou, Haoyu Dong, Qian Liu, Zhoujun Cheng, Shi Han, Dongmei Zhang
- Abstract summary: We propose a novel method to elicit and exploit the numerical reasoning knowledge hidden in pre-trained language models.
We first leverage simple numbers as anchors to probe the implicitly inferred arithmetic expressions from language models.
We transform and formulate the task as an analytically solvable linear system.
- Score: 42.782260686177395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Numerical reasoning over natural language has been a long-standing goal for
the research community. However, cutting-edge language models have struggled
to reliably generalize to a broad range of numbers, even though they have shown
proficiency in reasoning over common and simple numbers. In this
paper, we propose a novel method to elicit and exploit the numerical reasoning
knowledge hidden in pre-trained language models using simple anchor numbers.
Concretely, we first leverage simple numbers as anchors to probe the implicitly
inferred arithmetic expressions from language models, and then explicitly apply
the expressions on complex numbers to get corresponding answers. To inversely
elicit arithmetic expressions, we transform and formulate the task as an
analytically solvable linear system. Experimental results on several numerical
reasoning benchmarks demonstrate that our approach significantly improves
numerical reasoning capabilities of existing LMs. More importantly, our
approach is training-free and operates purely at inference time, making it
highly portable and achieving consistent performance benefits across a variety
of language models (GPT-3, T5, BART, etc.) in all zero-shot, few-shot, and
fine-tuning scenarios.
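To make the anchor-probing idea concrete, here is a minimal, hypothetical sketch of the inverse-elicitation step (not the authors' implementation): assuming the answer the model implicitly computes is an affine combination of the question's operands, a handful of probes with simple anchor numbers yields a linear system whose solution recovers the coefficients, which are then applied to the original, harder numbers. The function names, anchor values, and the affine form are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def recover_linear_expression(anchor_operands, anchor_answers):
    """Solve for (weights, bias) such that answer ~= weights . operands + bias,
    given the model's answers on questions posed with simple anchor numbers."""
    X = np.asarray(anchor_operands, dtype=float)
    # Append a constant column so the bias term is part of the linear system.
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    y = np.asarray(anchor_answers, dtype=float)
    # Least-squares solve; with enough independent anchors the system is exact.
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs[:-1], coeffs[-1]

def apply_expression(weights, bias, operands):
    """Apply the recovered arithmetic expression to the original (complex) numbers."""
    return float(np.dot(weights, operands) + bias)

# Hypothetical probe: on "What is x plus y?" the model answered 3, 8, and 9
# for the simple anchor pairs (1, 2), (3, 5), and (2, 7).
w, b = recover_linear_expression([(1, 2), (3, 5), (2, 7)], [3, 8, 9])
print(apply_expression(w, b, (12345.6, 98765.4)))  # ~111111.0
```

With three independent anchors the three-unknown system is fully determined, so the recovered coefficients (here roughly w = (1, 1), b = 0) reproduce the intended addition even on operands the model would otherwise mishandle.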
Related papers
- modeLing: A Novel Dataset for Testing Linguistic Reasoning in Language Models [23.105555180223487]
modeLing is a novel benchmark of Linguistics Olympiad-style puzzles which tests few-shot reasoning in AI systems.
We evaluate several large open source language models and GPT on our benchmark.
arXiv Detail & Related papers (2024-06-24T18:00:59Z)
- Zero-Shot Question Answering over Financial Documents using Large Language Models [0.18749305679160366]
We introduce a large language model (LLM) based approach to answer complex questions requiring multi-hop numerical reasoning over financial reports.
We use novel zero-shot prompts that guide the LLM to encode the required reasoning into a Python program or a domain specific language.
arXiv Detail & Related papers (2023-11-19T16:23:34Z)
- Exploring the Numerical Reasoning Capabilities of Language Models: A Comprehensive Analysis on Tabular Data [10.124148115680315]
We propose a hierarchical taxonomy for numerical reasoning skills with more than ten reasoning types across four levels.
We conduct a comprehensive evaluation of state-of-the-art models to identify reasoning challenges specific to them.
Our results show that no model consistently excels across all numerical reasoning types.
arXiv Detail & Related papers (2023-11-03T20:05:30Z)
- TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models [68.65075559137608]
We propose TRIGO, an ATP benchmark that not only requires a model to reduce a trigonometric expression with step-by-step proofs but also evaluates a generative LM's reasoning ability on formulas.
We gather trigonometric expressions and their reduced forms from the web, annotate the simplification process manually, and translate it into the Lean formal language system.
We develop an automatic generator based on Lean-Gym to create dataset splits of varying difficulties and distributions in order to thoroughly analyze the model's generalization ability.
arXiv Detail & Related papers (2023-10-16T08:42:39Z)
- FERMAT: An Alternative to Accuracy for Numerical Reasoning [11.893004722079557]
Numerical reasoning is typically measured using a single accuracy score on existing datasets.
We introduce a multi-view evaluation set for numerical reasoning in English, called FERMAT.
FERMAT evaluates models on various key numerical reasoning aspects such as number understanding, mathematical operations, and training dependency.
arXiv Detail & Related papers (2023-05-27T15:00:45Z)
- Learning to Reason With Relational Abstractions [65.89553417442049]
We study how to build stronger reasoning capability in language models using the idea of relational abstractions.
We find that models supplied with such abstraction sequences as prompts can solve tasks with significantly higher accuracy.
arXiv Detail & Related papers (2022-10-06T00:27:50Z)
- Complexity-Based Prompting for Multi-Step Reasoning [72.0057198610614]
We study the task of prompting large-scale language models to perform multi-step reasoning.
A central question is which reasoning examples make the most effective prompts.
We propose complexity-based prompting, a simple and effective example selection scheme for multi-step reasoning.
arXiv Detail & Related papers (2022-10-03T05:33:27Z)
- NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks [37.730939229638224]
We propose NumGLUE, a benchmark that evaluates the performance of AI systems on eight different tasks.
We show that this benchmark is far from being solved with neural models including state-of-the-art large-scale language models.
We hope that NumGLUE will encourage systems that perform robust and general arithmetic reasoning within language.
arXiv Detail & Related papers (2022-04-12T09:36:10Z)
- Chain of Thought Prompting Elicits Reasoning in Large Language Models [56.811278668446825]
This paper explores the ability of language models to generate a coherent chain of thought.
Experiments show that inducing a chain of thought via prompting can enable sufficiently large language models to better perform reasoning tasks.
arXiv Detail & Related papers (2022-01-28T02:33:07Z)
- Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge [96.92252296244233]
Large pre-trained language models (LMs) acquire some reasoning capacity, but this ability is difficult to control.
We show that LMs can be trained to reliably perform systematic reasoning combining both implicit, pre-trained knowledge and explicit natural language statements.
Our work paves a path towards open-domain systems that constantly improve by interacting with users who can instantly correct a model by adding simple natural language statements.
arXiv Detail & Related papers (2020-06-11T17:02:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.