Measuring Systematic Generalization in Neural Proof Generation with
Transformers
- URL: http://arxiv.org/abs/2009.14786v2
- Date: Tue, 20 Oct 2020 20:31:11 GMT
- Title: Measuring Systematic Generalization in Neural Proof Generation with
Transformers
- Authors: Nicolas Gontier and Koustuv Sinha and Siva Reddy and Christopher Pal
- Abstract summary: We investigate how well Transformer language models (TLMs) can perform logical reasoning tasks when trained on knowledge encoded in natural language.
Specifically, we perform soft theorem-proving by leveraging TLMs to generate natural language proofs.
We observe length-generalization issues when evaluated on longer-than-trained sequences.
- Score: 24.157460902865854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We are interested in understanding how well Transformer language models
(TLMs) can perform reasoning tasks when trained on knowledge encoded in the
form of natural language. We investigate their systematic generalization
abilities on a logical reasoning task in natural language, which involves
reasoning over relationships between entities grounded in first-order logical
proofs. Specifically, we perform soft theorem-proving by leveraging TLMs to
generate natural language proofs. We test the generated proofs for logical
consistency, along with the accuracy of the final inference. We observe
length-generalization issues when evaluated on longer-than-trained sequences.
However, we observe TLMs improve their generalization performance after being
exposed to longer, exhaustive proofs. In addition, we discover that TLMs are
able to generalize better using backward-chaining proofs compared to their
forward-chaining counterparts, while they find it easier to generate forward
chaining proofs. We observe that models that are not trained to generate proofs
are better at generalizing to problems based on longer proofs. This suggests
that Transformers have efficient internal reasoning strategies that are harder
to interpret. These results highlight the systematic generalization behavior of
TLMs in the context of logical reasoning, and we believe this work motivates
deeper inspection of their underlying reasoning strategies.
Related papers
- LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452]
Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks.
But, can they really "reason" over the natural language?
This question has been receiving significant research attention and many reasoning skills such as commonsense, numerical, and qualitative have been studied.
arXiv Detail & Related papers (2024-04-23T21:08:49Z) - Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs [87.34281749422756]
Large language models (LLMs) have achieved impressive human-like performance across various reasoning tasks.
However, their mastery of underlying inferential rules still falls short of human capabilities.
We propose a logic scaffolding inferential rule generation framework, to construct an inferential rule base, ULogic.
arXiv Detail & Related papers (2024-02-18T03:38:51Z) - Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs [95.07757789781213]
Two lines of approaches are adopted for complex reasoning with LLMs.
One line of work prompts LLMs with various reasoning structures, while the structural outputs can be naturally regarded as intermediate reasoning steps.
The other line of work adopt LLM-free declarative solvers to do the reasoning task, rendering higher reasoning accuracy but lacking interpretability due to the black-box nature of the solvers.
We present a simple extension to the latter line of work. Specifically, we showcase that the intermediate search logs generated by Prolog interpreters can be accessed and interpreted into human-readable reasoning.
arXiv Detail & Related papers (2023-11-16T11:26:21Z) - Are LLMs Rigorous Logical Reasoner? Empowering Natural Language Proof
Generation with Contrastive Stepwise Decoding [11.385103498440932]
We introduce contrastive decoding to stepwise proof generation, making use of negative reasoning paths to strengthen the model's capacity for logical deduction.
Experiments on EntailmentBank underscore the success of our method in augmenting the proof planning abilities of language models.
arXiv Detail & Related papers (2023-11-12T05:12:49Z) - BOOST: Harnessing Black-Box Control to Boost Commonsense in LMs'
Generation [60.77990074569754]
We present a computation-efficient framework that steers a frozen Pre-Trained Language Model towards more commonsensical generation.
Specifically, we first construct a reference-free evaluator that assigns a sentence with a commonsensical score.
We then use the scorer as the oracle for commonsense knowledge, and extend the controllable generation method called NADO to train an auxiliary head.
arXiv Detail & Related papers (2023-10-25T23:32:12Z) - LINC: A Neurosymbolic Approach for Logical Reasoning by Combining
Language Models with First-Order Logic Provers [60.009969929857704]
Logical reasoning is an important task for artificial intelligence with potential impacts on science, mathematics, and society.
In this work, we reformulating such tasks as modular neurosymbolic programming, which we call LINC.
We observe significant performance gains on FOLIO and a balanced subset of ProofWriter for three different models in nearly all experimental conditions we evaluate.
arXiv Detail & Related papers (2023-10-23T17:58:40Z) - QA-NatVer: Question Answering for Natural Logic-based Fact Verification [11.002475880349452]
We propose to use question answering to predict natural logic operators.
In a few-shot setting on FEVER, our approach outperforms the best baseline by $4.3$ accuracy points.
A human evaluation indicates that our approach produces more plausible with fewer erroneous natural logic operators than previous natural logic-based systems.
arXiv Detail & Related papers (2023-10-22T06:27:31Z) - LAMBADA: Backward Chaining for Automated Reasoning in Natural Language [11.096348678079574]
Backward Chaining algorithm, called LAMBADA, decomposes reasoning into four sub-modules.
We show that LAMBADA achieves sizable accuracy boosts over state-of-the-art forward reasoning methods.
arXiv Detail & Related papers (2022-12-20T18:06:03Z) - Language Models Are Greedy Reasoners: A Systematic Formal Analysis of
Chain-of-Thought [10.524051272257614]
Large language models (LLMs) have shown remarkable reasoning capabilities given chain-of-thought prompts.
We present a new synthetic question-answering dataset called PrOntoQA, where each example is generated as a synthetic world model.
This allows us to parse the generated chain-of-thought into symbolic proofs for formal analysis.
arXiv Detail & Related papers (2022-10-03T21:34:32Z) - Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason
Over Implicit Knowledge [96.92252296244233]
Large pre-trained language models (LMs) acquire some reasoning capacity, but this ability is difficult to control.
We show that LMs can be trained to reliably perform systematic reasoning combining both implicit, pre-trained knowledge and explicit natural language statements.
Our work paves a path towards open-domain systems that constantly improve by interacting with users who can instantly correct a model by adding simple natural language statements.
arXiv Detail & Related papers (2020-06-11T17:02:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.