RobustLR: Evaluating Robustness to Logical Perturbation in Deductive Reasoning
 - URL: http://arxiv.org/abs/2205.12598v1
 - Date: Wed, 25 May 2022 09:23:50 GMT
 - Title: RobustLR: Evaluating Robustness to Logical Perturbation in Deductive Reasoning
 - Authors: Soumya Sanyal, Zeyi Liao, Xiang Ren
 - Abstract summary: Transformers have been shown to perform deductive reasoning over a logical rulebase containing rules and statements written in English.
We propose RobustLR to evaluate the robustness of these models to minimal logical edits in rulebases.
We find that models trained in prior works do not perform consistently across the different perturbations in RobustLR.
 - Score: 25.319674132967553
 - License: http://creativecommons.org/licenses/by/4.0/
 - Abstract: Transformers have been shown to perform deductive reasoning on a
logical rulebase containing rules and statements written in English. While
this progress is promising, it is currently unclear whether these models
indeed perform logical reasoning by understanding the underlying logical
semantics of the language. To this end, we propose RobustLR, a suite of
evaluation datasets that test the robustness of these models to minimal
logical edits in rulebases and to some standard logical equivalence
conditions. In our experiments with RoBERTa and T5, we find that models
trained in prior works do not perform consistently across the different
perturbations in RobustLR, showing that they are not robust to the proposed
logical perturbations. Further, we find that the models find it especially
hard to learn the logical negation and disjunction operators. Overall, using
our evaluation sets, we demonstrate shortcomings of deductive-reasoning
language models, which can help guide the design of better models for
logical reasoning over natural language.
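For concreteness, the following minimal sketch shows what a RobustLR-style negation perturbation might look like. The Example record, the surface-level negate heuristic, and the label-flipping assumption are all illustrative stand-ins, not the paper's actual data pipeline.

```python
# Hypothetical sketch of a minimal logical edit: negate one fact in a
# rulebase and flip the expected entailment label accordingly.
from dataclasses import dataclass, replace as dc_replace

@dataclass(frozen=True)
class Example:
    facts: tuple          # natural-language statements
    rules: tuple          # natural-language rules
    query: str            # statement whose entailment is queried
    label: bool           # does (facts + rules) entail query?

def negate(statement: str) -> str:
    """Toy surface negation; real perturbations must keep the text grammatical."""
    if " is not " in statement:
        return statement.replace(" is not ", " is ")
    return statement.replace(" is ", " is not ")

def perturb_negation(ex: Example, fact_idx: int) -> Example:
    """Negate one fact; this sketch assumes the fact was on the proof path,
    so the gold label flips (RobustLR recomputes gold labels exactly)."""
    facts = list(ex.facts)
    facts[fact_idx] = negate(facts[fact_idx])
    return dc_replace(ex, facts=tuple(facts), label=not ex.label)

original = Example(
    facts=("The cat is red.",),
    rules=("If something is red then it is nice.",),
    query="The cat is nice.",
    label=True,
)
perturbed = perturb_negation(original, 0)
# A robust model should answer True on `original` and False on `perturbed`;
# RobustLR measures consistency across many such minimal edits.
```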
Related papers
- Parameterized Argumentation-based Reasoning Tasks for Benchmarking Generative Language Models [1.249418440326334]
Generative large language models as tools in the legal domain have the potential to improve the justice system.
However, the reasoning behavior of current generative models is brittle and poorly understood, and hence they cannot be responsibly applied in the domains of law and evidence.
We introduce an approach for creating benchmarks that can be used to evaluate the reasoning capabilities of generative language models.
arXiv  Detail & Related papers  (2025-05-02T19:04:34Z)
- JustLogic: A Comprehensive Benchmark for Evaluating Deductive Reasoning in Large Language Models [51.99046112135311]
We introduce JustLogic, a synthetically generated deductive reasoning benchmark for rigorous evaluation of Large Language Models.
JustLogic is highly complex, capable of generating a diverse range of linguistic patterns, vocabulary, and argument structures.
Our experimental results reveal that most state-of-the-art (SOTA) LLMs perform significantly worse than the human average.
arXiv  Detail & Related papers  (2025-01-24T15:49:10Z)
- Benchmarking Defeasible Reasoning with Large Language Models -- Initial Experiments and Future Directions [0.36868085124383626]
This paper proposes a benchmark that corresponds to various defeasible rule-based reasoning patterns.
We modified an existing benchmark for defeasible logic reasoners by translating defeasible rules into text suitable for Large Language Models.
We conducted preliminary experiments on nonmonotonic rule-based reasoning using ChatGPT and compared its answers with the reasoning patterns defined by defeasible logic.
arXiv  Detail & Related papers  (2024-10-16T12:36:23Z)
- Towards Logically Sound Natural Language Reasoning with Logic-Enhanced Language Model Agents [3.5083201638203154]
Logic-Enhanced Language Model Agents (LELMA) is a framework that integrates large language models with formal logic.
LELMA employs autoformalization to translate reasoning into logic representations, which are then used to assess logical validity.
LELMA achieves high accuracy in error detection and improves reasoning correctness via self-refinement.
arXiv  Detail & Related papers  (2024-08-28T18:25:35Z)
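As a rough illustration of the autoformalize-then-validate loop LELMA describes, here is a self-contained propositional validity checker. In LELMA the translation step is performed by an LLM rather than by hand, so the formula encoding below is only an assumption for the sketch.

```python
# Brute-force propositional validity check via truth tables.
from itertools import product

# Formulas as nested tuples: ("atom", name), ("not", f),
# ("and", f, g), ("or", f, g), ("implies", f, g).

def atoms(f):
    if f[0] == "atom":
        return {f[1]}
    return set().union(*(atoms(g) for g in f[1:]))

def evaluate(f, env):
    op = f[0]
    if op == "atom":
        return env[f[1]]
    if op == "not":
        return not evaluate(f[1], env)
    if op == "and":
        return evaluate(f[1], env) and evaluate(f[2], env)
    if op == "or":
        return evaluate(f[1], env) or evaluate(f[2], env)
    if op == "implies":
        return (not evaluate(f[1], env)) or evaluate(f[2], env)
    raise ValueError(op)

def is_valid(premises, conclusion):
    """Valid iff every assignment satisfying all premises satisfies the conclusion."""
    names = sorted(set().union(*(atoms(p) for p in premises), atoms(conclusion)))
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        if all(evaluate(p, env) for p in premises) and not evaluate(conclusion, env):
            return False
    return True

# "Socrates is a man; all men are mortal; so Socrates is mortal,"
# autoformalized by hand here (by an LLM in LELMA):
man, mortal = ("atom", "man"), ("atom", "mortal")
print(is_valid([man, ("implies", man, mortal)], mortal))  # True
```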
- Scaling Synthetic Logical Reasoning Datasets with Context-Sensitive Declarative Grammars [0.6537995248511139]
We present a declarative framework with flexible context-sensitive rules binding multiple languages.
We construct first-order logic problems by selecting up to 32 premises and one hypothesis.
We demonstrate that using semantic constraints during generation and careful English verbalization of predicates enhances logical reasoning without hurting natural English tasks.
arXiv  Detail & Related papers  (2024-06-16T18:10:49Z)
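The premise-plus-hypothesis construction can be pictured with a toy generator. The predicate pool, verbalizer, and entailment labeling below are placeholder assumptions; the actual framework uses context-sensitive grammar rules and semantic constraints over first-order logic.

```python
# Toy sketch: sample up to N atomic premises, pick one hypothesis, verbalize.
import random

PREDICATES = ["red", "kind", "smart", "round"]
NAMES = ["Anne", "Bob", "Carol"]

def sample_fact(rng):
    return (rng.choice(NAMES), rng.choice(PREDICATES))

def verbalize(fact):
    name, pred = fact
    return f"{name} is {pred}."

def make_problem(rng, n_premises=32):
    facts = {sample_fact(rng) for _ in range(n_premises)}  # set may dedupe
    # Half the time reuse a premise as the hypothesis, else sample fresh.
    hypothesis = rng.choice(sorted(facts)) if rng.random() < 0.5 else sample_fact(rng)
    return {
        "premises": [verbalize(f) for f in sorted(facts)],
        "hypothesis": verbalize(hypothesis),
        "label": "entailed" if hypothesis in facts else "unknown",
    }

rng = random.Random(0)
print(make_problem(rng, n_premises=5))
```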
- LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452]
Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks.
But can they really "reason" over natural language?
This question has received significant research attention, and many reasoning skills, such as commonsense, numerical, and qualitative reasoning, have been studied.
arXiv  Detail & Related papers  (2024-04-23T21:08:49Z)
- LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models [63.14196038655506]
We introduce LogicAsker, a novel approach for evaluating and enhancing the logical reasoning capabilities of large language models (LLMs).
Our methodology reveals significant gaps in LLMs' learning of logical rules, with identified reasoning failures ranging from 29% to 90% across different models.
We leverage these findings to construct targeted demonstration examples and fine-tune data, notably enhancing logical reasoning in models like GPT-4o by up to 5%.
arXiv  Detail & Related papers  (2024-01-01T13:53:53Z)
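A LogicAsker-style probe can be approximated as follows. The rule templates, fillers, and ask_model stub are hypothetical stand-ins for the paper's generated test cases and real LLM calls.

```python
# Instantiate atomic inference rules as yes/no questions and measure
# a model's failure rate per rule.
RULES = {
    "modus_ponens": ("If {p}, then {q}. {p}. Does {q} follow?", "yes"),
    "denying_antecedent": ("If {p}, then {q}. Not {p}. Does 'not {q}' follow?", "no"),
}

FILLERS = [("it rains", "the grass is wet"), ("x > 3", "x > 1")]

def ask_model(prompt: str) -> str:
    # Placeholder: always answers "yes". Replace with a real LLM call.
    return "yes"

def failure_rate(rule_name: str) -> float:
    template, gold = RULES[rule_name]
    wrong = 0
    for p, q in FILLERS:
        answer = ask_model(template.format(p=p, q=q)).strip().lower()
        wrong += answer != gold
    return wrong / len(FILLERS)

for name in RULES:
    print(name, failure_rate(name))
# A sound reasoner fails 0% of cases; LogicAsker reports 29%-90% across LLMs.
```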
- Language Models can be Logical Solvers [99.40649402395725]
We introduce LoGiPT, a novel language model that directly emulates the reasoning processes of logical solvers.
LoGiPT is fine-tuned on a newly constructed instruction-tuning dataset derived from revealing and refining the invisible reasoning process of deductive solvers.
arXiv  Detail & Related papers  (2023-11-10T16:23:50Z)
- Empower Nested Boolean Logic via Self-Supervised Curriculum Learning [67.46052028752327]
We find that pre-trained language models, including large language models, behave like random selectors when faced with multi-nested Boolean logic.
To empower language models with this fundamental capability, this paper proposes a new self-supervised learning method, Curriculum Logical Reasoning (CLR).
arXiv  Detail & Related papers  (2023-10-09T06:54:02Z)
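To see what "multi-nested logic" means operationally, here is a small generator of nested Boolean expressions with ground-truth values, assuming a simple recursive grammar; it is not the paper's data-generation code.

```python
# Generate a nested Boolean expression with controllable depth, so model
# accuracy can be compared against the ~50% random-guess baseline.
import random

def gen_expr(rng, depth):
    """Return (expression_string, truth_value) with `depth` levels of nesting."""
    if depth == 0:
        value = rng.choice([True, False])
        return ("true" if value else "false"), value
    op = rng.choice(["and", "or", "not"])
    if op == "not":
        s, v = gen_expr(rng, depth - 1)
        return f"(not {s})", (not v)
    ls, lv = gen_expr(rng, depth - 1)
    rs, rv = gen_expr(rng, depth - 1)
    v = (lv and rv) if op == "and" else (lv or rv)
    return f"({ls} {op} {rs})", v

rng = random.Random(1)
expr, truth = gen_expr(rng, depth=4)
print(expr, "=>", truth)
# Curriculum Logical Reasoning (CLR) trains from shallow to deep nesting.
```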
- LogiGAN: Learning Logical Reasoning via Adversarial Pre-training [58.11043285534766]
We present LogiGAN, an unsupervised adversarial pre-training framework for improving the logical reasoning abilities of language models.
Inspired by the facilitation effect of reflective thinking in human learning, we simulate the learning-thinking process with an adversarial Generator-Verifier architecture.
Both base- and large-size language models pre-trained with LogiGAN show clear performance improvements on 12 datasets.
arXiv  Detail & Related papers  (2022-05-18T08:46:49Z)
- FaiRR: Faithful and Robust Deductive Reasoning over Natural Language [25.319674132967553]
We frame the deductive logical reasoning task by defining three modular components: rule selection, fact selection, and knowledge composition.
We observe that FaiRR is robust to novel language perturbations, and is faster at inference than previous works on existing reasoning datasets.
arXiv  Detail & Related papers  (2022-03-19T07:18:13Z)
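FaiRR's three-component decomposition can be sketched as a forward-chaining loop with pluggable selectors. The rule-based components below are simple stand-ins for the paper's learned transformer modules.

```python
# Rule selection -> fact selection -> knowledge composition, iterated
# until the query is derived or no rule applies.
def select_rule(rules, facts):
    for premise, conclusion in rules:
        if premise in facts and conclusion not in facts:
            return (premise, conclusion)
    return None

def select_facts(rule, facts):
    premise, _ = rule
    return [premise]

def compose(rule, selected_facts):
    _, conclusion = rule
    return conclusion

def prove(rules, facts, query, max_steps=10):
    facts = set(facts)
    for _ in range(max_steps):
        if query in facts:
            return True
        rule = select_rule(rules, facts)
        if rule is None:
            return False
        facts.add(compose(rule, select_facts(rule, facts)))
    return query in facts

rules = [("the cat is red", "the cat is nice"), ("the cat is nice", "the cat is kind")]
print(prove(rules, {"the cat is red"}, "the cat is kind"))  # True
```

Because each derived fact is produced by an explicit composition step, the proof trace is faithful by construction, which is the point of the modular design.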
We propose learning rules with the recently proposed Logical Neural Networks (LNNs).
Compared to alternatives, LNNs offer a strong connection to classical Boolean logic.
Our experiments on standard benchmarking tasks confirm that LNN rules are highly interpretable.
arXiv  Detail & Related papers  (2021-12-06T19:38:30Z)
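That connection to classical Boolean logic comes from LNNs' use of weighted real-valued logic. A minimal unweighted sketch using Łukasiewicz connectives (a simplification of the full LNN formulation, which adds learned weights, bounds, and constrained optimization) looks like this:

```python
# Real-valued connectives on [0, 1] that reduce to classical
# AND/OR/NOT at the 0/1 endpoints.
def lnn_and(a: float, b: float) -> float:
    return max(0.0, a + b - 1.0)   # Lukasiewicz t-norm

def lnn_or(a: float, b: float) -> float:
    return min(1.0, a + b)         # Lukasiewicz t-conorm

def lnn_not(a: float) -> float:
    return 1.0 - a

# At 0/1 inputs the gates agree with Boolean logic...
assert lnn_and(1.0, 1.0) == 1.0 and lnn_and(1.0, 0.0) == 0.0
# ...and intermediate values propagate graded uncertainty.
print(lnn_and(0.9, 0.8))  # ~0.7
```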
- Flexible Operations for Natural Language Deduction [32.92866195461153]
ParaPattern is a method for building models to generate logical transformations of diverse natural language inputs without direct human supervision.
We use a BART-based model to generate the result of applying a particular logical operation to one or more premise statements.
We evaluate our models using targeted contrast sets as well as out-of-domain sentence compositions from the QASC dataset.
arXiv  Detail & Related papers  (2021-04-18T11:36:26Z) 
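The inference pattern ParaPattern describes, applying a seq2seq model to premise statements, might look like the following. The facebook/bart-base checkpoint is a generic substitute: the paper's fine-tuned weights and exact input formatting are not reproduced here.

```python
# Run a BART seq2seq model over premise statements. With ParaPattern's
# fine-tuned weights this would produce the composed conclusion,
# e.g. "A dog is an animal."
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

premises = "All mammals are animals. A dog is a mammal."
inputs = tokenizer(premises, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=32, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```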
This list is automatically generated from the titles and abstracts of the papers on this site.