RobustLR: Evaluating Robustness to Logical Perturbation in Deductive Reasoning
 - URL: http://arxiv.org/abs/2205.12598v1
 - Date: Wed, 25 May 2022 09:23:50 GMT
 - Title: RobustLR: Evaluating Robustness to Logical Perturbation in Deductive Reasoning
 - Authors: Soumya Sanyal, Zeyi Liao, Xiang Ren
 - Abstract summary: Transformers have been shown to perform deductive reasoning over a logical rulebase containing rules and statements written in English.
We propose RobustLR to evaluate the robustness of these models to minimal logical edits in rulebases.
We find that models trained in prior works do not perform consistently across the different perturbations in RobustLR.
 - Score: 25.319674132967553
 - License: http://creativecommons.org/licenses/by/4.0/
 - Abstract: Transformers have been shown to perform deductive reasoning on a
logical rulebase containing rules and statements written in English. While
this progress is promising, it is currently unclear whether these models
indeed perform logical reasoning by understanding the underlying logical
semantics of the language. To this end, we propose RobustLR, a suite of
evaluation datasets that test the robustness of these models to minimal
logical edits in rulebases and to some standard logical equivalence
conditions. In our experiments with RoBERTa and T5, we find that models
trained in prior works do not perform consistently across the different
perturbations in RobustLR, showing that they are not robust to the proposed
logical perturbations. Further, we find that the models find it especially
hard to learn the logical negation and disjunction operators. Overall, using
our evaluation sets, we demonstrate shortcomings of deductive-reasoning
language models, which can help guide the design of better models for
logical reasoning over natural language.
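For concreteness, the following minimal sketch shows what a RobustLR-style negation perturbation might look like. The Example record, the surface-level negate heuristic, and the label-flipping assumption are all illustrative stand-ins, not the paper's actual data pipeline.

```python
# Hypothetical sketch of a minimal logical edit: negate one fact in a
# rulebase and flip the expected entailment label accordingly.
from dataclasses import dataclass, replace as dc_replace

@dataclass(frozen=True)
class Example:
    facts: tuple          # natural-language statements
    rules: tuple          # natural-language rules
    query: str            # statement whose entailment is queried
    label: bool           # does (facts + rules) entail query?

def negate(statement: str) -> str:
    """Toy surface negation; real perturbations must keep the text grammatical."""
    if " is not " in statement:
        return statement.replace(" is not ", " is ")
    return statement.replace(" is ", " is not ")

def perturb_negation(ex: Example, fact_idx: int) -> Example:
    """Negate one fact; this sketch assumes the fact was on the proof path,
    so the gold label flips (RobustLR recomputes gold labels exactly)."""
    facts = list(ex.facts)
    facts[fact_idx] = negate(facts[fact_idx])
    return dc_replace(ex, facts=tuple(facts), label=not ex.label)

original = Example(
    facts=("The cat is red.",),
    rules=("If something is red then it is nice.",),
    query="The cat is nice.",
    label=True,
)
perturbed = perturb_negation(original, 0)
# A robust model should answer True on `original` and False on `perturbed`;
# RobustLR measures consistency across many such minimal edits.
```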
Related papers
- Parameterized Argumentation-based Reasoning Tasks for Benchmarking Generative Language Models [1.249418440326334]
Generative large language models as tools in the legal domain have the potential to improve the justice system.
However, the reasoning behavior of current generative models is brittle and poorly understood, and hence they cannot be responsibly applied in the domains of law and evidence.
We introduce an approach for creating benchmarks that can be used to evaluate the reasoning capabilities of generative language models.
arXiv  Detail & Related papers  (2025-05-02T19:04:34Z)
- JustLogic: A Comprehensive Benchmark for Evaluating Deductive Reasoning in Large Language Models [51.99046112135311]
We introduce JustLogic, a synthetically generated deductive reasoning benchmark for rigorous evaluation of Large Language Models.
JustLogic is highly complex, capable of generating a diverse range of linguistic patterns, vocabulary, and argument structures.
Our experimental results reveal that most state-of-the-art (SOTA) LLMs perform significantly worse than the human average.
arXiv  Detail & Related papers  (2025-01-24T15:49:10Z)
- Benchmarking Defeasible Reasoning with Large Language Models -- Initial Experiments and Future Directions [0.36868085124383626]
This paper proposes a benchmark that corresponds to various defeasible rule-based reasoning patterns.
We modified an existing benchmark for defeasible logic reasoners by translating defeasible rules into text suitable for Large Language Models.
We conducted preliminary experiments on nonmonotonic rule-based reasoning using ChatGPT and compared its answers with the reasoning patterns defined by defeasible logic.
arXiv  Detail & Related papers  (2024-10-16T12:36:23Z)
- Towards Logically Sound Natural Language Reasoning with Logic-Enhanced Language Model Agents [3.5083201638203154]
Logic-Enhanced Language Model Agents (LELMA) is a framework that integrates large language models with formal logic.
LELMA employs autoformalization to translate reasoning into logic representations, which are then used to assess logical validity.
LELMA achieves high accuracy in error detection and improves reasoning correctness via self-refinement.
arXiv  Detail & Related papers  (2024-08-28T18:25:35Z)
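As a rough illustration of the autoformalize-then-validate loop LELMA describes, here is a self-contained propositional validity checker. In LELMA the translation step is performed by an LLM rather than by hand, so the formula encoding below is only an assumption for the sketch.

```python
# Brute-force propositional validity check via truth tables.
from itertools import product

# Formulas as nested tuples: ("atom", name), ("not", f),
# ("and", f, g), ("or", f, g), ("implies", f, g).

def atoms(f):
    if f[0] == "atom":
        return {f[1]}
    return set().union(*(atoms(g) for g in f[1:]))

def evaluate(f, env):
    op = f[0]
    if op == "atom":
        return env[f[1]]
    if op == "not":
        return not evaluate(f[1], env)
    if op == "and":
        return evaluate(f[1], env) and evaluate(f[2], env)
    if op == "or":
        return evaluate(f[1], env) or evaluate(f[2], env)
    if op == "implies":
        return (not evaluate(f[1], env)) or evaluate(f[2], env)
    raise ValueError(op)

def is_valid(premises, conclusion):
    """Valid iff every assignment satisfying all premises satisfies the conclusion."""
    names = sorted(set().union(*(atoms(p) for p in premises), atoms(conclusion)))
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        if all(evaluate(p, env) for p in premises) and not evaluate(conclusion, env):
            return False
    return True

# "Socrates is a man; all men are mortal; so Socrates is mortal,"
# autoformalized by hand here (by an LLM in LELMA):
man, mortal = ("atom", "man"), ("atom", "mortal")
print(is_valid([man, ("implies", man, mortal)], mortal))  # True
```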
- Scaling Synthetic Logical Reasoning Datasets with Context-Sensitive Declarative Grammars [0.6537995248511139]
We present a declarative framework with flexible context-sensitive rules binding multiple languages.
We construct first-order logic problems by selecting up to 32 premises and one hypothesis.
We demonstrate that using semantic constraints during generation and careful English verbalization of predicates enhances logical reasoning without hurting natural English tasks.
arXiv  Detail & Related papers  (2024-06-16T18:10:49Z)
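The premise-plus-hypothesis construction can be pictured with a toy generator. The predicate pool, verbalizer, and entailment labeling below are placeholder assumptions; the actual framework uses context-sensitive grammar rules and semantic constraints over first-order logic.

```python
# Toy sketch: sample up to N atomic premises, pick one hypothesis, verbalize.
import random

PREDICATES = ["red", "kind", "smart", "round"]
NAMES = ["Anne", "Bob", "Carol"]

def sample_fact(rng):
    return (rng.choice(NAMES), rng.choice(PREDICATES))

def verbalize(fact):
    name, pred = fact
    return f"{name} is {pred}."

def make_problem(rng, n_premises=32):
    facts = {sample_fact(rng) for _ in range(n_premises)}  # set may dedupe
    # Half the time reuse a premise as the hypothesis, else sample fresh.
    hypothesis = rng.choice(sorted(facts)) if rng.random() < 0.5 else sample_fact(rng)
    return {
        "premises": [verbalize(f) for f in sorted(facts)],
        "hypothesis": verbalize(hypothesis),
        "label": "entailed" if hypothesis in facts else "unknown",
    }

rng = random.Random(0)
print(make_problem(rng, n_premises=5))
```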
- LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452]
Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks.
But can they really "reason" over natural language?
This question has received significant research attention, and many reasoning skills, such as commonsense, numerical, and qualitative reasoning, have been studied.
arXiv  Detail & Related papers  (2024-04-23T21:08:49Z)
- LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models [63.14196038655506]
We introduce LogicAsker, a novel approach for evaluating and enhancing the logical reasoning capabilities of large language models (LLMs).
Our methodology reveals significant gaps in LLMs' learning of logical rules, with identified reasoning failures ranging from 29% to 90% across different models.
We leverage these findings to construct targeted demonstration examples and fine-tune data, notably enhancing logical reasoning in models like GPT-4o by up to 5%.
arXiv  Detail & Related papers  (2024-01-01T13:53:53Z)
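A LogicAsker-style probe can be approximated as follows. The rule templates, fillers, and ask_model stub are hypothetical stand-ins for the paper's generated test cases and real LLM calls.

```python
# Instantiate atomic inference rules as yes/no questions and measure
# a model's failure rate per rule.
RULES = {
    "modus_ponens": ("If {p}, then {q}. {p}. Does {q} follow?", "yes"),
    "denying_antecedent": ("If {p}, then {q}. Not {p}. Does 'not {q}' follow?", "no"),
}

FILLERS = [("it rains", "the grass is wet"), ("x > 3", "x > 1")]

def ask_model(prompt: str) -> str:
    # Placeholder: always answers "yes". Replace with a real LLM call.
    return "yes"

def failure_rate(rule_name: str) -> float:
    template, gold = RULES[rule_name]
    wrong = 0
    for p, q in FILLERS:
        answer = ask_model(template.format(p=p, q=q)).strip().lower()
        wrong += answer != gold
    return wrong / len(FILLERS)

for name in RULES:
    print(name, failure_rate(name))
# A sound reasoner fails 0% of cases; LogicAsker reports 29%-90% across LLMs.
```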
- Language Models can be Logical Solvers [99.40649402395725]
We introduce LoGiPT, a novel language model that directly emulates the reasoning processes of logical solvers.
LoGiPT is fine-tuned on a newly constructed instruction-tuning dataset derived from revealing and refining the invisible reasoning process of deductive solvers.
arXiv  Detail & Related papers  (2023-11-10T16:23:50Z)
- Empower Nested Boolean Logic via Self-Supervised Curriculum Learning [67.46052028752327]
We find that pre-trained language models, including large language models, behave like random selectors when faced with multi-nested Boolean logic.
To empower language models with this fundamental capability, this paper proposes a new self-supervised learning method, Curriculum Logical Reasoning (CLR).
arXiv  Detail & Related papers  (2023-10-09T06:54:02Z)
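To see what "multi-nested logic" means operationally, here is a small generator of nested Boolean expressions with ground-truth values, assuming a simple recursive grammar; it is not the paper's data-generation code.

```python
# Generate a nested Boolean expression with controllable depth, so model
# accuracy can be compared against the ~50% random-guess baseline.
import random

def gen_expr(rng, depth):
    """Return (expression_string, truth_value) with `depth` levels of nesting."""
    if depth == 0:
        value = rng.choice([True, False])
        return ("true" if value else "false"), value
    op = rng.choice(["and", "or", "not"])
    if op == "not":
        s, v = gen_expr(rng, depth - 1)
        return f"(not {s})", (not v)
    ls, lv = gen_expr(rng, depth - 1)
    rs, rv = gen_expr(rng, depth - 1)
    v = (lv and rv) if op == "and" else (lv or rv)
    return f"({ls} {op} {rs})", v

rng = random.Random(1)
expr, truth = gen_expr(rng, depth=4)
print(expr, "=>", truth)
# Curriculum Logical Reasoning (CLR) trains from shallow to deep nesting.
```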
- LogiGAN: Learning Logical Reasoning via Adversarial Pre-training [58.11043285534766]
We present LogiGAN, an unsupervised adversarial pre-training framework for improving the logical reasoning abilities of language models.
Inspired by the facilitation effect of reflective thinking in human learning, we simulate the learning-thinking process with an adversarial Generator-Verifier architecture.
Both base- and large-size language models pre-trained with LogiGAN show clear performance improvements on 12 datasets.
arXiv  Detail & Related papers  (2022-05-18T08:46:49Z)
- FaiRR: Faithful and Robust Deductive Reasoning over Natural Language [25.319674132967553]
We frame the deductive logical reasoning task by defining three modular components: rule selection, fact selection, and knowledge composition.
We observe that FaiRR is robust to novel language perturbations, and is faster at inference than previous works on existing reasoning datasets.
arXiv  Detail & Related papers  (2022-03-19T07:18:13Z)
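FaiRR's three-component decomposition can be sketched as a forward-chaining loop with pluggable selectors. The rule-based components below are simple stand-ins for the paper's learned transformer modules.

```python
# Rule selection -> fact selection -> knowledge composition, iterated
# until the query is derived or no rule applies.
def select_rule(rules, facts):
    for premise, conclusion in rules:
        if premise in facts and conclusion not in facts:
            return (premise, conclusion)
    return None

def select_facts(rule, facts):
    premise, _ = rule
    return [premise]

def compose(rule, selected_facts):
    _, conclusion = rule
    return conclusion

def prove(rules, facts, query, max_steps=10):
    facts = set(facts)
    for _ in range(max_steps):
        if query in facts:
            return True
        rule = select_rule(rules, facts)
        if rule is None:
            return False
        facts.add(compose(rule, select_facts(rule, facts)))
    return query in facts

rules = [("the cat is red", "the cat is nice"), ("the cat is nice", "the cat is kind")]
print(prove(rules, {"the cat is red"}, "the cat is kind"))  # True
```

Because each derived fact is produced by an explicit composition step, the proof trace is faithful by construction, which is the point of the modular design.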
We propose learning rules with the recently proposed Logical Neural Networks (LNNs).
Compared to alternatives, LNNs offer a strong connection to classical Boolean logic.
Our experiments on standard benchmarking tasks confirm that LNN rules are highly interpretable.
arXiv  Detail & Related papers  (2021-12-06T19:38:30Z)
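That connection to classical Boolean logic comes from LNNs' use of weighted real-valued logic. A minimal unweighted sketch using Łukasiewicz connectives (a simplification of the full LNN formulation, which adds learned weights, bounds, and constrained optimization) looks like this:

```python
# Real-valued connectives on [0, 1] that reduce to classical
# AND/OR/NOT at the 0/1 endpoints.
def lnn_and(a: float, b: float) -> float:
    return max(0.0, a + b - 1.0)   # Lukasiewicz t-norm

def lnn_or(a: float, b: float) -> float:
    return min(1.0, a + b)         # Lukasiewicz t-conorm

def lnn_not(a: float) -> float:
    return 1.0 - a

# At 0/1 inputs the gates agree with Boolean logic...
assert lnn_and(1.0, 1.0) == 1.0 and lnn_and(1.0, 0.0) == 0.0
# ...and intermediate values propagate graded uncertainty.
print(lnn_and(0.9, 0.8))  # ~0.7
```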
- Flexible Operations for Natural Language Deduction [32.92866195461153]
ParaPattern is a method for building models to generate logical transformations of diverse natural language inputs without direct human supervision.
We use a BART-based model to generate the result of applying a particular logical operation to one or more premise statements.
We evaluate our models using targeted contrast sets as well as out-of-domain sentence compositions from the QASC dataset.
arXiv  Detail & Related papers  (2021-04-18T11:36:26Z) 
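The inference pattern ParaPattern describes, applying a seq2seq model to premise statements, might look like the following. The facebook/bart-base checkpoint is a generic substitute: the paper's fine-tuned weights and exact input formatting are not reproduced here.

```python
# Run a BART seq2seq model over premise statements. With ParaPattern's
# fine-tuned weights this would produce the composed conclusion,
# e.g. "A dog is an animal."
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

premises = "All mammals are animals. A dog is a mammal."
inputs = tokenizer(premises, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=32, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```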
This list is automatically generated from the titles and abstracts of the papers on this site.