Related papers: Multi-Step Deductive Reasoning Over Natural Language: An Empirical Study on Out-of-Distribution Generalisation

Multi-Step Deductive Reasoning Over Natural Language: An Empirical Study on Out-of-Distribution Generalisation

URL: http://arxiv.org/abs/2207.14000v4
Date: Thu, 17 Apr 2025 11:11:51 GMT
Title: Multi-Step Deductive Reasoning Over Natural Language: An Empirical Study on Out-of-Distribution Generalisation
Authors: Qiming Bao, Alex Yuxuan Peng, Tim Hartill, Neset Tan, Zhenyun Deng, Michael Witbrock, Jiamou Liu,
Abstract summary: We introduce IMA-GloVe-GA, an iterative neural inference network for multi-step reasoning expressed in natural language.<n>In our model, reasoning is performed using an iterative memory neural network based on RNN with a gated attention mechanism.
Score: 13.887376297334258
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Combining deep learning with symbolic logic reasoning aims to capitalize on the success of both fields and is drawing increasing attention. Inspired by DeepLogic, an end-to-end model trained to perform inference on logic programs, we introduce IMA-GloVe-GA, an iterative neural inference network for multi-step reasoning expressed in natural language. In our model, reasoning is performed using an iterative memory neural network based on RNN with a gated attention mechanism. We evaluate IMA-GloVe-GA on three datasets: PARARULES, CONCEPTRULES V1 and CONCEPTRULES V2. Experimental results show DeepLogic with gated attention can achieve higher test accuracy than DeepLogic and other RNN baseline models. Our model achieves better out-of-distribution generalisation than RoBERTa-Large when the rules have been shuffled. Furthermore, to address the issue of unbalanced distribution of reasoning depths in the current multi-step reasoning datasets, we develop PARARULE-Plus, a large dataset with more examples that require deeper reasoning steps. Experimental results show that the addition of PARARULE-Plus can increase the model's performance on examples requiring deeper reasoning depths. The source code and data are available at https://github.com/Strong-AI-Lab/Multi-Step-Deductive-Reasoning-Over-Natural-Language.

Related papers

Are Large Language Models Capable of Deep Relational Reasoning? Insights from DeepSeek-R1 and Benchmark Comparisons [11.429641860623143]
We evaluate and compare the reasoning capabilities of three cutting-edge Large Language Models (LLMs)<n>DeepSeek-R1 consistently achieves the highest F1-scores across multiple tasks and problem sizes.<n>A detailed analysis of DeepSeek-R1's long Chain-of-Thought responses uncovers its unique planning and verification strategies.
arXiv Detail & Related papers (2025-06-29T07:37:49Z)
TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression [55.37723860832064]
We propose a dynamic ratio-based training pipeline that does not rely on sophisticated data annotations.<n>We validate our approach across models on DeepSeek-R1-Distill-7B and DeepSeek-R1-Distill-14B and on a diverse set of benchmarks with varying difficulty levels.
arXiv Detail & Related papers (2025-06-03T09:23:41Z)
Interleaved Reasoning for Large Language Models via Reinforcement Learning [22.403928213802036]
Long chain-of-thought (CoT) enhances large language models' (LLM) reasoning capabilities.<n>We propose a novel training paradigm that uses reinforcement learning (RL) to guide reasoning LLMs to interleave thinking and answering for multi-hop questions.
arXiv Detail & Related papers (2025-05-26T07:58:17Z)
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 [53.894789613838654]
We introduce SEED-Bench-R1, a benchmark designed to evaluate post-training methods for MLLMs in video understanding. It includes intricate real-world videos and complex everyday planning tasks in the format of multiple-choice questions. Using Qwen2-VL-Instruct-7B as a base model, we compare RL with supervised fine-tuning (SFT) Our detailed analysis reveals that RL enhances visual perception but often produces less coherent reasoning chains.
arXiv Detail & Related papers (2025-03-31T17:55:23Z)
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models [54.04678363287392]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks. Recent advancements in OpenAI o1 and DeepSeek-R1 have further improved performance in System-2 reasoning domains.
arXiv Detail & Related papers (2025-03-20T17:59:38Z)
Interpretable Language Modeling via Induction-head Ngram Models [74.26720927767398]
We propose Induction-head ngram models (Induction-Gram) to bolster modern ngram models with a hand-engineered "induction head" This induction head uses a custom neural similarity metric to efficiently search the model's input context for potential next-word completions. Experiments show that this simple method significantly improves next-word prediction over baseline interpretable models.
arXiv Detail & Related papers (2024-10-31T12:33:26Z)
P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains [97.25943550933829]
We present P-FOLIO, a human-annotated dataset consisting of diverse and complex reasoning chains. We use P-FOLIO to evaluate and improve large-language-model (LLM) reasoning capabilities.
arXiv Detail & Related papers (2024-10-11T19:22:57Z)
Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data [53.433309883370974]
This work explores the potential and limitations of using graph-based synthetic reasoning data as training signals to enhance Large Language Models' reasoning capabilities. Our experiments, conducted on two established natural language reasoning tasks, demonstrate that supervised fine-tuning with synthetic graph-based reasoning data effectively enhances LLMs' reasoning performance without compromising their effectiveness on other standard evaluation benchmarks.
arXiv Detail & Related papers (2024-09-19T03:39:09Z)
Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models [56.34029644009297]
Large language models (LLMs) have demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems. LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning. We study single-task training, multi-task training, and "chain-of-thought" knowledge distillation fine-tuning technique to assess the performance of model.
arXiv Detail & Related papers (2023-10-02T01:00:50Z)
DeepSeer: Interactive RNN Explanation and Debugging via State Abstraction [10.110976560799612]
Recurrent Neural Networks (RNNs) have been widely used in Natural Language Processing (NLP) tasks. DeepSeer is an interactive system that provides both global and local explanations of RNN behavior.
arXiv Detail & Related papers (2023-03-02T21:08:17Z)
CARE: Certifiably Robust Learning with Reasoning via Variational Inference [26.210129662748862]
We propose a certifiably robust learning with reasoning pipeline (CARE) CARE achieves significantly higher certified robustness compared with the state-of-the-art baselines. We additionally conducted different ablation studies to demonstrate the empirical robustness of CARE and the effectiveness of different knowledge integration.
arXiv Detail & Related papers (2022-09-12T07:15:52Z)
Neuro-Symbolic Inductive Logic Programming with Logical Neural Networks [65.23508422635862]
We propose learning rules with the recently proposed logical neural networks (LNN) Compared to others, LNNs offer strong connection to classical Boolean logic. Our experiments on standard benchmarking tasks confirm that LNN rules are highly interpretable.
arXiv Detail & Related papers (2021-12-06T19:38:30Z)
Software for Dataset-wide XAI: From Local Explanations to Global Insights with Zennit, CoRelAy, and ViRelAy [14.513962521609233]
We introduce Zennit, CoRelAy, and ViRelAy to explore model reasoning using attribution approaches and beyond. Zennit is a highly customizable and intuitive attribution framework implementing LRP and related approaches in PyTorch. CoRelAy is a framework to easily and quickly construct quantitative analysis pipelines for dataset-wide analyses of explanations. ViRelAy is a web-application to interactively explore data, attributions, and analysis results.
arXiv Detail & Related papers (2021-06-24T17:27:22Z)
Neural Logic Reasoning [47.622957656745356]
We propose Logic-Integrated Neural Network (LINN) to integrate the power of deep learning and logic reasoning. LINN learns basic logical operations such as AND, OR, NOT as neural modules, and conducts propositional logical reasoning through the network for inference. Experiments show that LINN significantly outperforms state-of-the-art recommendation models in Top-K recommendation.
arXiv Detail & Related papers (2020-08-20T14:53:23Z)
Relational Neural Machines [19.569025323453257]
This paper presents a novel framework allowing jointly train the parameters of the learners and of a First-Order Logic based reasoner. A Neural Machine is able recover both classical learning results in case of pure sub-symbolic learning, and Markov Logic Networks. Proper algorithmic solutions are devised to make learning and inference tractable in large-scale problems.
arXiv Detail & Related papers (2020-02-06T10:53:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.