LLMs for Relational Reasoning: How Far are We?
- URL: http://arxiv.org/abs/2401.09042v1
- Date: Wed, 17 Jan 2024 08:22:52 GMT
- Title: LLMs for Relational Reasoning: How Far are We?
- Authors: Zhiming Li, Yushi Cao, Xiufeng Xu, Junzhe Jiang, Xu Liu, Yon Shin Teo,
Shang-wei Lin, Yang Liu
- Abstract summary: Large language models (LLMs) have revolutionized many areas by achieving state-of-the-art performance on downstream tasks.
Recent efforts have demonstrated that LLMs perform poorly on sequential decision-making problems.
- Score: 8.840750655261251
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have revolutionized many areas (e.g., natural
language processing, software engineering) by achieving state-of-the-art performance
on extensive downstream tasks. Aiming toward robust and general artificial intelligence,
there has been a surge of interest in investigating the reasoning ability of LLMs.
However, the textual and numerical reasoning benchmarks adopted by previous works are
rather shallow and simple, and positive results on them are insufficient to conclude
that LLMs possess strong reasoning ability. Recent efforts, which evaluate LLMs on
reinforcement learning benchmarks, have demonstrated that they are poor at solving
sequential decision-making problems that require common-sense planning. In this work,
we conduct an in-depth assessment of several state-of-the-art LLMs' reasoning ability
using the inductive logic programming (ILP) benchmark, which is broadly recognized as a
representative and challenging measure for logic program induction/synthesis systems:
it requires inducing strict cause-effect logic to achieve robust deduction on both
independent and identically distributed (IID) and out-of-distribution (OOD) test
samples. Our evaluations show that, compared with neural program induction systems of
much smaller model size, the state-of-the-art LLMs exhibit far weaker reasoning
ability, achieving much lower performance and generalization with either natural
language prompting or truth-value matrix prompting.
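To make the evaluation setup concrete, the sketch below shows one plausible reading of truth-value matrix prompting on a toy ILP task (inducing grandparent from parent). The entity names, relations, and prompt wording are hypothetical illustrations, not the paper's actual benchmark format.

```python
# Illustrative sketch of truth-value matrix prompting for a toy ILP task.
# The entities, relations, and prompt wording are hypothetical; the
# paper's actual benchmark format may differ.

entities = ["alice", "bob", "carol", "dave"]
parent = {("alice", "bob"), ("bob", "carol"), ("carol", "dave")}

def truth_value_matrix(relation, entities):
    """Encode a binary relation as a 0/1 matrix over the entity list."""
    return [[1 if (a, b) in relation else 0 for b in entities] for a in entities]

def matrix_to_str(matrix):
    return "\n".join(" ".join(str(v) for v in row) for row in matrix)

# Ground-truth target: grandparent(X, Z) <- parent(X, Y), parent(Y, Z).
grandparent = {(a, c) for (a, b1) in parent for (b2, c) in parent if b1 == b2}

prompt = (
    "Below is the truth-value matrix of the relation 'parent' over "
    f"entities {entities} (row = first argument, column = second):\n"
    f"{matrix_to_str(truth_value_matrix(parent, entities))}\n\n"
    "Induce the underlying rule and output the truth-value matrix of "
    "the relation 'grandparent' over the same entities."
)
print(prompt)
# The model's predicted matrix would be compared cell-by-cell against:
print(matrix_to_str(truth_value_matrix(grandparent, entities)))
```

An OOD test in this setting might, for instance, use longer parent chains than those shown in the prompt, so that only a correctly induced rule generalizes.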
Related papers
- Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data [53.433309883370974]
This work explores the potential and limitations of using graph-based synthetic reasoning data as training signals to enhance Large Language Models' reasoning capabilities.
Our experiments, conducted on two established natural language reasoning tasks, demonstrate that supervised fine-tuning with synthetic graph-based reasoning data effectively enhances LLMs' reasoning performance without compromising their effectiveness on other standard evaluation benchmarks.
arXiv Detail & Related papers (2024-09-19T03:39:09Z)
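As a rough illustration of graph-based synthetic reasoning data, the sketch below derives a (prompt, label) pair from a random directed graph via a reachability question. The construction and all names here are hypothetical; the paper's actual data generation is likely richer.

```python
# Hypothetical sketch of deriving natural-language reasoning examples
# from random graphs; the paper's actual construction may differ.
import random

def random_edges(n_nodes, n_edges, seed=0):
    rng = random.Random(seed)
    nodes = [f"node{i}" for i in range(n_nodes)]
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.sample(nodes, 2)
        edges.add((a, b))
    return nodes, sorted(edges)

def reachable(edges, src, dst):
    """Directed reachability via depth-first search."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    stack, seen = [src], set()
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        if u not in seen:
            seen.add(u)
            stack.extend(adj.get(u, []))
    return False

nodes, edges = random_edges(6, 8)
facts = ". ".join(f"{a} connects to {b}" for a, b in edges)
src, dst = nodes[0], nodes[-1]
question = f"{facts}. Can you reach {dst} from {src}? Answer yes or no."
answer = "yes" if reachable(edges, src, dst) else "no"
print(question, "->", answer)   # one (prompt, label) pair for fine-tuning
```

- Inductive Learning of Logical Theories with LLMs: A Complexity-graded Analysis [9.865771016218549]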
This work presents a novel systematic methodology to analyse the capabilities and limitations of Large Language Models (LLMs) in inductive learning of logical theories.
The analysis is complexity-graded w.r.t. rule dependency structure, allowing the effect of specific inference challenges on LLM performance to be quantified.
arXiv Detail & Related papers (2024-08-15T16:41:00Z)
- Thought-Like-Pro: Enhancing Reasoning of Large Language Models through Self-Driven Prolog-based Chain-of-Thought [31.964412924094656]
Large language models (LLMs) have shown exceptional performance as general-purpose assistants.
We introduce a novel learning framework, THOUGHT-LIKE-PRO, to facilitate learning and generalization across diverse reasoning tasks.
Our empirical findings indicate that our proposed approach substantially enhances the reasoning abilities of LLMs.
arXiv Detail & Related papers (2024-07-18T18:52:10Z)
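The framework reportedly grounds its chain-of-thought in Prolog-style logical rules. As a loose illustration (in Python rather than Prolog, with hypothetical facts and a hypothetical model-emitted rule), a toy forward-chaining engine can check whether a claimed conclusion actually follows:

```python
# Toy forward-chaining checker, standing in for the Prolog engine that a
# framework like THOUGHT-LIKE-PRO might use to verify model-emitted rules.
# This simplified engine only handles chain rules of the form
# head(X, Z) <- r1(X, Y), r2(Y, Z); facts and rules are hypothetical.

facts = {("parent", "ann", "bob"), ("parent", "bob", "cal")}
rules = [("grandparent", "parent", "parent")]   # pretend model output

def forward_chain(facts, rules):
    """Apply chain rules until no new facts are derivable."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, r1, r2 in rules:
            for (p1, x, y) in list(derived):
                for (p2, y2, z) in list(derived):
                    if p1 == r1 and p2 == r2 and y == y2:
                        fact = (head, x, z)
                        if fact not in derived:
                            derived.add(fact)
                            changed = True
    return derived

closure = forward_chain(facts, rules)
print(("grandparent", "ann", "cal") in closure)  # True: the rule checks out
```

- Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]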
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to produce errors, hallucinations and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding LLMs' decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z)
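Deliberative planning over reasoning steps can be pictured as a best-first search guided by a learned value heuristic. The sketch below is a schematic stand-in, not the paper's implementation: `expand` and `value` are stubs for an LLM proposal step and a learned Q-value estimate.

```python
# Schematic best-first search over partial reasoning traces, in the
# spirit of deliberative-planning decoders. `expand`, `value`, and
# `is_goal` are placeholders, not the paper's actual components.
import heapq

def expand(trace):
    """Stub: would ask the LLM for candidate next reasoning steps."""
    return [trace + [f"step{len(trace) + 1}"]]

def value(trace):
    """Stub: would score the partial trace with a learned heuristic."""
    return -len(trace)          # prefer shorter traces in this toy version

def is_goal(trace):
    return len(trace) >= 3      # stub termination test

def best_first_search(max_expansions=100):
    frontier = [(0.0, [])]      # (negated heuristic score, partial trace)
    for _ in range(max_expansions):
        if not frontier:
            break
        _, trace = heapq.heappop(frontier)
        if is_goal(trace):
            return trace
        for nxt in expand(trace):
            heapq.heappush(frontier, (-value(nxt), nxt))
    return None

print(best_first_search())      # ['step1', 'step2', 'step3']
```

- LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models [63.14196038655506]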
We introduce LogicAsker, a novel approach for evaluating and enhancing the logical reasoning capabilities of large language models (LLMs).
Our methodology reveals significant gaps in LLMs' learning of logical rules, with identified reasoning failures ranging from 29% to 90% across different models.
We leverage these findings to construct targeted demonstration examples and fine-tuning data, notably enhancing logical reasoning in models like GPT-4o by up to 5%.
arXiv Detail & Related papers (2024-01-01T13:53:53Z)
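One way to picture rule-level probing of this kind: instantiate atomic inference rules (one valid, one fallacious) with fresh propositions and score a model's failure rate. The template set and wording below are hypothetical, not LogicAsker's actual generation pipeline.

```python
# Hypothetical sketch of rule-templated test generation: instantiate
# atomic inference rules with fresh propositions and measure how often
# a model answers incorrectly. Templates here are illustrative only.

RULES = {
    "modus ponens": ("If {p} then {q}. {p}.", "{q}", True),
    "affirming the consequent": ("If {p} then {q}. {q}.", "{p}", False),
}

def make_case(rule_name, p, q):
    premises, conclusion, valid = RULES[rule_name]
    question = (premises + " Does it follow that " + conclusion + "?").format(p=p, q=q)
    return question, "yes" if valid else "no"

def failure_rate(model_answer, cases):
    wrong = sum(model_answer(q) != gold for q, gold in cases)
    return wrong / len(cases)

cases = [make_case(name, "it rains", "the grass is wet") for name in RULES]
for q, gold in cases:
    print(q, "->", gold)

# `model_answer` would wrap an LLM call; here a stub that always says "yes":
print(failure_rate(lambda q: "yes", cases))  # 0.5 on this tiny suite
```

- CLOMO: Counterfactual Logical Modification with Large Language Models [109.60793869938534]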
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark.
In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship.
We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
arXiv Detail & Related papers (2023-11-29T08:29:54Z)
- Concise and Organized Perception Facilitates Reasoning in Large Language Models [32.71672086718057]
We show that large language models (LLMs) exhibit failure patterns akin to human-like cognitive biases when dealing with disordered and irrelevant content in reasoning tasks.
We propose a novel reasoning approach named Concise and Organized Perception (COP).
COP carefully analyzes the given statements to identify the most pertinent information while eliminating redundancy efficiently.
arXiv Detail & Related papers (2023-10-05T04:47:49Z)
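A crude stand-in for such statement selection is a lexical-overlap filter that keeps only the premises most relevant to the question; COP's actual relevance analysis is more sophisticated than this sketch.

```python
# Crude lexical-overlap premise filter, a stand-in for the statement
# selection that COP-style preprocessing performs before reasoning.

def relevance(statement, question):
    """Fraction of question words that also appear in the statement."""
    s_words = set(statement.lower().split())
    q_words = set(question.lower().split())
    return len(s_words & q_words) / len(q_words)

def select_premises(statements, question, top_k=2):
    ranked = sorted(statements, key=lambda s: relevance(s, question), reverse=True)
    return ranked[:top_k]

statements = [
    "All birds can fly",
    "Penguins are birds",
    "The sky was grey on Tuesday",     # irrelevant distractor
]
question = "Can penguins fly"
print(select_premises(statements, question))
# ['All birds can fly', 'Penguins are birds'] -- distractor dropped
```

- Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models [56.34029644009297]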
Large language models (LLMs) have demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems.
LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning.
We study single-task training, multi-task training, and "chain-of-thought" knowledge distillation fine-tuning techniques to assess model performance.
arXiv Detail & Related papers (2023-10-02T01:00:50Z)
- Exploring Self-supervised Logic-enhanced Training for Large Language Models [59.227222647741094]
In this paper, we make the first attempt to investigate the feasibility of incorporating logical knowledge through self-supervised post-training.
We devise an auto-regressive objective variant of MERIt and integrate it with two LLM series, i.e., FLAN-T5 and LLaMA, with parameter size ranging from 3 billion to 13 billion.
The results on two challenging logical reasoning benchmarks demonstrate the effectiveness of LogicLLM.
arXiv Detail & Related papers (2023-05-23T06:13:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.