Learning To Teach Large Language Models Logical Reasoning
- URL: http://arxiv.org/abs/2310.09158v1
- Date: Fri, 13 Oct 2023 14:53:06 GMT
- Title: Learning To Teach Large Language Models Logical Reasoning
- Authors: Meiqi Chen, Yubo Ma, Kaitao Song, Yixin Cao, Yan Zhang, and Dongsheng Li
- Abstract summary: Large language models (LLMs) have gained enormous attention from both academia and industry.
However, current LLMs still output unreliable content in practical reasoning tasks due to their inherent issues.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have gained enormous attention from both academia and industry due to their exceptional ability in language generation and extremely powerful generalization. However, current LLMs still output unreliable content in practical reasoning tasks due to inherent issues (e.g., hallucination). To better disentangle this problem, in this paper we conduct an in-depth investigation to systematically explore the capability of LLMs in logical reasoning. In more detail, we first investigate the deficiencies of LLMs in logical reasoning on different tasks, including event relation extraction and deductive reasoning. Our study demonstrates that LLMs are not good reasoners on tasks requiring rigorous reasoning and produce counterfactual answers that require iterative refinement. We therefore comprehensively explore different strategies to endow LLMs with logical reasoning ability, enabling them to generate more logically consistent answers across different scenarios. Based on our approach, we also contribute a synthesized dataset (LLM-LR) involving multi-hop reasoning for evaluation and pre-training. Extensive quantitative and qualitative analyses on different tasks validate the effectiveness and necessity of teaching LLMs with logic, and provide insights for solving practical tasks with LLMs in future work.
Related papers
- Reasoning with Large Language Models, a Survey [2.831296564800826]
This paper reviews the rapidly expanding field of prompt-based reasoning with LLMs.
Our taxonomy identifies different ways to generate, evaluate, and control multi-step reasoning.
We find that self-improvement, self-reflection, and some meta abilities of the reasoning processes are possible through the judicious use of prompts.
arXiv Detail & Related papers (2024-07-16T08:49:35Z)
- Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to producing errors, hallucinations, and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding LLMs' decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z) - LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452]
Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks.
But can they really "reason" over natural language?
This question has been receiving significant research attention, and many reasoning skills, such as commonsense, numerical, and qualitative reasoning, have been studied.
arXiv Detail & Related papers (2024-04-23T21:08:49Z) - Meaningful Learning: Advancing Abstract Reasoning in Large Language Models via Generic Fact Guidance [38.49506722997423]
Large language models (LLMs) have achieved impressive performance and strong explainability across various reasoning scenarios.
Despite this, when tasked with simple questions supported by a generic fact, LLMs often fail to provide consistent and precise answers.
This has sparked a vigorous debate about whether LLMs are genuinely reasoning or merely memorizing.
arXiv Detail & Related papers (2024-03-14T04:06:13Z)
- FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [57.747888532651]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$^2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z)
- Do Large Language Models Understand Logic or Just Mimick Context? [14.081178100662163]
This paper investigates the reasoning capabilities of large language models (LLMs) on two logical reasoning datasets.
It finds that LLMs do not truly understand logical rules; rather, in-context learning has simply increased the likelihood that these models arrive at the correct answers.
arXiv Detail & Related papers (2024-02-19T12:12:35Z)
- A & B == B & A: Triggering Logical Reasoning Failures in Large Language Models [65.86149763739141]
We introduce LogicAsker, an automatic approach that comprehensively evaluates and improves the logical reasoning abilities of LLMs.
We evaluate LogicAsker on six widely deployed LLMs, including GPT-3, ChatGPT, GPT-4, Bard, Vicuna, and Guanaco.
The results show that test cases from LogicAsker can find logical reasoning failures in different LLMs at rates of 25% to 94%.
arXiv Detail & Related papers (2024-01-01T13:53:53Z)
- Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models [56.34029644009297]
Large language models (LLMs) have demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems.
LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning.
We study single-task training, multi-task training, and a "chain-of-thought" knowledge distillation fine-tuning technique to assess model performance.
arXiv Detail & Related papers (2023-10-02T01:00:50Z)
- Exploring Self-supervised Logic-enhanced Training for Large Language Models [59.227222647741094]
In this paper, we make the first attempt to investigate the feasibility of incorporating logical knowledge through self-supervised post-training.
We devise an auto-regressive objective variant of MERIt and integrate it with two LLM series, i.e., FLAN-T5 and LLaMA, with parameter sizes ranging from 3 billion to 13 billion.
The results on two challenging logical reasoning benchmarks demonstrate the effectiveness of LogicLLM.
arXiv Detail & Related papers (2023-05-23T06:13:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.