Relevant or Random: Can LLMs Truly Perform Analogical Reasoning?
- URL: http://arxiv.org/abs/2404.12728v2
- Date: Sun, 23 Jun 2024 05:18:50 GMT
- Title: Relevant or Random: Can LLMs Truly Perform Analogical Reasoning?
- Authors: Chengwei Qin, Wenhan Xia, Tan Wang, Fangkai Jiao, Yuchen Hu, Bosheng Ding, Ruirui Chen, Shafiq Joty
- Abstract summary: Analogical reasoning is a unique ability of humans to address unfamiliar challenges by transferring strategies from relevant past experiences.
The NLP community has also recently found that self-generating relevant examples in the context can help large language models (LLMs) better solve a given problem than hand-crafted prompts.
We show that self-generated random examples can surprisingly achieve comparable or even better performance.
- Score: 44.158548608820624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Analogical reasoning is a unique ability of humans to address unfamiliar challenges by transferring strategies from relevant past experiences. One key finding in psychology is that compared with irrelevant past experiences, recalling relevant ones can help humans better handle new tasks. Coincidentally, the NLP community has also recently found that self-generating relevant examples in the context can help large language models (LLMs) better solve a given problem than hand-crafted prompts. However, it is not yet clear whether relevance is the key factor eliciting such capability, i.e., can LLMs benefit more from self-generated relevant examples than irrelevant ones? In this work, we systematically explore whether LLMs can truly perform analogical reasoning on a diverse set of reasoning tasks. With extensive experiments and analysis, we show that self-generated random examples can surprisingly achieve comparable or even better performance, e.g., a 4% performance boost on GSM8K with random biological examples. We find that the accuracy of self-generated examples is the key factor and subsequently design two improved methods with significantly reduced inference costs. Overall, we aim to advance a deeper understanding of LLM analogical reasoning and hope this work stimulates further research in the design of self-generated contexts.
Related papers
- Make LLMs better zero-shot reasoners: Structure-orientated autonomous reasoning [52.83539473110143]
We introduce a novel structure-oriented analysis method to help Large Language Models (LLMs) better understand a question.
To further improve the reliability in complex question-answering tasks, we propose a multi-agent reasoning system, Structure-oriented Autonomous Reasoning Agents (SARA).
Extensive experiments verify the effectiveness of the proposed reasoning system. Surprisingly, in some cases, the system even surpasses few-shot methods.
arXiv Detail & Related papers (2024-10-18T05:30:33Z)
- ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement [70.09541267910974]
Post-training Large Language Models (LLMs) with explicit reasoning trajectories can enhance their reasoning abilities.
Existing self-synthesizing methods suffer from poor generalization to out-of-domain (OOD) reasoning tasks.
We propose Reasoning Generalist via Self-Improvement (ReGenesis), a method to self-synthesize reasoning paths as post-training data.
arXiv Detail & Related papers (2024-10-03T00:09:15Z)
- How Likely Do LLMs with CoT Mimic Human Reasoning? [31.86489714330338]
Chain-of-thought (CoT) emerges as a promising technique to elicit reasoning capabilities from Large Language Models (LLMs).
In this paper, we diagnose the underlying mechanism by comparing the reasoning process of LLMs with humans.
Our empirical study reveals that LLMs often deviate from a causal chain, resulting in spurious correlations and potential consistency errors.
arXiv Detail & Related papers (2024-02-25T10:13:04Z)
- Fine-Grained Self-Endorsement Improves Factuality and Reasoning [72.83651220132495]
This work studies improving large language model (LLM) generations at inference time by mitigating fact-conflicting hallucinations.
We propose a self-endorsement framework that leverages the fine-grained fact-level comparisons across multiple sampled responses.
arXiv Detail & Related papers (2024-02-23T22:24:40Z)
- AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies [19.613777134600408]
Analogical thinking allows humans to solve problems in creative ways.
Can language models (LMs) do the same?
Our benchmarking approach focuses on aspects of this ability that are common among humans.
arXiv Detail & Related papers (2024-02-19T18:56:44Z)
- I Learn Better If You Speak My Language: Understanding the Superior Performance of Fine-Tuning Large Language Models with LLM-Generated Responses [23.053791342294268]
Fine-tuning a large language model (LLM) with responses generated by an LLM often yields better results than using responses generated by humans, particularly in reasoning tasks.
Training with LLM-generated responses not only enhances performance but also helps maintain the model's capabilities in other reasoning tasks after fine-tuning on a specific task.
arXiv Detail & Related papers (2024-02-17T05:05:31Z)
- CLOMO: Counterfactual Logical Modification with Large Language Models [109.60793869938534]
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark.
In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship.
We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
arXiv Detail & Related papers (2023-11-29T08:29:54Z)
- Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves [57.974103113675795]
We present a method named 'Rephrase and Respond' (RaR) which allows Large Language Models to rephrase and expand questions posed by humans.
RaR serves as a simple yet effective prompting method for improving performance.
We show that RaR is complementary to the popular Chain-of-Thought (CoT) methods, both theoretically and empirically.
arXiv Detail & Related papers (2023-11-07T18:43:34Z)
- Are LLMs the Master of All Trades? : Exploring Domain-Agnostic Reasoning Skills of LLMs [0.0]
This study aims to investigate the performance of large language models (LLMs) on different reasoning tasks.
My findings indicate that LLMs excel at analogical and moral reasoning, yet struggle to perform as proficiently on spatial reasoning tasks.
arXiv Detail & Related papers (2023-03-22T22:53:44Z)