AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies
- URL: http://arxiv.org/abs/2402.12370v2
- Date: Thu, 03 Oct 2024 22:55:11 GMT
- Title: AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies
- Authors: Xiao Ye, Andrew Wang, Jacob Choi, Yining Lu, Shreya Sharma, Lingfeng Shen, Vijay Tiyyala, Nicholas Andrews, Daniel Khashabi
- Abstract summary: Analogical thinking allows humans to solve problems in creative ways.
Can language models (LMs) do the same?
Our benchmarking approach focuses on aspects of this ability that are common among humans.
- Score: 19.613777134600408
- Abstract: Humans regularly engage in analogical thinking, relating personal experiences to current situations (X is analogous to Y because of Z). Analogical thinking allows humans to solve problems in creative ways, grasp difficult concepts, and articulate ideas more effectively. Can language models (LMs) do the same? To answer this question, we propose AnaloBench, a benchmark to determine analogical reasoning ability in LMs. Our benchmarking approach focuses on aspects of this ability that are common among humans: (i) recalling related experiences from a large amount of information, and (ii) applying analogical reasoning to complex and lengthy scenarios. We test a broad collection of proprietary models (e.g., the GPT family, Claude V2) and open-source models such as LLaMA2. Consistent with prior results, scaling up LMs yields some performance boosts. Surprisingly, scale offers minimal gains when (i) analogies involve lengthy scenarios, or (ii) the task requires recalling relevant scenarios from a large pool of information, a process analogous to finding a needle in a haystack. We hope these observations encourage further research in this field.
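The two abilities described in the abstract suggest a multiple-choice evaluation: given a query scenario and a bank of candidate stories, the model must pick out the truly analogous one. The sketch below illustrates one way such an evaluation loop might look; the `AnalogyItem` fields and the `query_model` callable are illustrative assumptions, not the benchmark's actual interface.

```python
# Minimal sketch of a multiple-choice analogy-identification loop in the
# spirit of AnaloBench. Data fields and the query_model() helper are
# hypothetical placeholders, not the benchmark's actual interface.
from dataclasses import dataclass

@dataclass
class AnalogyItem:
    query_story: str       # the scenario to find an analogy for
    candidates: list[str]  # the story bank (the "haystack")
    answer_index: int      # index of the truly analogous story

def build_prompt(item: AnalogyItem) -> str:
    options = "\n".join(f"({i}) {c}" for i, c in enumerate(item.candidates))
    return (
        "Which of the following stories is most analogous to the query?\n"
        f"Query: {item.query_story}\n"
        f"{options}\n"
        "Answer with the option number only."
    )

def evaluate(items: list[AnalogyItem], query_model) -> float:
    """Accuracy of a model callable mapping a prompt string to a reply string."""
    correct = 0
    for item in items:
        reply = query_model(build_prompt(item))
        digits = "".join(ch for ch in reply if ch.isdigit())
        if digits and int(digits) == item.answer_index:
            correct += 1
    return correct / len(items)
```

With a large `candidates` list, this setup directly exercises the needle-in-a-haystack recall the abstract describes.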
Related papers
- Evaluating the Robustness of Analogical Reasoning in Large Language Models [6.5855735579366685]
We investigate the robustness of analogy-making abilities previously claimed for LLMs.
We test humans and GPT models on robustness to variants of the original analogy problems.
Unlike humans, the performance of GPT models is susceptible to answer-order effects.
arXiv Detail & Related papers (2024-11-21T15:25:08Z) - LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452]
Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks.
But, can they really "reason" over the natural language?
This question has been receiving significant research attention and many reasoning skills such as commonsense, numerical, and qualitative have been studied.
arXiv Detail & Related papers (2024-04-23T21:08:49Z) - Relevant or Random: Can LLMs Truly Perform Analogical Reasoning? [44.158548608820624]
Analogical reasoning is a unique ability of humans to address unfamiliar challenges by transferring strategies from relevant past experiences.
The NLP community has also recently found that self-generating relevant examples in context can help large language models (LLMs) solve a given problem better than hand-crafted prompts can.
We show that self-generated random examples can surprisingly achieve comparable or even better performance.
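As a rough illustration of the two prompting conditions being compared (self-generated relevant vs. self-generated random examples), consider the sketch below. The prompt wording and the `llm` callable are assumptions for illustration, not the paper's exact prompts.

```python
# Illustrative sketch of the two conditions: self-generated *relevant*
# examples vs. self-generated *random* examples. The llm() helper
# (prompt string -> completion string) is a placeholder.
def self_generate_examples(llm, problem: str, relevant: bool, k: int = 3) -> str:
    """Ask the model itself to produce k worked examples before solving."""
    if relevant:
        instruction = (f"Recall {k} problems relevant to the problem below "
                       "and solve each of them.\n")
    else:
        instruction = (f"Recall {k} random, unrelated problems "
                       "and solve each of them.\n")
    return llm(instruction + f"Problem: {problem}")

def solve_with_self_generated_context(llm, problem: str, relevant: bool) -> str:
    examples = self_generate_examples(llm, problem, relevant)
    return llm(f"{examples}\n\nNow solve this problem:\n{problem}")
```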
arXiv Detail & Related papers (2024-04-19T09:15:07Z) - Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language Models [7.779982757267302]
We investigate the generality of analogy-making abilities previously claimed for large language models (LLMs).
We show that while the performance of humans remains high for all the problems, the GPT models' performance declines sharply on the counterfactual set.
arXiv Detail & Related papers (2024-02-14T05:52:23Z) - Dive into the Chasm: Probing the Gap between In- and Cross-Topic Generalization [66.4659448305396]
This study analyzes various LMs with three probing-based experiments to shed light on the reasons behind the In- vs. Cross-Topic generalization gap.
We demonstrate, for the first time, that generalization gaps and the robustness of the embedding space vary significantly across LMs.
arXiv Detail & Related papers (2024-02-02T12:59:27Z) - StoryAnalogy: Deriving Story-level Analogies from Large Language Models to Unlock Analogical Understanding [72.38872974837462]
We evaluate the ability to identify and generate analogies by constructing a first-of-its-kind large-scale story-level analogy corpus.
StoryAnalogy contains 24K story pairs from diverse domains with human annotations on two similarities from the extended Structure-Mapping Theory.
We observe that the data in StoryAnalogy can improve the quality of analogy generation in large language models.
arXiv Detail & Related papers (2023-10-19T16:29:23Z) - Can language models learn analogical reasoning? Investigating training objectives and comparisons to human performance [0.0]
We test several ways to learn basic analogical reasoning, focusing on analogies of the kind typically used to evaluate analogical reasoning in humans.
Our experiments find that models are able to learn analogical reasoning, even with a small amount of data.
arXiv Detail & Related papers (2023-10-09T10:34:38Z) - ARN: Analogical Reasoning on Narratives [13.707344123755126]
We develop a framework that operationalizes dominant theories of analogy, using narrative elements to create surface and system mappings.
We show that while all LLMs can largely recognize near analogies, even the largest ones struggle with far analogies in a zero-shot setting.
arXiv Detail & Related papers (2023-10-02T08:58:29Z) - ANALOGYKB: Unlocking Analogical Reasoning of Language Models with A Million-scale Knowledge Base [51.777618249271725]
ANALOGYKB is a million-scale analogy knowledge base derived from existing knowledge graphs (KGs).
It identifies two types of analogies from the KGs: 1) analogies of the same relations, which can be directly extracted from the KGs, and 2) analogies of analogous relations, which are identified with a selection and filtering pipeline enabled by large language models (LLMs).
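The first category is mechanical enough to sketch: any two subject-object pairs that share a relation form a candidate A : B :: C : D analogy. The triple format below is an assumption, and the LLM-based pipeline for the second category is not shown.

```python
# Minimal sketch of extracting "same relation" analogies from KG triples,
# the first of the two categories above. The (head, relation, tail) triple
# format is an assumption for illustration.
from collections import defaultdict
from itertools import combinations

def same_relation_analogies(triples):
    """triples: iterable of (head, relation, tail) strings.
    Yields (A, B, C, D) such that A : B :: C : D under a shared relation."""
    by_relation = defaultdict(list)
    for head, relation, tail in triples:
        by_relation[relation].append((head, tail))
    for pairs in by_relation.values():
        for (a, b), (c, d) in combinations(pairs, 2):
            yield (a, b, c, d)

triples = [
    ("Paris", "capital_of", "France"),
    ("Tokyo", "capital_of", "Japan"),
]
print(list(same_relation_analogies(triples)))
# [('Paris', 'France', 'Tokyo', 'Japan')] -> Paris : France :: Tokyo : Japan
```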
arXiv Detail & Related papers (2023-05-10T09:03:01Z) - Few-shot Visual Reasoning with Meta-analogical Contrastive Learning [141.2562447971]
We propose to solve few-shot (or low-shot) visual reasoning problems by resorting to analogical reasoning.
We extract structural relationships between elements in both domains, and enforce them to be as similar as possible with analogical contrastive learning.
We validate our method on the RAVEN dataset, on which it outperforms state-of-the-art methods, with larger gains when training data is scarce.
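As a rough sketch of what "enforcing structural relationships to be similar" could look like, the snippet below implements a generic margin-based contrastive loss over relation embeddings from two domains. The shapes, the margin value, and the hardest-negative mining are assumptions, not the paper's actual formulation.

```python
# Hedged sketch of a contrastive objective that pulls matched (analogous)
# relation embeddings together and pushes mismatched ones apart.
import torch
import torch.nn.functional as F

def analogical_contrastive_loss(rel_a, rel_b, margin: float = 0.5):
    """rel_a, rel_b: (batch, dim) relation embeddings from the two domains.
    Row i of rel_a is analogous to row i of rel_b; all other rows are not."""
    # Pairwise cosine similarities between every cross-domain pair: (B, B).
    sim = F.cosine_similarity(rel_a.unsqueeze(1), rel_b.unsqueeze(0), dim=-1)
    pos = sim.diagonal()                      # analogous (matched) pairs
    neg = sim - 2.0 * torch.eye(sim.size(0))  # mask out the diagonal
    hardest_neg = neg.max(dim=1).values       # hardest mismatched pair per row
    # Hinge loss: matched pairs should beat mismatched ones by `margin`.
    return F.relu(margin - pos + hardest_neg).mean()
```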
arXiv Detail & Related papers (2020-07-23T14:00:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.