AnaloBench: Benchmarking the Identification of Abstract and Long-context
Analogies
- URL: http://arxiv.org/abs/2402.12370v1
- Date: Mon, 19 Feb 2024 18:56:44 GMT
- Title: AnaloBench: Benchmarking the Identification of Abstract and Long-context
Analogies
- Authors: Xiao Ye, Andrew Wang, Jacob Choi, Yining Lu, Shreya Sharma, Lingfeng
Shen, Vijay Tiyyala, Nicholas Andrews, Daniel Khashabi
- Abstract summary: Analogical thinking allows humans to solve problems in creative ways, grasp difficult concepts, and articulate ideas more effectively.
We propose ANALOBENCH, a benchmark to determine analogical reasoning ability in language models (LMs).
Our benchmarking approach focuses on aspects of this ability that are common among humans: (i) recalling related experiences from a large amount of information, and (ii) applying analogical reasoning to complex and lengthy scenarios.
- Score: 20.35137053775108
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans regularly engage in analogical thinking, relating personal experiences
to current situations ($X$ is analogous to $Y$ because of $Z$). Analogical
thinking allows humans to solve problems in creative ways, grasp difficult
concepts, and articulate ideas more effectively. Can language models (LMs) do
the same? To answer this question, we propose ANALOBENCH, a benchmark to
determine analogical reasoning ability in LMs. Our benchmarking approach
focuses on aspects of this ability that are common among humans: (i) recalling
related experiences from a large amount of information, and (ii) applying
analogical reasoning to complex and lengthy scenarios. We test a broad
collection of proprietary models (e.g., GPT family, Claude V2) and open source
models such as LLaMA2. As in prior results, scaling up LMs results in some
performance boosts. Surprisingly, scale offers minimal gains when (i) the
analogies involve lengthy scenarios, or (ii) the task requires recalling
relevant scenarios from a large pool of information, a process analogous to
finding a needle in a haystack. We hope these observations encourage further
research in this field.
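The "needle in a haystack" setting above can be made concrete with a minimal, hypothetical sketch: given a query story, a model must pick the analogous story out of a pool of distractors, and we score its accuracy. The word-overlap "model" below is a toy stand-in for illustration only, not the paper's actual evaluation protocol; all names and the example data are invented.

```python
# Toy sketch of haystack-style analogy retrieval (not the ANALOBENCH protocol).

def overlap_score(a: str, b: str) -> float:
    """Toy similarity: Jaccard overlap of lowercase word types."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def retrieve_analogy(query: str, pool: list[str]) -> int:
    """Index of the pool story the toy model judges most analogous."""
    return max(range(len(pool)), key=lambda i: overlap_score(query, pool[i]))

def accuracy(examples: list[tuple[str, list[str], int]]) -> float:
    """examples: (query, candidate pool, index of the true analogy)."""
    correct = sum(retrieve_analogy(q, pool) == gold for q, pool, gold in examples)
    return correct / len(examples)

examples = [
    ("Trying to fix one bug created two more, like a hydra growing heads",
     ["A gardener pruned a rose bush and it bloomed",
      "Cutting one head off the hydra made two more appear",
      "The train arrived exactly on time"],
     1),
]
print(accuracy(examples))  # → 1.0
```

A real evaluation would swap the word-overlap scorer for an LM prompted to choose the analogous story, and grow the pool so that the true analogy is buried among many distractors.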
Related papers
- LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452] (2024-04-23)
  Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks. But can they really "reason" over natural language? This question has been receiving significant research attention, and many reasoning skills, such as commonsense, numerical, and qualitative reasoning, have been studied.
- Relevant or Random: Can LLMs Truly Perform Analogical Reasoning? [44.158548608820624] (2024-04-19)
  Analogical reasoning is a unique human ability to address unfamiliar challenges by transferring strategies from relevant past experiences. The NLP community has also recently found that self-generating relevant examples in the context can help large language models (LLMs) solve a given problem better than hand-crafted prompts. We show that self-generated random examples can surprisingly achieve comparable or even better performance.
- ParallelPARC: A Scalable Pipeline for Generating Natural-Language Analogies [16.92480305308536] (2024-03-02)
  We develop a pipeline for creating complex, paragraph-based analogies. We publish a gold set, validated by humans, and a silver set, generated automatically, and demonstrate that the silver set is useful for training models.
- Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language Models [7.779982757267302] (2024-02-14)
  We investigate the generality of analogy-making abilities previously claimed for large language models (LLMs). We show that while human performance remains high on all the problems, the GPT models' performance declines sharply on the counterfactual set.
- Dive into the Chasm: Probing the Gap between In- and Cross-Topic Generalization [66.4659448305396] (2024-02-02)
  This study analyzes various LMs with three probing-based experiments to shed light on the reasons behind the in- vs. cross-topic generalization gap. We demonstrate, for the first time, that generalization gaps and the robustness of the embedding space vary significantly across LMs.
- StoryAnalogy: Deriving Story-level Analogies from Large Language Models to Unlock Analogical Understanding [72.38872974837462] (2023-10-19)
  We evaluate the ability to identify and generate analogies by constructing a first-of-its-kind large-scale story-level analogy corpus. StoryAnalogy contains 24K story pairs from diverse domains with human annotations on two similarities from the extended Structure-Mapping Theory. We observe that the data in StoryAnalogy can improve the quality of analogy generation in large language models.
- Can language models learn analogical reasoning? Investigating training objectives and comparisons to human performance [0.0] (2023-10-09)
  We test several ways to learn basic analogical reasoning, focusing on analogies that are more typical of those used to evaluate analogical reasoning in humans. Our experiments find that models are able to learn analogical reasoning, even with a small amount of data.
- ARN: Analogical Reasoning on Narratives [13.707344123755126] (2023-10-02)
  We develop a framework that operationalizes dominant theories of analogy, using narrative elements to create surface and system mappings. We show that while all LLMs can largely recognize near analogies, even the largest ones struggle with far analogies in a zero-shot setting.
- ANALOGYKB: Unlocking Analogical Reasoning of Language Models with A Million-scale Knowledge Base [51.777618249271725] (2023-05-10)
  ANALOGYKB is a million-scale analogy knowledge base derived from existing knowledge graphs (KGs). It identifies two types of analogies from the KGs: 1) analogies of the same relations, which can be directly extracted from the KGs, and 2) analogies of analogous relations, which are identified with a selection and filtering pipeline enabled by large language models (LLMs).
- Few-shot Visual Reasoning with Meta-analogical Contrastive Learning [141.2562447971] (2020-07-23)
  We propose to solve a few-shot (or low-shot) visual reasoning problem by resorting to analogical reasoning. We extract structural relationships between elements in both domains and enforce them to be as similar as possible with analogical learning. We validate our method on the RAVEN dataset, on which it outperforms the state-of-the-art method, with larger gains when training data is scarce.
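The first analogy type in the ANALOGYKB entry, analogies of the same relations, can be read directly off KG triples: two triples (h1, r, t1) and (h2, r, t2) that share a relation r yield the candidate analogy h1 : t1 :: h2 : t2. The sketch below is an illustrative toy, assuming a flat list of triples; it is not the actual ANALOGYKB pipeline, and the example triples are invented.

```python
# Toy extraction of same-relation analogies from KG triples
# (illustrative only; not the ANALOGYKB pipeline).
from collections import defaultdict
from itertools import combinations

triples = [
    ("Paris", "capital_of", "France"),
    ("Tokyo", "capital_of", "Japan"),
    ("Einstein", "field", "physics"),
    ("Darwin", "field", "biology"),
]

def same_relation_analogies(triples):
    """Pair up (head, tail) tuples that share a relation."""
    by_relation = defaultdict(list)
    for head, rel, tail in triples:
        by_relation[rel].append((head, tail))
    analogies = []
    for rel, pairs in by_relation.items():
        for (h1, t1), (h2, t2) in combinations(pairs, 2):
            analogies.append((h1, t1, h2, t2, rel))
    return analogies

for h1, t1, h2, t2, rel in same_relation_analogies(triples):
    print(f"{h1} : {t1} :: {h2} : {t2}  ({rel})")
```

The second type, analogies of analogous but non-identical relations, would require the LLM-based selection and filtering step the entry mentions, which this sketch does not attempt.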
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.