ANALOGICAL -- A Novel Benchmark for Long Text Analogy Evaluation in
Large Language Models
- URL: http://arxiv.org/abs/2305.05050v3
- Date: Thu, 25 May 2023 20:38:17 GMT
- Title: ANALOGICAL -- A Novel Benchmark for Long Text Analogy Evaluation in
Large Language Models
- Authors: Thilini Wijesiriwardene, Ruwan Wickramarachchi, Bimal G. Gajera,
Shreeyash Mukul Gowaikar, Chandan Gupta, Aman Chadha, Aishwarya Naresh
Reganti, Amit Sheth, Amitava Das
- Abstract summary: ANALOGICAL is a new benchmark to intrinsically evaluate large language models.
Our evaluation finds that it is increasingly challenging for LLMs to identify analogies when going up the analogy taxonomy.
- Score: 1.4546044532817048
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Over the past decade, analogies, in the form of word-level analogies, have
played a significant role as an intrinsic measure of evaluating the quality of
word embedding methods such as word2vec. Modern large language models (LLMs),
however, are primarily evaluated on extrinsic measures based on benchmarks such
as GLUE and SuperGLUE, and there are only a few investigations on whether LLMs
can draw analogies between long texts. In this paper, we present ANALOGICAL, a
new benchmark to intrinsically evaluate LLMs across a taxonomy of analogies of
long text with six levels of complexity -- (i) word, (ii) word vs. sentence,
(iii) syntactic, (iv) negation, (v) entailment, and (vi) metaphor. Using
thirteen datasets and three different distance measures, we evaluate the
abilities of eight LLMs in identifying analogical pairs in the semantic vector
space. Our evaluation finds that it is increasingly challenging for LLMs to
identify analogies when going up the analogy taxonomy.
Related papers
- Large Language Models as Neurolinguistic Subjects: Identifying Internal Representations for Form and Meaning [49.60849499134362]
This study investigates the linguistic understanding of Large Language Models (LLMs) regarding signifier (form) and signified (meaning)
Traditional psycholinguistic evaluations often reflect statistical biases that may misrepresent LLMs' true linguistic capabilities.
We introduce a neurolinguistic approach, utilizing a novel method that combines minimal pair and diagnostic probing to analyze activation patterns across model layers.
arXiv Detail & Related papers (2024-11-12T04:16:44Z) - A Comparative Study of Quality Evaluation Methods for Text Summarization [0.5512295869673147]
This paper proposes a novel method based on large language models (LLMs) for evaluating text summarization.
Our results show that LLMs evaluation aligns closely with human evaluation, while widely-used automatic metrics such as ROUGE-2, BERTScore, and SummaC do not and also lack consistency.
arXiv Detail & Related papers (2024-06-30T16:12:37Z) - Categorical Syllogisms Revisited: A Review of the Logical Reasoning Abilities of LLMs for Analyzing Categorical Syllogism [62.571419297164645]
This paper provides a systematic overview of prior works on the logical reasoning ability of large language models for analyzing categorical syllogisms.
We first investigate all the possible variations for the categorical syllogisms from a purely logical perspective.
We then examine the underlying configurations (i.e., mood and figure) tested by the existing datasets.
arXiv Detail & Related papers (2024-06-26T21:17:20Z) - BLESS: Benchmarking Large Language Models on Sentence Simplification [55.461555829492866]
We present BLESS, a performance benchmark of the most recent state-of-the-art large language models (LLMs) on the task of text simplification (TS)
We assess a total of 44 models, differing in size, architecture, pre-training methods, and accessibility, on three test sets from different domains (Wikipedia, news, and medical) under a few-shot setting.
Our evaluation indicates that the best LLMs, despite not being trained on TS, perform comparably with state-of-the-art TS baselines.
arXiv Detail & Related papers (2023-10-24T12:18:17Z) - StoryAnalogy: Deriving Story-level Analogies from Large Language Models
to Unlock Analogical Understanding [72.38872974837462]
We evaluate the ability to identify and generate analogies by constructing a first-of-its-kind large-scale story-level analogy corpus.
textscStory Analogy contains 24K story pairs from diverse domains with human annotations on two similarities from the extended Structure-Mapping Theory.
We observe that the data in textscStory Analogy can improve the quality of analogy generation in large language models.
arXiv Detail & Related papers (2023-10-19T16:29:23Z) - On the Relationship between Sentence Analogy Identification and Sentence
Structure Encoding in Large Language Models [7.716762867270514]
We look at how Large Language Models' abilities to capture sentence analogies vary with their abilities to encode syntactic and semantic structures.
We find that the LLMs which capture syntactic structures better, also have higher abilities in identifying sentence analogies.
arXiv Detail & Related papers (2023-10-11T18:59:48Z) - Long-form analogies generated by chatGPT lack human-like
psycholinguistic properties [0.5884031187931463]
We apply psycholinguistic methods to evaluate individual sentences from long-form analogies about biochemical concepts.
We compare analogies generated by human subjects enrolled in introductory biochemistry courses to analogies generated by chatGPT.
arXiv Detail & Related papers (2023-06-07T15:42:31Z) - ANALOGYKB: Unlocking Analogical Reasoning of Language Models with A Million-scale Knowledge Base [51.777618249271725]
ANALOGYKB is a million-scale analogy knowledge base derived from existing knowledge graphs (KGs)
It identifies two types of analogies from the KGs: 1) analogies of the same relations, which can be directly extracted from the KGs, and 2) analogies of analogous relations, which are identified with a selection and filtering pipeline enabled by large language models (LLMs)
arXiv Detail & Related papers (2023-05-10T09:03:01Z) - Scientific and Creative Analogies in Pretrained Language Models [24.86477727507679]
This paper examines the encoding of analogy in large-scale pretrained language models, such as BERT and GPT-2.
We introduce the Scientific and Creative Analogy dataset (SCAN), a novel analogy dataset containing systematic mappings of multiple attributes and relational structures across dissimilar domains.
We find that state-of-the-art LMs achieve low performance on these complex analogy tasks, highlighting the challenges still posed by analogy understanding.
arXiv Detail & Related papers (2022-11-28T12:49:44Z) - A Comparative Study of Lexical Substitution Approaches based on Neural
Language Models [117.96628873753123]
We present a large-scale comparative study of popular neural language and masked language models.
We show that already competitive results achieved by SOTA LMs/MLMs can be further improved if information about the target word is injected properly.
arXiv Detail & Related papers (2020-05-29T18:43:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.