A Comprehensive Evaluation of Semantic Relation Knowledge of Pretrained Language Models and Humans
- URL: http://arxiv.org/abs/2412.01131v1
- Date: Mon, 02 Dec 2024 05:11:34 GMT
- Title: A Comprehensive Evaluation of Semantic Relation Knowledge of Pretrained Language Models and Humans
- Authors: Zhihan Cao, Hiroaki Yamada, Simone Teufel, Takenobu Tokunaga,
- Abstract summary: We introduce a comprehensive evaluation framework covering five relations beyond hypernymy, namely hyponymy, holonymy, meronymy, antonymy, and synonymy.
Our results reveal a significant knowledge gap between humans and models for almost all semantic relations.
- Score: 3.3311266423308252
- License:
- Abstract: Recently, much work has concerned itself with the enigma of what exactly PLMs (pretrained language models) learn about different aspects of language, and how they learn it. One stream of this type of research investigates the knowledge that PLMs have about semantic relations. However, many aspects of semantic relations were left unexplored. Only one relation was considered, namely hypernymy. Furthermore, previous work did not measure humans' performance on the same task as that solved by the PLMs. This means that at this point in time, there is only an incomplete view of models' semantic relation knowledge. To address this gap, we introduce a comprehensive evaluation framework covering five relations beyond hypernymy, namely hyponymy, holonymy, meronymy, antonymy, and synonymy. We use six metrics (two newly introduced here) for recently untreated aspects of semantic relation knowledge, namely soundness, completeness, symmetry, asymmetry, prototypicality, and distinguishability and fairly compare humans and models on the same task. Our extensive experiments involve 16 PLMs, eight masked and eight causal language models. Up to now only masked language models had been tested although causal and masked language models treat context differently. Our results reveal a significant knowledge gap between humans and models for almost all semantic relations. Antonymy is the outlier relation where all models perform reasonably well. In general, masked language models perform significantly better than causal language models. Nonetheless, both masked and causal language models are likely to confuse non-antonymy relations with antonymy.
Related papers
- A Distributional Perspective on Word Learning in Neural Language Models [57.41607944290822]
There are no widely agreed-upon metrics for word learning in language models.
We argue that distributional signatures studied in prior work fail to capture key distributional information.
We obtain learning trajectories for a selection of small language models we train from scratch.
arXiv Detail & Related papers (2025-02-09T13:15:59Z) - Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models [88.07940818022468]
We take an initial step on measuring the role of shared semantics among subwords in the encoder-only multilingual language models (mLMs)
We form "semantic tokens" by merging the semantically similar subwords and their embeddings.
inspections on the grouped subwords show that they exhibit a wide range of semantic similarities.
arXiv Detail & Related papers (2024-11-07T08:38:32Z) - Language in Vivo vs. in Silico: Size Matters but Larger Language Models Still Do Not Comprehend Language on a Par with Humans [1.8434042562191815]
This work investigates the role of model scaling in determining whether differences between humans and models are amenable to model size.
We test three Large Language Models (LLMs) on a grammaticality judgment task featuring anaphora, center embedding, comparatives, and negative polarity.
We find that humans are overall less accurate than ChatGPT-4 (76% vs. 80% accuracy, respectively), but that this is due to ChatGPT-4 outperforming humans only in one task condition, namely on grammatical sentences.
arXiv Detail & Related papers (2024-04-23T10:09:46Z) - Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics
Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions.
This suggests LMs may potentially serve as more useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z) - Not wacky vs. definitely wacky: A study of scalar adverbs in pretrained
language models [0.0]
Modern pretrained language models, such as BERT, RoBERTa and GPT-3 hold the promise of performing better on logical tasks than classic static word embeddings.
We investigate the extent to which BERT, RoBERTa, GPT-2 and GPT-3 exhibit general, human-like, knowledge of these common words.
We find that despite capturing some aspects of logical meaning, the models fall far short of human performance.
arXiv Detail & Related papers (2023-05-25T18:56:26Z) - Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not well-represent natural language semantics.
arXiv Detail & Related papers (2022-10-14T02:35:19Z) - Visual Spatial Reasoning [35.5155400193075]
We present a dataset containing more than 10k natural text-image pairs with 66 types of spatial relations in English.
We show how the dataset includes challenging linguistic phenomena, such as varying reference frames.
We demonstrate a large gap between human and model performance: the human ceiling is above 95%, while state-of-the-art models only achieve around 70%.
arXiv Detail & Related papers (2022-04-30T23:03:49Z) - What's the best place for an AI conference, Vancouver or ______: Why
completing comparative questions is difficult [22.04829832439774]
We study the ability of neural LMs to ask (not answer) reasonable questions.
We show that accuracy in this fill-in-the-blank task is well-correlated with human judgements of whether a question is reasonable.
arXiv Detail & Related papers (2021-04-05T14:56:09Z) - Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
Language models must capture statistical dependencies between words at timescales ranging from very short to very long.
We derived a theory for how the memory gating mechanism in long short-term memory language models can capture power law decay.
Experiments showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution.
arXiv Detail & Related papers (2020-09-27T02:13:38Z) - A Simple Joint Model for Improved Contextual Neural Lemmatization [60.802451210656805]
We present a simple joint neural model for lemmatization and morphological tagging that achieves state-of-the-art results on 20 languages.
Our paper describes the model in addition to training and decoding procedures.
arXiv Detail & Related papers (2019-04-04T02:03:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.