Talking the Talk Does Not Entail Walking the Walk: On the Limits of Large Language Models in Lexical Entailment Recognition
- URL: http://arxiv.org/abs/2406.14894v2
- Date: Thu, 07 Nov 2024 18:15:23 GMT
- Title: Talking the Talk Does Not Entail Walking the Walk: On the Limits of Large Language Models in Lexical Entailment Recognition
- Authors: Candida M. Greco, Lucio La Cava, Andrea Tagarelli
- Abstract summary: This work investigates the capabilities of eight Large Language Models in recognizing lexical entailment relations among verbs.
Our findings unveil that the models can tackle the lexical entailment recognition task with moderately good performance.
- Score: 3.8623569699070357
- Abstract: Verbs form the backbone of language, providing structure and meaning to sentences. Yet their intricate semantic nuances pose a longstanding challenge. Understanding verb relations through the concept of lexical entailment is crucial for comprehending sentence meanings and grasping verb dynamics. This work investigates the capabilities of eight Large Language Models in recognizing lexical entailment relations among verbs, using differently devised prompting strategies and zero-/few-shot settings over verb pairs from two lexical databases, namely WordNet and HyperLex. Our findings unveil that the models can tackle the lexical entailment recognition task with moderately good performance, although with varying degrees of effectiveness and under different conditions. Also, few-shot prompting can enhance the models' performance. However, solving the task perfectly remains an unmet challenge for all examined LLMs, which calls for further research on this topic.
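The zero-/few-shot setup mentioned in the abstract can be illustrated with a minimal sketch. The prompt wording, the example verb pairs, and the helper names `build_prompt` and `parse_answer` are assumptions made for illustration; they are not the prompts or code used by the authors.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical labeled pairs for the few-shot setting (illustrative only,
# not taken from the paper's data).
FEW_SHOT_EXAMPLES: List[Tuple[str, str, bool]] = [
    ("snore", "sleep", True),   # snoring implies sleeping
    ("walk", "swim", False),    # walking does not imply swimming
]


def build_prompt(verb_a: str, verb_b: str,
                 examples: Sequence[Tuple[str, str, bool]] = ()) -> str:
    """Build a yes/no prompt asking whether verb_a entails verb_b.

    With no examples this is a zero-shot prompt; passing labeled
    examples turns it into a few-shot prompt.
    """
    blocks = ["Answer 'yes' or 'no': does the first verb entail the second?"]
    for a, b, label in examples:
        blocks.append(f"Verbs: {a} -> {b}\nAnswer: {'yes' if label else 'no'}")
    blocks.append(f"Verbs: {verb_a} -> {verb_b}\nAnswer:")
    return "\n\n".join(blocks)


def parse_answer(text: str) -> Optional[bool]:
    """Map a model reply to True/False, or None if it is neither yes nor no."""
    tokens = text.strip().lower().split()
    if not tokens:
        return None
    first = tokens[0].strip(".,!:;'\"")
    if first == "yes":
        return True
    if first == "no":
        return False
    return None


if __name__ == "__main__":
    print(build_prompt("limp", "walk"))                      # zero-shot
    print(build_prompt("limp", "walk", FEW_SHOT_EXAMPLES))   # few-shot
```

The string returned by `build_prompt` would be sent to whichever model is under evaluation, and `parse_answer` would turn the reply into a label that can be compared against the WordNet or HyperLex annotation.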
Related papers
- Large Language Models Lack Understanding of Character Composition of Words [3.9901365062418317]
Large language models (LLMs) have demonstrated remarkable performances on a wide range of natural language tasks.
We show that most of them fail to reliably carry out even the simple tasks that can be handled by humans with perfection.
arXiv Detail & Related papers (2024-05-18T18:08:58Z)
- Tokenization Impacts Multilingual Language Modeling: Assessing Vocabulary Allocation and Overlap Across Languages [3.716965622352967]
We propose new criteria to evaluate the quality of lexical representation and vocabulary overlap observed in sub-word tokenizers.
Our findings show that the overlap of vocabulary across languages can be actually detrimental to certain downstream tasks.
arXiv Detail & Related papers (2023-05-26T18:06:49Z)
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
- Transfer Learning of Lexical Semantic Families for Argumentative Discourse Units Identification [0.8508198765617198]
Argument mining tasks require an informed range of low to high complexity linguistic phenomena and commonsense knowledge.
Previous work has shown that pre-trained language models are highly effective at encoding syntactic and semantic linguistic phenomena.
It remains an open question how well existing pre-trained language models capture the complexity of argument mining tasks.
arXiv Detail & Related papers (2022-09-06T13:38:47Z)
- Testing the Ability of Language Models to Interpret Figurative Language [69.59943454934799]
Figurative and metaphorical language are commonplace in discourse.
It remains an open question to what extent modern language models can interpret nonliteral phrases.
We introduce Fig-QA, a Winograd-style nonliteral language understanding task.
arXiv Detail & Related papers (2022-04-26T23:42:22Z)
- Analyzing the Limits of Self-Supervision in Handling Bias in Language [52.26068057260399]
We evaluate how well language models capture the semantics of four tasks for bias: diagnosis, identification, extraction and rephrasing.
Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation.
arXiv Detail & Related papers (2021-12-16T05:36:08Z)
- Lexically-constrained Text Generation through Commonsense Knowledge Extraction and Injection [62.071938098215085]
We focus on the CommonGen benchmark, wherein the aim is to generate a plausible sentence for a given set of input concepts.
We propose strategies for enhancing the semantic correctness of the generated text.
arXiv Detail & Related papers (2020-12-19T23:23:40Z)
- Probing Pretrained Language Models for Lexical Semantics [76.73599166020307]
We present a systematic empirical analysis across six typologically diverse languages and five different lexical tasks.
Our results indicate patterns and best practices that hold universally, but also point to prominent variations across languages and tasks.
arXiv Detail & Related papers (2020-10-12T14:24:01Z)
- SLK-NER: Exploiting Second-order Lexicon Knowledge for Chinese NER [8.122270502556374]
We present new insight into second-order lexicon knowledge (SLK) of each character in the sentence to provide more lexical word information.
The proposed model can exploit more discernible lexical word information with the help of global context.
arXiv Detail & Related papers (2020-07-16T15:53:02Z)
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models that fit into the word order of the source language might fail to handle target languages.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.