Metaphor Understanding Challenge Dataset for LLMs
- URL: http://arxiv.org/abs/2403.11810v1
- Date: Mon, 18 Mar 2024 14:08:59 GMT
- Title: Metaphor Understanding Challenge Dataset for LLMs
- Authors: Xiaoyu Tong, Rochelle Choenni, Martha Lewis, Ekaterina Shutova
- Abstract summary: We release the Metaphor Understanding Challenge Dataset (MUNCH).
MUNCH is designed to evaluate the metaphor understanding capabilities of large language models (LLMs).
The dataset provides over 10k paraphrases for sentences containing metaphor use, as well as 1.5k instances containing inapt paraphrases.
- Score: 12.444344984005236
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Metaphors in natural language are a reflection of fundamental cognitive processes such as analogical reasoning and categorisation, and are deeply rooted in everyday communication. Metaphor understanding is therefore an essential task for large language models (LLMs). We release the Metaphor Understanding Challenge Dataset (MUNCH), designed to evaluate the metaphor understanding capabilities of LLMs. The dataset provides over 10k paraphrases for sentences containing metaphor use, as well as 1.5k instances containing inapt paraphrases. The inapt paraphrases were carefully selected to serve as control to determine whether the model indeed performs full metaphor interpretation or rather resorts to lexical similarity. All apt and inapt paraphrases were manually annotated. The metaphorical sentences cover natural metaphor uses across 4 genres (academic, news, fiction, and conversation), and they exhibit different levels of novelty. Experiments with LLaMA and GPT-3.5 demonstrate that MUNCH presents a challenging task for LLMs. The dataset is freely accessible at https://github.com/xiaoyuisrain/metaphor-understanding-challenge.
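As a rough illustration of how MUNCH's apt/inapt control setup could be probed, the sketch below frames paraphrase judgement as a yes/no question to an LLM. The field names, prompt wording, and `stub_llm` placeholder are assumptions for illustration only; the actual data format is defined in the GitHub release linked above.

```python
# Minimal sketch of a MUNCH-style paraphrase judgement probe.
# Assumptions (not from the paper): examples are dicts with "sentence",
# "paraphrase", and "label" fields; the real release at
# https://github.com/xiaoyuisrain/metaphor-understanding-challenge
# may use different file formats and field names.

def build_prompt(sentence: str, paraphrase: str) -> str:
    """Frame apt/inapt paraphrase judgement as a yes/no question."""
    return (
        f"Sentence: {sentence}\n"
        f"Paraphrase: {paraphrase}\n"
        "Does the paraphrase preserve the meaning of the metaphor "
        "in the sentence? Answer 'yes' or 'no'."
    )

def judge(example: dict, llm) -> bool:
    """Return True if the model accepts the paraphrase as apt.

    `llm` is any callable mapping a prompt string to a completion string,
    e.g. a wrapper around LLaMA or GPT-3.5 (the models used in the paper).
    """
    answer = llm(build_prompt(example["sentence"], example["paraphrase"]))
    return answer.strip().lower().startswith("yes")

# Toy usage with a stub model; swap in a real LLM call.
example = {
    "sentence": "She attacked every weak point in my argument.",
    "paraphrase": "She criticised every weak point in my argument.",
    "label": "apt",
}
stub_llm = lambda prompt: "yes"
prediction = judge(example, stub_llm)
print(prediction == (example["label"] == "apt"))  # correct on this item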
Related papers
- Science is Exploration: Computational Frontiers for Conceptual Metaphor Theory [0.0]
We show that Large Language Models (LLMs) can accurately identify and explain the presence of conceptual metaphors in natural language data.
Using a novel prompting technique based on metaphor annotation guidelines, we demonstrate that LLMs are a promising tool for large-scale computational research on conceptual metaphors.
arXiv Detail & Related papers (2024-10-11T17:03:13Z)
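The paper above builds its prompts from metaphor annotation guidelines. A minimal sketch of that general idea follows, using MIP-style identification steps; the wording of both the guidelines and the prompt is an assumption, not the authors' template.

```python
# Hypothetical sketch of guideline-based prompting for conceptual metaphor
# identification, loosely in the spirit of the paper's approach; the exact
# guidelines and prompt used by the authors are not reproduced here.

ANNOTATION_GUIDELINES = """\
1. Read the sentence and establish its meaning in context.
2. For each lexical unit, determine its basic (most concrete) meaning.
3. If the contextual meaning contrasts with the basic meaning but can be
   understood by comparison with it, mark the unit as metaphoric.
"""

def metaphor_prompt(sentence: str) -> str:
    return (
        "You are annotating conceptual metaphors. Follow these guidelines:\n"
        f"{ANNOTATION_GUIDELINES}\n"
        f"Sentence: {sentence}\n"
        "List the metaphoric lexical units and name the likely conceptual "
        "metaphor (e.g. ARGUMENT IS WAR), or answer 'none'."
    )

# The resulting prompt string can be sent to any instruction-tuned LLM.
print(metaphor_prompt("He shot down all of my arguments."))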
- NYK-MS: A Well-annotated Multi-modal Metaphor and Sarcasm Understanding Benchmark on Cartoon-Caption Dataset [11.453576424853749]
We create a new benchmark named NYK-MS, which contains 1,583 samples for metaphor understanding tasks.
The tasks include determining whether a sample contains metaphor or sarcasm, identifying which word or object carries it, and explaining what it satirizes and why.
All of the 7 tasks are well-annotated by at least 3 annotators.
arXiv Detail & Related papers (2024-09-02T08:14:49Z)
- A framework for annotating and modelling intentions behind metaphor use [12.40493670580608]
We propose a novel taxonomy of intentions commonly attributed to metaphor, which comprises 9 categories.
We also release the first dataset annotated for intentions behind metaphor use.
We use this dataset to test the capability of large language models (LLMs) to infer the intentions behind metaphor use, in zero-shot and in-context few-shot settings.
arXiv Detail & Related papers (2024-07-04T14:13:57Z)
- LFED: A Literary Fiction Evaluation Dataset for Large Language Models [58.85989777743013]
We collect 95 works of literary fiction that were either originally written in Chinese or translated into Chinese, covering a wide range of topics across several centuries.
We define a question taxonomy with 8 question categories to guide the creation of 1,304 questions.
We conduct an in-depth analysis to ascertain how specific attributes of literary fiction (e.g., novel type, number of characters, year of publication) impact LLM performance in evaluations.
arXiv Detail & Related papers (2024-05-16T15:02:24Z)
- Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models [56.93074140619464]
We propose RiC (Reasoning in Conversation), a method that focuses on solving subjective tasks through dialogue simulation.
The motivation of RiC is to mine useful contextual information by simulating dialogues instead of supplying chain-of-thought style rationales.
We evaluate both API-based and open-source LLMs including GPT-4, ChatGPT, and OpenChat across twelve tasks.
arXiv Detail & Related papers (2024-02-27T05:37:10Z)
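A rough sketch of the two-stage idea described for RiC: first have the model simulate a dialogue about the input, then answer the subjective question with that dialogue as added context. The prompt templates and the `stub` model below are assumptions for illustration, not the authors' templates.

```python
# Sketch of dialogue simulation for subjective tasks, per the RiC summary:
# mine contextual information by simulating a dialogue, then answer.

def simulate_dialogue(llm, task_input: str) -> str:
    return llm(
        "Write a short dialogue between two people discussing the following "
        f"text, surfacing any useful background or perspective:\n{task_input}"
    )

def answer_with_dialogue(llm, task_input: str, question: str) -> str:
    dialogue = simulate_dialogue(llm, task_input)
    return llm(
        f"Text: {task_input}\nDialogue about the text:\n{dialogue}\n"
        f"Question: {question}\nAnswer:"
    )

# Toy usage with a stub; swap in a real GPT-4/ChatGPT/OpenChat wrapper.
stub = lambda prompt: "A: It sounds ironic. B: Yes, the praise is insincere."
print(answer_with_dialogue(
    stub, "What a great day to get a flat tire.", "Is this sarcastic?"
))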
- Finding Challenging Metaphors that Confuse Pretrained Language Models [21.553915781660905]
It remains unclear what types of metaphors challenge current state-of-the-art NLP models.
We propose an automatic pipeline for identifying metaphors that challenge a particular model.
Our analysis demonstrates that our detected hard metaphors contrast significantly with VUA and reduce the accuracy of machine translation by 16%.
arXiv Detail & Related papers (2024-01-29T10:00:54Z)
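One plausible instantiation of the hard-metaphor mining pipeline summarised above is to rank candidate metaphoric sentences by how much they confuse a given model. The difficulty proxy and top-k selection below are assumptions; the paper's actual criteria are not detailed in this summary.

```python
# Assumed sketch of a "hard metaphor" mining pipeline: score candidate
# metaphoric sentences by a model-difficulty measure and keep the hardest.

from typing import Callable, List, Tuple

def mine_hard_metaphors(
    sentences: List[str],
    difficulty: Callable[[str], float],  # e.g. model perplexity or loss
    top_k: int = 100,
) -> List[Tuple[float, str]]:
    """Return the top_k sentences with the highest difficulty scores."""
    scored = sorted(((difficulty(s), s) for s in sentences), reverse=True)
    return scored[:top_k]

# Toy usage with a trivial stand-in difficulty function (sentence length);
# replace with a real model-based score.
corpus = ["Time is a thief.", "He drowned in paperwork."]
print(mine_hard_metaphors(corpus, difficulty=len, top_k=1))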
- That was the last straw, we need more: Are Translation Systems Sensitive to Disambiguating Context? [64.38544995251642]
We study semantic ambiguities that exist in the source (English in this work) itself.
We focus on idioms that are open to both literal and figurative interpretations.
We find that current MT models consistently translate English idioms literally, even when the context suggests a figurative interpretation.
arXiv Detail & Related papers (2023-10-23T06:38:49Z)
- Metaphor Generation with Conceptual Mappings [58.61307123799594]
We aim to generate a metaphoric sentence given a literal expression by replacing relevant verbs.
We propose to control the generation process by encoding conceptual mappings between cognitive domains.
We show that the unsupervised CM-Lex model is competitive with recent deep learning metaphor generation systems.
arXiv Detail & Related papers (2021-06-02T15:27:05Z)
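In the spirit of the CM-Lex summary above (controlling generation through conceptual mappings and verb replacement), here is a minimal sketch in which a literal verb is swapped for a source-domain verb via a small mapping table. The table entries are illustrative assumptions, not the paper's lexicon.

```python
# Minimal sketch of lexical metaphor generation via a conceptual-mapping
# lexicon: a literal verb from the target domain is swapped for a verb
# conventionally used in the source domain.

CONCEPTUAL_MAPPINGS = {
    # (target-domain verb, conceptual mapping) -> source-domain verb
    ("criticise", "ARGUMENT IS WAR"): "attack",
    ("consider", "IDEAS ARE FOOD"): "chew on",
}

def metaphorise(sentence: str, verb: str, mapping: str) -> str:
    """Replace a literal verb with its mapped metaphoric counterpart."""
    replacement = CONCEPTUAL_MAPPINGS.get((verb, mapping))
    return sentence.replace(verb, replacement) if replacement else sentence

print(metaphorise("They criticise every proposal.", "criticise",
                  "ARGUMENT IS WAR"))
# -> "They attack every proposal."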
- MERMAID: Metaphor Generation with Symbolism and Discriminative Decoding [22.756157298168127]
Based on a theoretically-grounded connection between metaphors and symbols, we propose a method to automatically construct a parallel corpus.
For the generation task, we incorporate a metaphor discriminator to guide the decoding of a sequence-to-sequence model fine-tuned on our parallel data.
A task-based evaluation shows that human-written poems enhanced with metaphors are preferred 68% of the time compared to poems without metaphors.
arXiv Detail & Related papers (2021-03-11T16:39:19Z)
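The MERMAID summary describes a metaphor discriminator guiding seq2seq decoding. The sketch below simplifies this to reranking finished candidates by a weighted mix of generator likelihood and discriminator score; both the weighting scheme and the rerank-instead-of-stepwise-guidance simplification are assumptions made for brevity.

```python
# Simplified sketch of discriminator-guided decoding: rescore candidate
# generations by combining generator likelihood with a metaphoricity score.

from typing import Callable, List

def rerank(
    candidates: List[str],
    gen_score: Callable[[str], float],   # e.g. log-probability from seq2seq
    disc_score: Callable[[str], float],  # metaphoricity score in [0, 1]
    alpha: float = 0.5,
) -> str:
    """Pick the candidate maximising a mix of fluency and metaphoricity."""
    return max(
        candidates,
        key=lambda c: (1 - alpha) * gen_score(c) + alpha * disc_score(c),
    )

# Toy usage with stand-in scorers.
cands = ["The news broke her heart.", "The news made her sad."]
best = rerank(cands, gen_score=lambda c: 0.0,
              disc_score=lambda c: float("heart" in c))
print(best)  # the more metaphoric candidate wins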
- Probing Pretrained Language Models for Lexical Semantics [76.73599166020307]
We present a systematic empirical analysis across six typologically diverse languages and five different lexical tasks.
Our results indicate patterns and best practices that hold universally, but also point to prominent variations across languages and tasks.
arXiv Detail & Related papers (2020-10-12T14:24:01Z)
- Metaphoric Paraphrase Generation [58.592750281138265]
We use crowdsourcing to evaluate our results, and we also develop an automatic metric for evaluating metaphoric paraphrases.
We show that while the lexical replacement baseline is capable of producing accurate paraphrases, its outputs often lack metaphoricity.
Our metaphor masking model excels in generating metaphoric sentences while performing nearly as well with regard to fluency and paraphrase quality.
arXiv Detail & Related papers (2020-02-28T16:30:33Z)
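To make the "metaphor masking" idea above concrete, the sketch below masks the literal verb and lets a model infill a replacement. The `<mask>` token and the `fill_model` callable are assumed stand-ins; the paper's actual model and training setup may differ.

```python
# Sketch of metaphor masking per the summary above: mask the literal verb,
# then have a model infill a (hopefully metaphoric) replacement.

def mask_verb(sentence: str, verb: str) -> str:
    """Replace the first occurrence of the literal verb with a mask token."""
    return sentence.replace(verb, "<mask>", 1)

def infill(masked: str, fill_model) -> str:
    """`fill_model` maps a masked sentence to a replacement expression,
    e.g. a fine-tuned masked or seq2seq language model."""
    return masked.replace("<mask>", fill_model(masked), 1)

# Toy usage with a stub infiller.
masked = mask_verb("The committee rejected the proposal.", "rejected")
print(infill(masked, fill_model=lambda s: "shot down"))
# -> "The committee shot down the proposal."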
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.