NYK-MS: A Well-annotated Multi-modal Metaphor and Sarcasm Understanding Benchmark on Cartoon-Caption Dataset
- URL: http://arxiv.org/abs/2409.01037v1
- Date: Mon, 2 Sep 2024 08:14:49 GMT
- Title: NYK-MS: A Well-annotated Multi-modal Metaphor and Sarcasm Understanding Benchmark on Cartoon-Caption Dataset
- Authors: Ke Chang, Hao Li, Junzhao Zhang, Yunfang Wu
- Abstract summary: We create a new benchmark named NYK-MS, which contains 1,583 samples for metaphor understanding tasks.
Tasks include whether a sample contains metaphor/sarcasm, which word or object carries the metaphor/sarcasm, what it satirizes, and why.
All 7 tasks are annotated by at least 3 annotators.
- Score: 11.453576424853749
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Metaphor and sarcasm are common figurative expressions in people's communication, especially on the Internet and in memes popular among teenagers. We create a new benchmark named NYK-MS (NewYorKer for Metaphor and Sarcasm), which contains 1,583 samples for metaphor understanding tasks and 1,578 samples for sarcasm understanding tasks. The tasks cover whether a sample contains metaphor/sarcasm, which word or object carries it, what it satirizes, and why it contains metaphor/sarcasm; all 7 tasks are annotated by at least 3 annotators. We annotate the dataset over several rounds to improve consistency and quality, and use a GUI and GPT-4V to raise annotation efficiency. Based on the benchmark, we conduct extensive experiments. In zero-shot experiments, we show that Large Language Models (LLMs) and Large Multi-modal Models (LMMs) perform poorly on the classification tasks, and that performance on the other 5 tasks improves as model scale increases. In experiments with traditional pre-trained models, we show gains from augmentation and alignment methods, which demonstrate that our benchmark is consistent with previous datasets and requires models to understand both modalities.
Related papers
- VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models [76.94378391979228]
We introduce a new, more demanding task known as Interleaved Image-Text Comprehension (IITC).
This task challenges models to discern and disregard superfluous elements in both images and text to accurately answer questions.
In support of this task, we further craft a new VEGA dataset, tailored for the IITC task on scientific content, and devise a subtask, Image-Text Association (ITA).
Date: 2024-06-14T17:59:40Z
- Metaphor Understanding Challenge Dataset for LLMs [12.444344984005236]
We release the Metaphor Understanding Challenge dataset (MUNCH).
MUNCH is designed to evaluate the metaphor understanding capabilities of large language models (LLMs).
The dataset provides over 10k paraphrases for sentences containing metaphor use, as well as 1.5k instances containing inapt paraphrases.
Date: 2024-03-18T14:08:59Z
- Finding Challenging Metaphors that Confuse Pretrained Language Models [21.553915781660905]
It remains unclear what types of metaphors challenge current state-of-the-art NLP models.
To identify hard metaphors, we propose an automatic pipeline that identifies metaphors that challenge a particular model.
Our analysis demonstrates that our detected hard metaphors contrast significantly with VUA and reduce the accuracy of machine translation by 16%.
Date: 2024-01-29T10:00:54Z
- Image Matters: A New Dataset and Empirical Study for Multimodal Hyperbole Detection [52.04083398850383]
We create a multimodal hyperbole detection dataset from Weibo (a Chinese social media platform).
We treat the text and image of a Weibo post as two modalities and explore the role of each for hyperbole detection.
Different pre-trained multimodal encoders are also evaluated on this downstream task to show their performance.
Date: 2023-07-01T03:23:56Z
- A Match Made in Heaven: A Multi-task Framework for Hyperbole and Metaphor Detection [27.85834441076481]
Hyperbole and metaphor are common in day-to-day communication.
Existing approaches to automatically detect metaphor and hyperbole have studied these language phenomena independently.
We propose a multi-task deep learning framework to detect hyperbole and metaphor simultaneously.
Date: 2023-05-27T14:17:59Z
- I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create Visual Metaphors [38.70166865926743]
We propose a new task of generating visual metaphors from linguistic metaphors.
This is a challenging task for diffusion-based text-to-image models, since it requires the ability to model implicit meaning and compositionality.
We create a high-quality dataset containing 6,476 visual metaphors for 1,540 linguistic metaphors and their associated visual elaborations.
Date: 2023-05-24T05:01:10Z
- How to Describe Images in a More Funny Way? Towards a Modular Approach to Cross-Modal Sarcasm Generation [62.89586083449108]
We study a new problem of cross-modal sarcasm generation (CMSG), i.e., generating a sarcastic description for a given image.
CMSG is challenging as models need to satisfy the characteristics of sarcasm, as well as the correlation between different modalities.
We propose an Extraction-Generation-Ranking based Modular method (EGRM) for cross-modal sarcasm generation.
Date: 2022-11-20T14:38:24Z
- Meta-Learning with Variational Semantic Memory for Word Sense Disambiguation [56.830395467247016]
We propose a model of semantic memory for WSD in a meta-learning setting.
Our model is based on hierarchical variational inference and incorporates an adaptive memory update rule via a hypernetwork.
We show that our model advances the state of the art in few-shot WSD and supports effective learning in extremely data-scarce scenarios.
Date: 2021-06-05T20:40:01Z
- MERMAID: Metaphor Generation with Symbolism and Discriminative Decoding [22.756157298168127]
Based on a theoretically-grounded connection between metaphors and symbols, we propose a method to automatically construct a parallel corpus.
For the generation task, we incorporate a metaphor discriminator to guide the decoding of a sequence to sequence model fine-tuned on our parallel data.
A task-based evaluation shows that human-written poems enhanced with metaphors are preferred 68% of the time compared to poems without metaphors.
Date: 2021-03-11T16:39:19Z
- Video Understanding as Machine Translation [53.59298393079866]
We tackle a wide variety of downstream video understanding tasks by means of a single unified framework.
We report performance gains over the state of the art on several downstream tasks, including video classification (EPIC-Kitchens), question answering (TVQA), and captioning (TVC, YouCook2, and MSR-VTT).
Date: 2020-06-12T14:07:04Z
- Metaphoric Paraphrase Generation [58.592750281138265]
We use crowdsourcing to evaluate our results and develop an automatic metric for evaluating metaphoric paraphrases.
We show that while the lexical replacement baseline is capable of producing accurate paraphrases, its outputs often lack metaphoricity.
Our metaphor masking model excels in generating metaphoric sentences while performing nearly as well with regard to fluency and paraphrase quality.
Date: 2020-02-28T16:30:33Z