Meta4XNLI: A Crosslingual Parallel Corpus for Metaphor Detection and Interpretation
- URL: http://arxiv.org/abs/2404.07053v3
- Date: Mon, 21 Jul 2025 08:12:47 GMT
- Title: Meta4XNLI: A Crosslingual Parallel Corpus for Metaphor Detection and Interpretation
- Authors: Elisa Sanchez-Bayona, Rodrigo Agerri
- Abstract summary: We present Meta4XNLI, the first parallel dataset for Natural Language Inference (NLI) newly annotated for metaphor detection and interpretation. Our results show that fine-tuned encoders outperform decoders-only LLMs in metaphor detection. Our study also finds that translation plays an important role in the preservation or loss of metaphors across languages.
- Score: 6.0158981171030685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Metaphors are a ubiquitous but often overlooked part of everyday language. As a complex cognitive-linguistic phenomenon, they provide a valuable means to evaluate whether language models can capture deeper aspects of meaning, including semantic, pragmatic, and cultural context. In this work, we present Meta4XNLI, the first parallel dataset for Natural Language Inference (NLI) newly annotated for metaphor detection and interpretation in both English and Spanish. Meta4XNLI facilitates the comparison of encoder- and decoder-based models in detecting and understanding metaphorical language in multilingual and cross-lingual settings. Our results show that fine-tuned encoders outperform decoders-only LLMs in metaphor detection. Metaphor interpretation is evaluated via the NLI framework with comparable performance of masked and autoregressive models, which notably decreases when the inference is affected by metaphorical language. Our study also finds that translation plays an important role in the preservation or loss of metaphors across languages, introducing shifts that might impact metaphor occurrence and model performance. These findings underscore the importance of resources like Meta4XNLI for advancing the analysis of the capabilities of language models and improving our understanding of metaphor processing across languages. Furthermore, the dataset offers previously unavailable opportunities to investigate metaphor interpretation, cross-lingual metaphor transferability, and the impact of translation on the development of multilingual annotated resources.
Related papers
- Metaphor and Large Language Models: When Surface Features Matter More than Deep Understanding [6.0158981171030685]
This paper presents a comprehensive evaluation of the capabilities of Large Language Models (LLMs) in metaphor interpretation across multiple datasets, tasks, and prompt configurations. We address these limitations by conducting extensive experiments using diverse publicly available datasets with inference and metaphor annotations. The results indicate that LLMs' performance is more influenced by features like lexical overlap and sentence length than by metaphorical content.
arXiv Detail & Related papers (2025-07-21T08:09:11Z)
- Cultural Bias Matters: A Cross-Cultural Benchmark Dataset and Sentiment-Enriched Model for Understanding Multimodal Metaphors [26.473849906627677]
We introduce MultiMM, a dataset designed for cross-cultural studies of metaphor in Chinese and English. We propose Sentiment-Enriched Metaphor Detection (SEMD), a baseline model that integrates sentiment embeddings to enhance metaphor comprehension across cultural backgrounds.
arXiv Detail & Related papers (2025-06-08T04:02:50Z)
- Towards Multimodal Metaphor Understanding: A Chinese Dataset and Model for Metaphor Mapping Identification [9.08615188602226]
We develop a Chinese multimodal metaphor advertisement dataset (namely CM3D) that includes annotations of specific target and source domains.
We propose a Chain-of-Thought (CoT) Prompting-based Metaphor Mapping Identification Model (CPMMIM) which simulates the human cognitive process for identifying these mappings.
arXiv Detail & Related papers (2025-01-05T04:15:03Z)
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z)
- Towards a Deep Understanding of Multilingual End-to-End Speech Translation [52.26739715012842]
We analyze representations learnt in a multilingual end-to-end speech translation model trained over 22 languages.
We derive three major findings from our analysis.
arXiv Detail & Related papers (2023-10-31T13:50:55Z)
- Multi-lingual and Multi-cultural Figurative Language Understanding [69.47641938200817]
Figurative language permeates human communication, but is relatively understudied in NLP.
We create a dataset for seven diverse languages associated with a variety of cultures: Hindi, Indonesian, Javanese, Kannada, Sundanese, Swahili and Yoruba.
Our dataset reveals that each language relies on cultural and regional concepts for figurative expressions, with the highest overlap between languages originating from the same region.
All languages exhibit a significant deficiency compared to English, with variations in performance reflecting the availability of pre-training and fine-tuning data.
arXiv Detail & Related papers (2023-05-25T15:30:31Z)
- LMs stand their Ground: Investigating the Effect of Embodiment in Figurative Language Interpretation by Language Models [0.0]
Figurative language is a challenge for language models since its interpretation relies on words used in ways that deviate from their conventional order and meaning.
Yet, humans can easily understand and interpret metaphors as they can be derived from embodied metaphors.
This study shows how larger language models perform better at interpreting metaphoric sentences when the action of the metaphorical sentence is more embodied.
arXiv Detail & Related papers (2023-05-05T11:44:12Z)
- Leveraging a New Spanish Corpus for Multilingual and Crosslingual Metaphor Detection [5.9647924003148365]
This work presents the first corpus annotated with naturally occurring metaphors in Spanish large enough to develop systems to perform metaphor detection.
The presented dataset, CoMeta, includes texts from various domains, namely, news, political discourse, Wikipedia and reviews.
arXiv Detail & Related papers (2022-10-19T07:55:36Z)
- Testing the Ability of Language Models to Interpret Figurative Language [69.59943454934799]
Figurative and metaphorical language are commonplace in discourse.
It remains an open question to what extent modern language models can interpret nonliteral phrases.
We introduce Fig-QA, a Winograd-style nonliteral language understanding task.
arXiv Detail & Related papers (2022-04-26T23:42:22Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- On the Impact of Temporal Representations on Metaphor Detection [1.6959319157216468]
State-of-the-art approaches for metaphor detection compare a word's literal (or core) meaning with its contextual meaning using sequential metaphor classifiers based on neural networks.
This study examines the metaphor detection task with a detailed exploratory analysis where different temporal and static word embeddings are used to account for different representations of literal meanings.
Results suggest that the choice of word embeddings does affect the metaphor detection task, and that some temporal word embeddings slightly outperform static methods on some performance measures.
arXiv Detail & Related papers (2021-11-05T08:43:21Z)
- It's not Rocket Science: Interpreting Figurative Language in Narratives [48.84507467131819]
We study the interpretation of two non-compositional figurative languages (idioms and similes).
Our experiments show that models based solely on pre-trained language models perform substantially worse than humans on these tasks.
We additionally propose knowledge-enhanced models, adopting human strategies for interpreting figurative language.
arXiv Detail & Related papers (2021-08-31T21:46:35Z)
- Interpreting Verbal Metaphors by Paraphrasing [12.750941606061877]
We show that our paraphrasing method significantly outperforms the state-of-the-art baseline.
We also demonstrate that our method can help a machine translation system improve its accuracy in translating English metaphors to 8 target languages.
arXiv Detail & Related papers (2021-04-07T21:00:23Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
- Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact in existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.