Leveraging a New Spanish Corpus for Multilingual and Crosslingual
Metaphor Detection
- URL: http://arxiv.org/abs/2210.10358v1
- Date: Wed, 19 Oct 2022 07:55:36 GMT
- Authors: Elisa Sanchez-Bayona, Rodrigo Agerri
- Abstract summary: This work presents the first corpus annotated with naturally occurring metaphors in Spanish large enough to develop systems to perform metaphor detection.
The presented dataset, CoMeta, includes texts from various domains, namely, news, political discourse, Wikipedia and reviews.
- Score: 5.9647924003148365
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The lack of wide-coverage datasets annotated with everyday metaphorical
expressions for languages other than English is striking. This means that most
research on supervised metaphor detection has been published only for that
language. In order to address this issue, this work presents the first corpus
annotated with naturally occurring metaphors in Spanish large enough to develop
systems to perform metaphor detection. The presented dataset, CoMeta, includes
texts from various domains, namely, news, political discourse, Wikipedia and
reviews. In order to label CoMeta, we apply the MIPVU method, the guidelines
most commonly used to systematically annotate metaphor on real data. We use our
newly created dataset to provide competitive baselines by fine-tuning several
multilingual and monolingual state-of-the-art large language models.
Furthermore, by leveraging the existing VUAM English data in addition to
CoMeta, we present what are, to the best of our knowledge, the first cross-lingual
experiments on supervised metaphor detection. Finally, we perform a detailed
error analysis that explores the seemingly high transfer of everyday metaphor
across these two languages and datasets.
Related papers
- A framework for annotating and modelling intentions behind metaphor use [12.40493670580608]
We propose a novel taxonomy of intentions commonly attributed to metaphor, which comprises 9 categories.
We also release the first dataset annotated for intentions behind metaphor use.
We use this dataset to test the capability of large language models (LLMs) in inferring the intentions behind metaphor use, in zero- and in-context few-shot settings.
arXiv Detail & Related papers (2024-07-04T14:13:57Z)
- Meta4XNLI: A Crosslingual Parallel Corpus for Metaphor Detection and Interpretation [6.0158981171030685]
We present a novel parallel dataset for the tasks of metaphor detection and interpretation that contains metaphor annotations in both Spanish and English.
We investigate language models' metaphor identification and understanding abilities through a series of monolingual and cross-lingual experiments.
arXiv Detail & Related papers (2024-04-10T14:44:48Z)
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z)
- M2SA: Multimodal and Multilingual Model for Sentiment Analysis of Tweets [4.478789600295492]
This paper transforms an existing textual Twitter sentiment dataset into a multimodal format through a straightforward curation process.
Our work opens up new avenues for sentiment-related research within the research community.
arXiv Detail & Related papers (2024-04-02T09:11:58Z)
- Metaphors in Pre-Trained Language Models: Probing and Generalization Across Datasets and Languages [6.7126373378083715]
Large pre-trained language models (PLMs) are assumed to encode metaphorical knowledge useful for NLP systems.
We present studies in multiple metaphor detection datasets and in four languages.
Our experiments suggest that contextual representations in PLMs do encode metaphorical knowledge, and mostly in their middle layers.
arXiv Detail & Related papers (2022-03-26T19:05:24Z)
- Models and Datasets for Cross-Lingual Summarisation [78.56238251185214]
We present a cross-lingual summarisation corpus with long documents in a source language associated with multi-sentence summaries in a target language.
The corpus covers twelve language pairs and directions for four European languages, namely Czech, English, French and German.
We derive cross-lingual document-summary instances from Wikipedia by combining lead paragraphs and articles' bodies from language aligned Wikipedia titles.
arXiv Detail & Related papers (2022-02-19T11:55:40Z)
- Transferring Knowledge Distillation for Multilingual Social Event Detection [42.663309895263666]
Recently published graph neural networks (GNNs) show promising performance at social event detection tasks.
We present a GNN that incorporates cross-lingual word embeddings for detecting events in multilingual data streams.
Experiments on both synthetic and real-world datasets show the framework to be highly effective at detection in both multilingual data and in languages where training samples are scarce.
arXiv Detail & Related papers (2021-08-06T12:38:42Z)
- UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training [52.852163987208826]
UC2 is the first machine translation-augmented framework for cross-lingual cross-modal representation learning.
We propose two novel pre-training tasks, namely Masked Region-to-Token Modeling (MRTM) and Visual Translation Language Modeling (VTLM).
Our proposed framework achieves new state-of-the-art on diverse non-English benchmarks while maintaining comparable performance to monolingual pre-trained models on English tasks.
arXiv Detail & Related papers (2021-04-01T08:30:53Z)
- A Multi-Perspective Architecture for Semantic Code Search [58.73778219645548]
We propose a novel multi-perspective cross-lingual neural framework for code-text matching.
Our experiments on the CoNaLa dataset show that our proposed model yields better performance than previous approaches.
arXiv Detail & Related papers (2020-05-06T04:46:11Z)
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z)
- On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
arXiv Detail & Related papers (2020-04-09T19:50:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.